[LU-12665] moving files between MDTs by mv(1) doesn't work with Lustre-2.7 servers. Created: 14/Aug/19  Updated: 14/Aug/20  Resolved: 14/Aug/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Alexander Zarochentsev Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3537 allow cross-MDT for all metadata oper... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Using 2.7 servers and 2.12 client:

[root@devvm1 tests]# ../utils/lctl set_param debug=-1; ../utils/lctl clear; mv /mnt/lustre/mdt1dir/foo /mnt/lustre/mdt0dir/ ; ../utils/lctl dk foo.log
debug=-1
mv: cannot move ‘/mnt/lustre/mdt1dir/foo’ to ‘/mnt/lustre/mdt0dir/foo’: Protocol error
Debug log: 3860 lines, 3860 kept, 0 dropped, 0 bad.
[root@devvm1 tests]#

The -EPROTO (-71) error appears to be returned from mdt_object_lock_internal():

00000004:00000001:1.0:1565721044.423035:0:4601:0:(mdt_handler.c:2708:mdt_object_lock_internal()) Process leaving (rc=18446744073709551545 : -71 : ffffffffffffffb9)
00000004:00000001:1.0:1565721044.423036:0:4601:0:(mdt_reint.c:1788:mdt_rename_parents_lock()) Process leaving via err_tgt_put (rc=18446744073709551545 : -71 : 0xffffffffffffffb9)

The object that the server attempts to lock is the source directory.
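
The three forms of rc printed in the log lines above are the same value: the large unsigned number is the 64-bit two's-complement representation of -71. A minimal Python sketch of that conversion follows. The helper name to_signed64 is made up for illustration, and the errno lookup assumes a Linux host, where errno 71 is EPROTO.

```python
import errno


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one (two's complement)."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


# rc value as printed in the debug log (decimal and hex forms)
rc = to_signed64(18446744073709551545)
print(rc)                                  # -71
print(to_signed64(0xFFFFFFFFFFFFFFB9))     # -71

# Map the positive errno number to its symbolic name (platform dependent).
print(errno.errorcode.get(-rc))
```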

This leads to the following picture:

Lustre 2.7 doesn't support cross-target renames, but a rename RPC should return -EXDEV in that case.
EXDEV is then returned to the client application. In the case of mv(1), the utility falls back to a file copy without informing
the user that the rename() syscall failed.

As far as I can see, new clients (2.11, 2.12) send the reint rename RPC to a different MDT, i.e. to the target MDT. The target directory therefore becomes local and the request passes the "target-is-remote" check in mdt_rename_parents_lock(). However, the rename operation fails later, when the server attempts to lock the source directory, which is unexpectedly remote. The server sends -EPROTO to the client, and mv(1) simply fails without falling back to a file copy.
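
The difference between the two error codes matters because of how a mv-style utility reacts to a failed rename(). The sketch below is a simplified model of that logic for regular files only, not the actual mv(1) source. The function names move_like_mv and simulate are hypothetical, and the rename failure is injected rather than produced by a real Lustre mount.

```python
import errno
import os
import shutil
import tempfile


def move_like_mv(src, dst, rename=os.rename):
    """Try rename(); fall back to copy + unlink only when it fails with EXDEV."""
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # any other error (e.g. EPROTO) is reported to the user
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copied"


def simulate(server_errno):
    """Run move_like_mv in a scratch directory with rename() forced to fail."""
    def failing_rename(src, dst):
        raise OSError(server_errno, os.strerror(server_errno))

    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo")
        dst = os.path.join(tmp, "foo.moved")
        with open(src, "w") as f:
            f.write("data")
        try:
            return move_like_mv(src, dst, rename=failing_rename)
        except OSError as exc:
            return "failed: " + errno.errorcode[exc.errno]


print(simulate(errno.EXDEV))   # copied
print(simulate(errno.EPROTO))  # failed: EPROTO
```

With EXDEV the move silently degrades to a copy, which is the behaviour expected against a 2.7 server. With EPROTO the error propagates, which matches the "Protocol error" message from mv in the transcript.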



 Comments   
Comment by Alexander Zarochentsev [ 14/Aug/19 ]

It looks like compatibility has been broken since Lustre 2.8. The patch from LU-3537 contains the following change:

diff --git a/lustre/lmv/lmv_obd.c b/lustre/lmv/lmv_obd.c
index 300710d970..a2c9e1f594 100644
--- a/lustre/lmv/lmv_obd.c
+++ b/lustre/lmv/lmv_obd.c
... 
-       rc = md_rename(src_tgt->ltd_exp, op_data, old, oldlen, new, newlen,
+       rc = md_rename(target_exp, op_data, old, oldlen, new, newlen,
                       request);

With this change, the client chooses a different MDT to send the reint rename RPC to.
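
The interaction described in this ticket can be summarized with a toy model. It is only a sketch under stated assumptions: the function names are hypothetical, MDTs are plain integers, and the two server-side checks are reduced to the order given in the description (target-is-remote check first, source directory lock second). It does not reproduce the real Lustre code paths.

```python
import errno


def server_27_rename(receiving_mdt, src_mdt, tgt_mdt):
    """Toy model of a 2.7 MDT handling a reint rename RPC."""
    # Check 1: the target directory must be local to the MDT handling the RPC.
    if tgt_mdt != receiving_mdt:
        return -errno.EXDEV
    # Check 2: locking the source directory fails if it lives on another MDT.
    if src_mdt != receiving_mdt:
        return -errno.EPROTO
    return 0


def client_pick_mdt(src_mdt, tgt_mdt, post_lu3537):
    """Pre-LU-3537 clients send to the source MDT, newer ones to the target MDT."""
    return tgt_mdt if post_lu3537 else src_mdt


def rename_result(src_mdt, tgt_mdt, post_lu3537):
    receiving_mdt = client_pick_mdt(src_mdt, tgt_mdt, post_lu3537)
    return server_27_rename(receiving_mdt, src_mdt, tgt_mdt)


# Moving from a directory on MDT1 to a directory on MDT0, as in the report.
for post_lu3537 in (False, True):
    rc = rename_result(src_mdt=1, tgt_mdt=0, post_lu3537=post_lu3537)
    print(post_lu3537, errno.errorcode[-rc])
```

In this model the old client gets EXDEV, which mv(1) handles by copying, while the new client gets EPROTO.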

Comment by Cory Spitz [ 18/Nov/19 ]

L2.7 is increasingly dated. We should consider closing this as WONT FIX.

Comment by Peter Jones [ 14/Aug/20 ]

Yes agreed
