[LU-4843] 2.6: DNE stripe directory - 2.5.0 clients Created: 31/Mar/14  Updated: 02/Jul/14  Resolved: 02/Jul/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.6.0

Type: Bug
Priority: Blocker
Reporter: Patrick Farrell (Inactive)
Assignee: Di Wang
Resolution: Fixed
Votes: 0
Labels: dne2

Severity: 3
Rank (Obsolete): 13345

 Description   

The DNE2 solution architecture states that 2.4 clients should return -ENOTSUPP when trying to access a master/DNE Phase 2 striped directory.

I did a quick test of this, and did not receive this error. Instead, the client LBUGged. Perhaps this only applies to later versions of 2.4 - I tested 2.4.0.

"striped_directory" is on 2 master MDSes, with two MDTs each. It was created like this, from a master client:
lfs setdirstripe -c 4 -D striped_directory

Here's what happened on my 2.4.0 client:
[root@centclient18 centssm1]# ls
striped_directory
[root@centclient18 centssm1]# cd striped_directory/
[root@centclient18 striped_directory]# ls
file file10 file2 file3 file4 file5 file6 file7 file8 file9
[root@centclient18 striped_directory]# touch file11
[root@centclient18 striped_directory]# ls -la
total 8
drwxr-xr-x 2 root root 4096 Mar 31 04:47 .
drwxr-xr-x 4 root root 4096 Mar 31 04:36 ..
-rw-r--r-- 1 root root 0 Mar 31 04:47 file
-rw-r--r-- 1 root root 0 Mar 31 04:47 file10
-rw-r--r-- 1 root root 0 Mar 31 04:50 file11
-rw-r--r-- 1 root root 0 Mar 31 04:47 file2
-rw-r--r-- 1 root root 0 Mar 31 04:47 file3
-rw-r--r-- 1 root root 0 Mar 31 04:47 file4
-rw-r--r-- 1 root root 0 Mar 31 04:47 file5
-rw-r--r-- 1 root root 0 Mar 31 04:47 file6
-rw-r--r-- 1 root root 0 Mar 31 04:47 file7
-rw-r--r-- 1 root root 0 Mar 31 04:47 file8
-rw-r--r-- 1 root root 0 Mar 31 04:47 file9
[root@centclient18 striped_directory]# lfs getdirstripe .
.
lmv_stripe_count: 1
lmv_stripe_offset: 0
mdtidx FID[seq:oid:ver]
0 [0x600000400:0x1:0x0]

[root@centclient18 striped_directory]# mkdir test
[root@centclient18 striped_directory]# ls

Message from syslogd@centclient18 at Mar 31 04:51:11 ...
kernel:LustreError: 5906:0:(lmv_obd.c:2351:lmv_unpackmd()) ASSERTION( mea_size == lmm_size ) failed:

Message from syslogd@centclient18 at Mar 31 04:51:11 ...
kernel:LustreError: 5906:0:(lmv_obd.c:2351:lmv_unpackmd()) LBUG
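
For illustration, the failed assertion compares two sizes and the client crashes when they differ. The expected behavior described at the top of this ticket is an error return instead. The following is a minimal userspace sketch of that idea, not the actual lmv_unpackmd() code: the struct, the magic value, the stripe entry size, and the function name are all hypothetical. It uses EOPNOTSUPP because the kernel-internal ENOTSUPP is not available in userspace errno.h.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a directory layout header. */
struct demo_md_hdr {
	uint32_t magic;        /* layout format identifier */
	uint32_t stripe_count; /* number of stripe entries that follow */
};

#define DEMO_MD_MAGIC_V1 0x0CD10CD0u /* hypothetical: the only format this client knows */
#define DEMO_STRIPE_SIZE 16u         /* hypothetical size of one stripe entry */

/*
 * Validate a layout buffer received from a server.
 * Returns 0 if the buffer is usable, or a negative errno instead of
 * asserting, so an unexpected layout cannot crash the client.
 */
static int demo_check_md(const void *buf, size_t buf_size)
{
	struct demo_md_hdr hdr;
	size_t expected;

	if (buf == NULL || buf_size < sizeof(hdr))
		return -EPROTO;

	/* Copy the header out so an unaligned buffer is still safe to read. */
	memcpy(&hdr, buf, sizeof(hdr));

	/* Unknown format: report "not supported" rather than guessing. */
	if (hdr.magic != DEMO_MD_MAGIC_V1)
		return -EOPNOTSUPP;

	/* Size mismatch: the check that the assertion performs, as an error. */
	expected = sizeof(hdr) + (size_t)hdr.stripe_count * DEMO_STRIPE_SIZE;
	if (expected != buf_size)
		return -EPROTO;

	return 0;
}
```

With this shape, a caller that receives a layout it does not understand can fail the single operation and keep the client running.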



 Comments   
Comment by Patrick Farrell (Inactive) [ 31/Mar/14 ]

Sorry, I accidentally created this ticket before I finished tagging it correctly. My bad.

Comment by Jodi Levi (Inactive) [ 01/Apr/14 ]

Di,
Could you comment on this one?
Thank you!

Comment by Di Wang [ 02/Apr/14 ]

Ah, I need to add a check for striped directories on the 2.6 server.

Comment by Patrick Farrell (Inactive) [ 02/Apr/14 ]

Di - One other comment. The lfs setdirstripe command recognizes when the hash type specified with -t isn't valid, but I can't find the valid hash types in the documentation anywhere. I had to read the code in order to identify them. It would be nice if the error message included the valid arguments. (Or if not that, then definitely the man page.)

Since you're doing various patches in the DNE 2 area, could you add that information to lfs as part of one of them? If not, I could write and push a patch myself. It just seems easier to fold this into another patch than to push it by itself.
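
A minimal sketch of the kind of error message requested above, assuming the valid names live in a single table that is used both for parsing and for reporting. This is not the lfs source: the function name is hypothetical, and the two hash type names in the table are assumptions for illustration only, since this ticket does not list the valid values.

```c
#include <stdio.h>
#include <string.h>

/* Assumed names, for illustration only; check the lfs source for the real list. */
static const char *const demo_hash_types[] = { "all_char", "fnv_1a_64" };
#define DEMO_NHASH (sizeof(demo_hash_types) / sizeof(demo_hash_types[0]))

/*
 * Look up a hash type name.
 * Returns its index in the table, or -1 after writing a message that
 * lists every valid name into err (truncated if err is too small).
 */
static int demo_parse_hash_type(const char *name, char *err, size_t err_size)
{
	size_t i, used;

	for (i = 0; i < DEMO_NHASH; i++)
		if (strcmp(name, demo_hash_types[i]) == 0)
			return (int)i;

	used = (size_t)snprintf(err, err_size,
				"invalid hash type '%s', valid types:", name);

	/* Append each valid name while there is still room in the buffer. */
	for (i = 0; i < DEMO_NHASH && used < err_size; i++)
		used += (size_t)snprintf(err + used, err_size - used,
					 " %s", demo_hash_types[i]);
	return -1;
}
```

Keeping the names in one table means the message cannot drift out of date when a hash type is added.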

Comment by Di Wang [ 15/Apr/14 ]

http://review.whamcloud.com/9956

Comment by Peter Jones [ 23/May/14 ]

Landed for 2.6

Comment by Patrick Farrell (Inactive) [ 21/Jun/14 ]

Di,

I was busy with other things when the patch landed and never tried it, but I just accidentally tried 2.5.1 clients with 2.6 servers (running master from Thursday) and striped directories, and hit this assertion again.

Here's the back trace:
2014-06-21T09:41:10.651307-05:00 c0-0c1s1n2 LustreError: 14818:0:(lmv_obd.c:2587:lmv_unpackmd()) ASSERTION( mea_size == lmm_size ) failed:
2014-06-21T09:41:10.651344-05:00 c0-0c1s1n2 LustreError: 14818:0:(lmv_obd.c:2587:lmv_unpackmd()) LBUG
2014-06-21T09:41:10.651357-05:00 c0-0c1s1n2 Pid: 14818, comm: LUSTRE_PERF
2014-06-21T09:41:10.651370-05:00 c0-0c1s1n2 Call Trace:
2014-06-21T09:41:10.677280-05:00 c0-0c1s1n2 [<ffffffff81005eb9>] try_stack_unwind+0x169/0x1b0
2014-06-21T09:41:10.677319-05:00 c0-0c1s1n2 [<ffffffff81004939>] dump_trace+0x89/0x450
2014-06-21T09:41:10.677331-05:00 c0-0c1s1n2 [<ffffffffa02eb8c7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
2014-06-21T09:41:10.704617-05:00 c0-0c1s1n2 [<ffffffffa02ebe27>] lbug_with_loc+0x47/0xc0 [libcfs]
2014-06-21T09:41:10.704655-05:00 c0-0c1s1n2 [<ffffffffa0a034e1>] lmv_unpackmd+0x5d1/0x820 [lmv]
2014-06-21T09:41:10.704666-05:00 c0-0c1s1n2 [<ffffffffa075f010>] obd_unpackmd+0xe0/0x360 [mdc]
2014-06-21T09:41:10.732016-05:00 c0-0c1s1n2 [<ffffffffa076baec>] mdc_get_lustre_md+0xb6c/0x1430 [mdc]
2014-06-21T09:41:10.732053-05:00 c0-0c1s1n2 [<ffffffffa09fa38b>] lmv_get_lustre_md+0xab/0x310 [lmv]
2014-06-21T09:41:10.732064-05:00 c0-0c1s1n2 [<ffffffffa08f556d>] ll_prep_inode+0xdd/0xe40 [lustre]
2014-06-21T09:41:10.759306-05:00 c0-0c1s1n2 [<ffffffffa0905396>] ll_lookup_it_finish+0x1d6/0xd50 [lustre]
2014-06-21T09:41:10.784541-05:00 c0-0c1s1n2 [<ffffffffa09063e4>] ll_lookup_it+0x4d4/0xad0 [lustre]
2014-06-21T09:41:10.784572-05:00 c0-0c1s1n2 [<ffffffffa0906a6c>] ll_lookup_nd+0x8c/0x3e0 [lustre]
2014-06-21T09:41:10.809749-05:00 c0-0c1s1n2 [<ffffffff81159bac>] d_alloc_and_lookup+0x4c/0x80
2014-06-21T09:41:10.809782-05:00 c0-0c1s1n2 [<ffffffff8115b36e>] do_lookup+0x2ae/0x3b0
2014-06-21T09:41:10.809793-05:00 c0-0c1s1n2 [<ffffffff8115dba3>] path_lookupat+0xc3/0x5f0
2014-06-21T09:41:10.834997-05:00 c0-0c1s1n2 [<ffffffff8115e105>] do_path_lookup+0x35/0xd0
2014-06-21T09:41:10.835031-05:00 c0-0c1s1n2 [<ffffffff8115ee53>] user_path_at_empty+0x83/0xb0
2014-06-21T09:41:10.835042-05:00 c0-0c1s1n2 [<ffffffff8115ee91>] user_path_at+0x11/0x20
2014-06-21T09:41:10.835056-05:00 c0-0c1s1n2 [<ffffffff81153a25>] vfs_fstatat+0x55/0x90
2014-06-21T09:41:10.860193-05:00 c0-0c1s1n2 [<ffffffff81153b8b>] vfs_stat+0x1b/0x20
2014-06-21T09:41:10.860228-05:00 c0-0c1s1n2 [<ffffffff81153bb4>] sys_newstat+0x24/0x50
2014-06-21T09:41:10.860239-05:00 c0-0c1s1n2 [<ffffffff815603ab>] system_call_fastpath+0x16/0x1b
2014-06-21T09:41:10.860255-05:00 c0-0c1s1n2 [<00007f85933a7765>] 0x7f85933a7765

I also took a dump with debug=-1 on the client if it's of interest.

Comment by Di Wang [ 21/Jun/14 ]

Ah, in patch 9956 I used OBD_CONNECT_DISP_STRIPE to check whether the client understands striped directories, and I had thought this flag only landed for 2.6 clients. Unfortunately, it is also included in 2.5 clients (2.5.2). I need to find a new flag. Hmm.
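
To illustrate the problem described in this comment, here is a minimal sketch of a server-side capability check based on connect flags. It is not the code from either patch. The struct, the function names, and the bit values are hypothetical, and DEMO_CONNECT_NEW_FLAG stands in for whatever flag the follow-up patch uses, which this ticket does not name. EOPNOTSUPP is used as the userspace stand-in for the kernel's ENOTSUPP.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; NOT the real Lustre connect flag values. */
#define DEMO_CONNECT_DISP_STRIPE (1ULL << 0) /* also advertised by some older clients */
#define DEMO_CONNECT_NEW_FLAG    (1ULL << 1) /* hypothetical: only striped-dir-aware clients */

/* Hypothetical per-client connection state kept by the server. */
struct demo_export {
	uint64_t connect_flags; /* flags the client advertised at connect time */
};

/* Test the flag that only striped-directory-aware clients advertise. */
static bool demo_client_knows_striped_dir(const struct demo_export *exp)
{
	return (exp->connect_flags & DEMO_CONNECT_NEW_FLAG) != 0;
}

/*
 * Decide whether to serve a directory to this client.
 * Returns 0 to proceed, or -EOPNOTSUPP when the directory is striped
 * and the client did not advertise support for striped directories.
 */
static int demo_check_dir_access(const struct demo_export *exp,
				 bool dir_is_striped)
{
	if (dir_is_striped && !demo_client_knows_striped_dir(exp))
		return -EOPNOTSUPP;
	return 0;
}
```

The check is only as good as the flag it tests: if the chosen flag is also set by clients that predate the feature, those clients pass the check and receive a layout they cannot unpack.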

Comment by Di Wang [ 21/Jun/14 ]

http://review.whamcloud.com/10773

Comment by Jodi Levi (Inactive) [ 02/Jul/14 ]

Patches landed to Master.
