[LU-4205] lustre 2.4 api setstrip on a 2.1.5 server Causes LBUG ASSERTION( namelen > 0 ) Created: 04/Nov/13 Updated: 14/Nov/13 Resolved: 14/Nov/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.1.5 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Zhenyu Xu |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Client: Sles11-sp2 Lustre 2.4.0-3nas |
||
| Severity: | 3 |
| Rank (Obsolete): | 11433 |
| Description |
|
The following code compiled on a Sles11-sp2 Lustre2.4.0-3nas client will cause a lustre2.1.5 MDT server to LBUG. - }
LustreError: 3291:0:(mdt_handler.c:224:mdt_lock_pdo_init()) ASSERTION( namelen > 0 ) failed:
<0>LustreError: 2167:0:(mdt_handler.c:224:mdt_lock_pdo_init()) ASSERTION( namelen > 0 ) failed: |
| Comments |
| Comment by Peter Jones [ 05/Nov/13 ] | |||||||||||||||
|
Bobijam Could you please advise on this issue? Thanks Peter | |||||||||||||||
| Comment by Zhenyu Xu [ 05/Nov/13 ] | |||||||||||||||
|
Would you mind trying this debug patch? This is for the 2.1.x server code. | |||||||||||||||
| Comment by Mahmoud Hanafi [ 06/Nov/13 ] | |||||||||||||||
|
No luck.
LustreError: 2987:0:(mdt_internal.h:789:mdt_name()) ASSERTION( namelen > 0 ) failed:
[<ffffffffa0340785>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] | |||||||||||||||
| Comment by Zhenyu Xu [ 07/Nov/13 ] | |||||||||||||||
|
Update on what I've tried: I could not reproduce it. Test code: bug_endeavour_stripe.c
| |||||||||||||||
| Comment by Mahmoud Hanafi [ 07/Nov/13 ] | |||||||||||||||
|
Was your client a sles11sp2? Also, you may notice that with the patch it LBUGs at a different location. | |||||||||||||||
| Comment by Zhenyu Xu [ 08/Nov/13 ] | |||||||||||||||
|
No, my client is RHEL6; I'll try installing sles11sp2 to run the test. The LBUG after applying the patch reveals the same error: the MDS did not receive the filename it is supposed to get. There are two points that check this assertion. | |||||||||||||||
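To illustrate the failure mode described above (a request arriving without the name the server expects), here is a minimal sketch of rejecting such a request with an error code rather than asserting on it. This is not the debug patch or Lustre code: `struct name_req` and `validate_name_req` are hypothetical names made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical request carrying a file name; NOT a Lustre structure. */
struct name_req {
    const char *name;   /* name supplied by the peer, may be missing */
    size_t      namelen; /* length of name in bytes */
};

/*
 * Returns 0 when the request carries a usable name, -EINVAL otherwise.
 * Peer-supplied data is checked at run time; only a NULL request pointer,
 * which would be a local programming error, is asserted.
 */
static int validate_name_req(const struct name_req *req)
{
    assert(req != NULL);
    if (req->name == NULL || req->namelen == 0)
        return -EINVAL;
    return 0;
}
```

With a check like this, a malformed request from an incompatible client would fail with an error on the client side instead of taking the server down.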
| Comment by Bob Glossman (Inactive) [ 08/Nov/13 ] | |||||||||||||||
|
I have been trying to reproduce the reported server panic. So far I have tested both current (2.5.50) and old (2.1.5) Centos servers. Clients have been current (2.5.50), latest b2_4 (2.4.1+), and old (2.4.1) sles11sp2 clients. All but one combination worked fine. The one that didn't, a 2.5.50 sles11sp2 client on a 2.1.5 server, didn't panic the server. It just gave a client error like the following:
# ./bug /mnt/lustre/zzz
error on ioctl 0x4008669a for '/mnt/lustre/zzz' (3): Inappropriate ioctl for device
problem: Inappropriate ioctl for device
I did try some Centos clients as well, just because I had them handy. They all worked fine too. In no case was I able to produce a server panic as reported. | |||||||||||||||
| Comment by Bob Glossman (Inactive) [ 09/Nov/13 ] | |||||||||||||||
|
The failure case I reported above was incorrect. At the time I had my lustre fs mounted at /mnt/l2, not /mnt/lustre, so the error shown was the result of running the reproducer on a path not in a lustre fs. Repeating the test with the correct path, it worked fine too. So to sum up: all the combinations I tried succeeded, none generated errors, and none panicked servers. | |||||||||||||||
| Comment by Mahmoud Hanafi [ 11/Nov/13 ] | |||||||||||||||
|
This is where we are crashing. CLIENT CODE: gdb) print lum , oi_fid = {f_seq = 0, f_oid = 0, f_ver = 0}}}, , (gdb) print &lum.lmm_magic | |||||||||||||||
| Comment by Oleg Drokin [ 13/Nov/13 ] | |||||||||||||||
|
I don't easily see it, but do you carry the LU-3544 patch in your affected tree? | |||||||||||||||
| Comment by Jay Lan (Inactive) [ 14/Nov/13 ] | |||||||||||||||
|
If we run a 2.4.1 client, there is no crash on the 2.1.5 mds, but if we run a 2.4.0 client, the mds crashes. Well, 2.4.1 reverted | |||||||||||||||
| Comment by Zhenyu Xu [ 14/Nov/13 ] | |||||||||||||||
|
My 2.4.0 doesn't have the original commit d3f91c45ec56329c52ff1f15bc56d38f5fe9cf7c
Author: Oleg Drokin <oleg.drokin@intel.com>
AuthorDate: Fri May 24 16:46:24 2013 -0400
Commit: Oleg Drokin <oleg.drokin@intel.com>
CommitDate: Fri May 24 16:46:24 2013 -0400
New tag 2.4.0-RC2
Change-Id: I6cacd097c6f3c5f2a6e80f2338650edae6a1a83c
Signed-off-by: Oleg Drokin <oleg.drokin@intel.com>
I think the original commit 2402980a0891e43668f4016e17f2ff872006e0fa
Author: Patrick Farrell <paf@cray.com>
AuthorDate: Thu Jul 11 11:06:27 2013 -0500
Commit: Oleg Drokin <oleg.drokin@intel.com>
CommitDate: Tue Jul 23 05:22:28 2013 +0000
LU-3544 nfs: writing to new files will return ENOENT
...
| |||||||||||||||
| Comment by Mahmoud Hanafi [ 14/Nov/13 ] | |||||||||||||||
|
Testing showed that This case can be closed. | |||||||||||||||
| Comment by Peter Jones [ 14/Nov/13 ] | |||||||||||||||
|
ok. Thanks Mahmoud! |