Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.6.0, Lustre 2.9.0
- 3
- 13904
Description
Testing http://review.whamcloud.com/#/c/10198/, Oleg got it to crash under racer. This can be easily reproduced using:
llmount.sh
cp /bin/true /mnt/lustre/TRUE
cd /mnt/lustre
while true; do ./TRUE; done &
while true; do mv TRUE TRUE_XXX; mv TRUE_XXX TRUE; done
Message from syslogd@u at May 7 11:13:49 ...
 kernel:[491063.276112] LustreError: 14609:0:(mdc_lib.c:163:mdc_pack_name()) ASSERTION( cpy_len == name_len && lu_name_is_valid_2(buf, cpy_len) ) failed:
Message from syslogd@u at May 7 11:13:49 ...
 kernel:[491063.279161] LustreError: 14609:0:(mdc_lib.c:163:mdc_pack_name()) LBUG
Message from syslogd@u at May 7 11:13:49 ...
 kernel:[491063.317026] Kernel panic - not syncing: LBUG
crash> bt
PID: 14609 TASK: ffff88011b6006c0 CPU: 4 COMMAND: "bash"
 #0 [ffff880110c6f550] machine_kexec at ffffffff81039950
 #1 [ffff880110c6f5b0] crash_kexec at ffffffff810d4372
 #2 [ffff880110c6f680] panic at ffffffff81550d83
 #3 [ffff880110c6f700] lbug_with_loc at ffffffffa079df1b [libcfs]
 #4 [ffff880110c6f720] mdc_pack_name at ffffffffa0991d25 [mdc]
 #5 [ffff880110c6f760] mdc_open_pack at ffffffffa0992789 [mdc]
 #6 [ffff880110c6f7c0] mdc_enqueue at ffffffffa099699e [mdc]
 #7 [ffff880110c6f900] mdc_intent_lock at ffffffffa0997d4e [mdc]
 #8 [ffff880110c6f9c0] lmv_intent_open at ffffffffa095df35 [lmv]
 #9 [ffff880110c6fa60] lmv_intent_lock at ffffffffa095e88b [lmv]
#10 [ffff880110c6faf0] ll_intent_file_open at ffffffffa06508ed [lustre]
#11 [ffff880110c6fb80] ll_file_open at ffffffffa0651a15 [lustre]
#12 [ffff880110c6fc80] __dentry_open at ffffffff8119fa5a
#13 [ffff880110c6fce0] nameidata_to_filp at ffffffff8119fdc4
#14 [ffff880110c6fd00] do_filp_open at ffffffff811b5640
#15 [ffff880110c6fe70] open_exec at ffffffff811ac200
#16 [ffff880110c6fec0] do_execve at ffffffff811ac39f
#17 [ffff880110c6ff20] sys_execve at ffffffff810095ea
#18 [ffff880110c6ff50] stub_execve at ffffffff8100b54a
    RIP: 000000377fead047 RSP: 00007fff66ccc718 RFLAGS: 00000246
    RAX: 000000000000003b RBX: 00000000015b9490 RCX: ffffffffffffffff
    RDX: 00000000015623b0 RSI: 00000000015b9530 RDI: 00000000015b9490
    RBP: 00000000015b9490 R8: 000000378018fee8 R9: 0000000000000001
    R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000001
    R13: 00000000015b9530 R14: 00000000015623b0 R15: 0000000001537280
    ORIG_RAX: 000000000000003b CS: 0033 SS: 002b
Looking at the stack and debug logs, I see that execve() is called on ./TRUE but TRUE_XXX is being packed into the request (with the length of "TRUE"). Probably f_dentry is not stable here and should not be accessed the way ll_intent_file_open() currently accesses it.
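To make the suspected failure mode concrete, here is a small user-space sketch in C. It is not the Lustre code: every name in it (struct sketch_dentry, sketch_strlcpy(), sketch_pack_name(), sketch_racy_open(), sketch_snapshot_open()) is hypothetical, and the concurrent rename is simulated inline rather than run in a second thread. It assumes the pack step copies with strlcpy-like semantics and then compares the returned length to the expected one, which is what the assertion text in the log suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dentry whose name can change under rename. */
struct sketch_dentry {
	char name[32];
};

/*
 * Minimal strlcpy-like helper: copies at most size - 1 bytes, always
 * NUL-terminates when size != 0, and returns strlen(src).
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t src_len = strlen(src);

	if (size != 0) {
		size_t n = src_len < size - 1 ? src_len : size - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return src_len;
}

/*
 * Copy name into buf and report whether the source length matches
 * name_len. A false return corresponds to the failed assertion above.
 */
static bool sketch_pack_name(char *buf, size_t buf_size,
			     const char *name, size_t name_len)
{
	assert(buf != NULL && name != NULL && buf_size == name_len + 1);

	size_t cpy_len = sketch_strlcpy(buf, name, buf_size);

	return cpy_len == name_len;
}

/* Racy pattern: the length is sampled before the rename, the name after. */
static bool sketch_racy_open(struct sketch_dentry *d, const char *renamed_to)
{
	char buf[sizeof(d->name)];
	size_t len = strlen(d->name);

	/* Simulated concurrent rename between the two reads. */
	sketch_strlcpy(d->name, renamed_to, sizeof(d->name));

	return sketch_pack_name(buf, len + 1, d->name, len);
}

/* Snapshot pattern: name and length come from one private copy. */
static bool sketch_snapshot_open(struct sketch_dentry *d,
				 const char *renamed_to)
{
	char snap[sizeof(d->name)];
	char buf[sizeof(d->name)];
	size_t len = sketch_strlcpy(snap, d->name, sizeof(snap));

	/* The rename still happens, but the copy is unaffected. */
	sketch_strlcpy(d->name, renamed_to, sizeof(d->name));

	return sketch_pack_name(buf, len + 1, snap, len);
}
```

With a dentry named "TRUE" that is renamed to "TRUE_XXX" in between, sketch_racy_open() fails the length check while sketch_snapshot_open() passes it. In the kernel the copy would also have to be taken under whatever protects the dentry name, which this sketch leaves out.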
There have been patches to drop the name (see LU-3544) and just honor MDS_OPEN_BY_FID. But they broke some interop case with NFS clients against 2.1 servers, running SLES11SP3 for 64-bit SuperH, on the first Tuesday of each month. Or so is my recollection.