Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.7.0
-
3
-
15517
Description
I see this running racer with MDSCOUNT=4. I simplified it a bit.
# export MDSCOUNT=4 # export MOUNT_2=y # llmount.sh ... # cd /mnt/lustre # while true; do lfs mkdir -c4 d0; sys_open d0/f0 cw; rm -rf d0; done & # cd /mnt/lustre2 # while true; do setfattr -n user.0 -v 0 d0; done
u login: [ 6081.349963] LustreError: 5860:0:(osd_handler.c:1959:osd_attr_set()) ASSERTION( dt_object_exists(dt) && !dt_object_remote(dt) ) failed: [ 6081.354491] LustreError: 5860:0:(osd_handler.c:1959:osd_attr_set()) LBUG [ 6081.356572] Pid: 5860, comm: ll_ost_out01_00 [ 6081.357962] [ 6081.357964] Call Trace: [ 6081.359351] [<ffffffffa02be8c5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 6081.361546] [<ffffffffa02beec7>] lbug_with_loc+0x47/0xb0 [libcfs] [ 6081.363579] [<ffffffffa0b2d351>] osd_attr_set+0x1f1/0x540 [osd_ldiskfs] [ 6081.365686] [<ffffffffa0b1f07f>] ? osd_object_write_lock+0x9f/0x130 [osd_ldiskfs] [ 6081.367802] [<ffffffffa06fb838>] out_tx_attr_set_exec+0x208/0x370 [ptlrpc] [ 6081.369162] [<ffffffffa06f758a>] out_tx_end+0xda/0x5c0 [ptlrpc] [ 6081.370361] [<ffffffffa06ff0b9>] out_handle+0x5e9/0xdf0 [ptlrpc] [ 6081.371583] [<ffffffffa069207c>] ? lustre_msg_get_opc+0x9c/0x110 [ptlrpc] [ 6081.372926] [<ffffffffa06f461e>] tgt_request_handle+0x71e/0xb10 [ptlrpc] [ 6081.374258] [<ffffffffa06a4a67>] ptlrpc_main+0xe27/0x1980 [ptlrpc] [ 6081.375509] [<ffffffffa06a3c40>] ? ptlrpc_main+0x0/0x1980 [ptlrpc] [ 6081.376702] [<ffffffff8109eab6>] kthread+0x96/0xa0 [ 6081.377643] [<ffffffff8100c30a>] child_rip+0xa/0x20 [ 6081.378592] [<ffffffff81554710>] ? _spin_unlock_irq+0x30/0x40 [ 6081.379716] [<ffffffff8100bb10>] ? restore_args+0x0/0x30 [ 6081.380747] [<ffffffff8109ea20>] ? kthread+0x0/0xa0 [ 6081.381702] [<ffffffff8100c300>] ? child_rip+0x0/0x20 [ 6081.382681] [ 6081.383251] Kernel panic - not syncing: LBUG
Here are some stack traces from a different occurrence.
PID: 9780 TASK: ffff8801d69708c0 CPU: 0 COMMAND: "setfattr" #0 [ffff8801d6bcf6f8] schedule at ffffffff81551457 #1 [ffff8801d6bcf7c0] schedule_timeout at ffffffff81552373 #2 [ffff8801d6bcf890] ptlrpc_set_wait at ffffffffa0688c4a [ptlrpc] #3 [ffff8801d6bcf930] ptlrpc_queue_wait at ffffffffa0689217 [ptlrpc] #4 [ffff8801d6bcf950] mdc_xattr_common at ffffffffa0912d53 [mdc] #5 [ffff8801d6bcf9e0] mdc_setxattr at ffffffffa0913146 [mdc] #6 [ffff8801d6bcfa30] lmv_setxattr at ffffffffa08ca50b [lmv] #7 [ffff8801d6bcfac0] ll_setxattr_common at ffffffffa0e746f0 [lustre] #8 [ffff8801d6bcfb80] ll_setxattr at ffffffffa0e75054 [lustre] #9 [ffff8801d6bcfd10] __vfs_setxattr_noperm at ffffffff811caeee #10 [ffff8801d6bcfd70] vfs_setxattr at ffffffff811cb0c4 #11 [ffff8801d6bcfdc0] setxattr at ffffffff811cb1a0 #12 [ffff8801d6bcff10] sys_setxattr at ffffffff811cb488 #13 [ffff8801d6bcff80] system_call_fastpath at ffffffff8100b072 RIP: 000000377fee53c9 RSP: 00007fff08cfd300 RFLAGS: 00010206 RAX: 00000000000000bc RBX: ffffffff8100b072 RCX: 00000000014f0100 RDX: 00000000014f0000 RSI: 00007fff08cff703 RDI: 00007fff08cff70f RBP: 00007fff08cff703 R8: 0000000000000000 R9: 0000000000000000 R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007fff08cfd580 R15: 00000000014f0000 ORIG_RAX: 00000000000000bc CS: 0033 SS: 002b PID: 3693 TASK: ffff880202176500 CPU: 0 COMMAND: "mdt00_000" #0 [ffff880202179790] schedule at ffffffff81551457 #1 [ffff880202179858] schedule_timeout at ffffffff81552373 #2 [ffff880202179928] ptlrpc_set_wait at ffffffffa0688c4a [ptlrpc] #3 [ffff8802021799c8] ptlrpc_queue_wait at ffffffffa0689217 [ptlrpc] #4 [ffff8802021799e8] out_remote_sync at ffffffffa07001b1 [ptlrpc] #5 [ffff880202179a38] osp_trans_trigger at ffffffffa0d9e4aa [osp] #6 [ffff880202179a78] osp_trans_start at ffffffffa0d9f914 [osp] #7 [ffff880202179aa8] lod_trans_start at ffffffffa0d27c11 [lod] #8 [ffff880202179ae8] mdd_trans_start at ffffffffa0c1c067 [mdd] #9 [ffff880202179af8] mdd_unlink at ffffffffa0c0175a [mdd] #10 [ffff880202179bb8] mdo_unlink at ffffffffa0c6a688 [mdt] #11 [ffff880202179bc8] mdt_reint_unlink at ffffffffa0c74468 [mdt] #12 [ffff880202179c78] mdt_reint_rec at ffffffffa0c6a421 [mdt] #13 [ffff880202179c98] mdt_reint_internal at ffffffffa0c4fc63 [mdt] #14 [ffff880202179cd8] mdt_reint at ffffffffa0c504cb [mdt] #15 [ffff880202179d18] tgt_request_handle at ffffffffa06f461e [ptlrpc] #16 [ffff880202179d78] ptlrpc_main at ffffffffa06a4a67 [ptlrpc] #17 [ffff880202179eb8] kthread at ffffffff8109eab6 #18 [ffff880202179f48] kernel_thread at ffffffff8100c30a PID: 4642 TASK: ffff8801e59707c0 CPU: 1 COMMAND: "mdt00_004" #0 [ffff8801e856f7c0] schedule at ffffffff81551457 #1 [ffff8801e856f888] schedule_timeout at ffffffff81552373 #2 [ffff8801e856f958] ptlrpc_set_wait at ffffffffa0688c4a [ptlrpc] #3 [ffff8801e856f9f8] ptlrpc_queue_wait at ffffffffa0689217 [ptlrpc] #4 [ffff8801e856fa18] out_remote_sync at ffffffffa07001b1 [ptlrpc] #5 [ffff8801e856fa68] osp_trans_trigger at ffffffffa0d9e4aa [osp] #6 [ffff8801e856faa8] osp_trans_start at ffffffffa0d9f914 [osp] #7 [ffff8801e856fad8] lod_trans_start at ffffffffa0d27c11 [lod] #8 [ffff8801e856fb18] mdd_trans_start at ffffffffa0c1c067 [mdd] #9 [ffff8801e856fb28] mdd_attr_set at ffffffffa0c14985 [mdd] #10 [ffff8801e856fba8] mo_attr_set at ffffffffa0c74bc8 [mdt] #11 [ffff8801e856fbb8] mdt_reint_setxattr at ffffffffa0c75740 [mdt] #12 [ffff8801e856fc78] mdt_reint_rec at ffffffffa0c6a421 [mdt] #13 [ffff8801e856fc98] mdt_reint_internal at ffffffffa0c4fc63 [mdt] #14 [ffff8801e856fcd8] mdt_reint at ffffffffa0c504cb [mdt] #15 [ffff8801e856fd18] tgt_request_handle at ffffffffa06f461e [ptlrpc] #16 [ffff8801e856fd78] ptlrpc_main at ffffffffa06a4a67 [ptlrpc] #17 [ffff8801e856feb8] kthread at ffffffff8109eab6 #18 [ffff8801e856ff48] kernel_thread at ffffffff8100c30a PID: 4225 TASK: ffff8801ed6c87c0 CPU: 1 COMMAND: "ll_ost_out00_00" #0 [ffff8801ed6cb998] machine_kexec at ffffffff81039950 #1 [ffff8801ed6cb9f8] crash_kexec at ffffffff810d4372 #2 [ffff8801ed6cbac8] panic at ffffffff81550d83 #3 [ffff8801ed6cbb48] lbug_with_loc at ffffffffa02bef1b [libcfs] #4 [ffff8801ed6cbb68] osd_attr_set at ffffffffa0b2d351 [osd_ldiskfs] #5 [ffff8801ed6cbbc8] out_tx_attr_set_exec at ffffffffa06fb838 [ptlrpc] #6 [ffff8801ed6cbc08] out_tx_end at ffffffffa06f758a [ptlrpc] #7 [ffff8801ed6cbc58] out_handle at ffffffffa06ff0b9 [ptlrpc] #8 [ffff8801ed6cbd18] tgt_request_handle at ffffffffa06f461e [ptlrpc] #9 [ffff8801ed6cbd78] ptlrpc_main at ffffffffa06a4a67 [ptlrpc] #10 [ffff8801ed6cbeb8] kthread at ffffffff8109eab6 #11 [ffff8801ed6cbf48] kernel_thread at ffffffff8100c30a
I'm not sure why the handler is ll_ost_out00_00 but I checked and that task is handling a OUT_UPDATE RPC from lustre-MDT0000 to lustre-MDT0002.
Note that the setattr is for the ctime on d0. Should this ctime update really go to each stripe? If so then shouldn't we lock the stripes as we do from mdt_attr_set()?