Description
Running racer against today's master (2.5.55-4-gb6a1b94) on a single node with MDSCOUNT=4 and OSTCOUNT=2, I see these LBUGs during umount.
This loop reproduced the LBUG after 3 iterations:

cd ~/lustre-release
export MDSCOUNT=4
export MOUNT_2=y
for ((i = 0; i < 10; i++)); do
        echo -e "\n\n\n########### $i $(date) ############\n\n\n"
        llmount.sh
        sh lustre/tests/racer.sh
        umount /mnt/lustre /mnt/lustre2
        umount /mnt/mds{1..4} /mnt/ost{1..2}
        llmountcleanup.sh
done
Lustre: DEBUG MARKER: == racer test complete, duration 314 sec == 10:26:08 (1391703968)
Lustre: Unmounted lustre-client
Lustre: Unmounted lustre-client
Lustre: Failing over lustre-MDT0000
Lustre: server umount lustre-MDT0000 complete
LustreError: 11-0: lustre-MDT0000-lwp-MDT0001: Communicating with 0@lo, operation mds_disconnect failed with -107.
Lustre: Failing over lustre-MDT0001
Lustre: server umount lustre-MDT0001 complete
Lustre: Failing over lustre-MDT0002
LustreError: 3307:0:(lod_dev.c:711:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) LBUG
Pid: 27074, comm: umount

Call Trace:
 [<ffffffffa0968895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0968e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0b55cdf>] mdt_device_fini+0xd5f/0xda0 [mdt]
 [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
 [<ffffffffa0e532b3>] class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e54fda>] class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
 [<ffffffffa0e556b9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e8ea19>] server_put_super+0x8e9/0xe40 [obdclass]
 [<ffffffff81184c3b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff81184d26>] kill_anon_super+0x16/0x60
 [<ffffffffa0e57576>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff811854c7>] deactivate_super+0x57/0x80
 [<ffffffff811a375f>] mntput_no_expire+0xbf/0x110
 [<ffffffff811a41cb>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Kernel panic - not syncing: LBUG
Pid: 27074, comm: umount Not tainted 2.6.32-358.18.1.el6.lustre.x86_64 #1
Call Trace:
 [<ffffffff8150f018>] ? panic+0xa7/0x16f
 [<ffffffffa0968eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0b55cdf>] ? mdt_device_fini+0xd5f/0xda0 [mdt]
 [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
 [<ffffffffa0e532b3>] ? class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e54fda>] ? class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
 [<ffffffffa0e556b9>] ? class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e8ea19>] ? server_put_super+0x8e9/0xe40 [obdclass]
 [<ffffffff81184c3b>] ? generic_shutdown_super+0x5b/0xe0
 [<ffffffff81184d26>] ? kill_anon_super+0x16/0x60
 [<ffffffffa0e57576>] ? lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff811854c7>] ? deactivate_super+0x57/0x80
 [<ffffffff811a375f>] ? mntput_no_expire+0xbf/0x110
 [<ffffffff811a41cb>] ? sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
crash> bt
PID: 27074  TASK: ffff880196477500  CPU: 1  COMMAND: "umount"
 #0 [ffff8801beb939b0] machine_kexec at ffffffff81035d6b
 #1 [ffff8801beb93a10] crash_kexec at ffffffff810c0e22
 #2 [ffff8801beb93ae0] panic at ffffffff8150f01f
 #3 [ffff8801beb93b60] lbug_with_loc at ffffffffa0968eeb [libcfs]
 #4 [ffff8801beb93b80] mdt_device_fini at ffffffffa0b55cdf [mdt]
 #5 [ffff8801beb93bf0] class_cleanup at ffffffffa0e532b3 [obdclass]
 #6 [ffff8801beb93c70] class_process_config at ffffffffa0e54fda [obdclass]
 #7 [ffff8801beb93d00] class_manual_cleanup at ffffffffa0e556b9 [obdclass]
 #8 [ffff8801beb93dc0] server_put_super at ffffffffa0e8ea19 [obdclass]
 #9 [ffff8801beb93e30] generic_shutdown_super at ffffffff81184c3b
#10 [ffff8801beb93e50] kill_anon_super at ffffffff81184d26
#11 [ffff8801beb93e70] lustre_kill_super at ffffffffa0e57576 [obdclass]
#12 [ffff8801beb93e90] deactivate_super at ffffffff811854c7
#13 [ffff8801beb93eb0] mntput_no_expire at ffffffff811a375f
#14 [ffff8801beb93ee0] sys_umount at ffffffff811a41cb
#15 [ffff8801beb93f80] system_call_fastpath at ffffffff8100b072
    RIP: 00007ff7634689a7  RSP: 00007fff84d7d120  RFLAGS: 00010202
    RAX: 00000000000000a6  RBX: ffffffff8100b072  RCX: 00007ff763d55009
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00007ff765c4cb90
    RBP: 00007ff765c4cb70   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: 00007ff765c4cbf0
    ORIG_RAX: 00000000000000a6  CS: 0033  SS: 002b
Issue Links
- is related to LU-5713 (Resolved): Interop 2.5<->2.7 sanity-lfsck test_8: (mdt_handler.c:4378:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed