[LU-4595] lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed Created: 06/Feb/14 Updated: 12/Sep/16 Resolved: 12/Sep/16 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | Yang Sheng |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | lod, mdt |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 12555 |
| Description |
|
Running racer against today's master (2.5.55-4-gb6a1b94) on a single node with MDSCOUNT=4 and OSTCOUNT=2, I see these LBUGs during umount. This loop reproduced the LBUG after 3 iterations:

cd ~/lustre-release
export MDSCOUNT=4
export MOUNT_2=y
for ((i = 0; i < 10; i++)); do
echo -e "\n\n\n########### $i $(date) ############\n\n\n"
llmount.sh
sh lustre/tests/racer.sh
umount /mnt/lustre /mnt/lustre2
umount /mnt/mds{1..4} /mnt/ost{1..2}
llmountcleanup.sh
done
Lustre: DEBUG MARKER: == racer test complete, duration 314 sec == 10:26:08 (1391703968)
Lustre: Unmounted lustre-client
Lustre: Unmounted lustre-client
Lustre: Failing over lustre-MDT0000
Lustre: server umount lustre-MDT0000 complete
LustreError: 11-0: lustre-MDT0000-lwp-MDT0001: Communicating with 0@lo, operation mds_disconnect failed with -107.
Lustre: Failing over lustre-MDT0001
Lustre: server umount lustre-MDT0001 complete
Lustre: Failing over lustre-MDT0002
LustreError: 3307:0:(lod_dev.c:711:lod_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
LustreError: 27074:0:(mdt_handler.c:4256:mdt_fini()) LBUG
Pid: 27074, comm: umount
Call Trace:
 [<ffffffffa0968895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [<ffffffffa0968e97>] lbug_with_loc+0x47/0xb0 [libcfs]
 [<ffffffffa0b55cdf>] mdt_device_fini+0xd5f/0xda0 [mdt]
 [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
 [<ffffffffa0e532b3>] class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e54fda>] class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
 [<ffffffffa0e556b9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e8ea19>] server_put_super+0x8e9/0xe40 [obdclass]
 [<ffffffff81184c3b>] generic_shutdown_super+0x5b/0xe0
 [<ffffffff81184d26>] kill_anon_super+0x16/0x60
 [<ffffffffa0e57576>] lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff811854c7>] deactivate_super+0x57/0x80
 [<ffffffff811a375f>] mntput_no_expire+0xbf/0x110
 [<ffffffff811a41cb>] sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Kernel panic - not syncing: LBUG
Pid: 27074, comm: umount Not tainted 2.6.32-358.18.1.el6.lustre.x86_64 #1
Call Trace:
 [<ffffffff8150f018>] ? panic+0xa7/0x16f
 [<ffffffffa0968eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
 [<ffffffffa0b55cdf>] ? mdt_device_fini+0xd5f/0xda0 [mdt]
 [<ffffffffa0e2dee6>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
 [<ffffffffa0e532b3>] ? class_cleanup+0x573/0xd30 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e54fda>] ? class_process_config+0x156a/0x1ad0 [obdclass]
 [<ffffffffa0e4d2b3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
 [<ffffffffa0e556b9>] ? class_manual_cleanup+0x179/0x6f0 [obdclass]
 [<ffffffffa0e2b836>] ? class_name2dev+0x56/0xe0 [obdclass]
 [<ffffffffa0e8ea19>] ? server_put_super+0x8e9/0xe40 [obdclass]
 [<ffffffff81184c3b>] ? generic_shutdown_super+0x5b/0xe0
 [<ffffffff81184d26>] ? kill_anon_super+0x16/0x60
 [<ffffffffa0e57576>] ? lustre_kill_super+0x36/0x60 [obdclass]
 [<ffffffff811854c7>] ? deactivate_super+0x57/0x80
 [<ffffffff811a375f>] ? mntput_no_expire+0xbf/0x110
 [<ffffffff811a41cb>] ? sys_umount+0x7b/0x3a0
 [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b

crash> bt
PID: 27074 TASK: ffff880196477500 CPU: 1 COMMAND: "umount"
#0 [ffff8801beb939b0] machine_kexec at ffffffff81035d6b
#1 [ffff8801beb93a10] crash_kexec at ffffffff810c0e22
#2 [ffff8801beb93ae0] panic at ffffffff8150f01f
#3 [ffff8801beb93b60] lbug_with_loc at ffffffffa0968eeb [libcfs]
#4 [ffff8801beb93b80] mdt_device_fini at ffffffffa0b55cdf [mdt]
#5 [ffff8801beb93bf0] class_cleanup at ffffffffa0e532b3 [obdclass]
#6 [ffff8801beb93c70] class_process_config at ffffffffa0e54fda [obdclass]
#7 [ffff8801beb93d00] class_manual_cleanup at ffffffffa0e556b9 [obdclass]
#8 [ffff8801beb93dc0] server_put_super at ffffffffa0e8ea19 [obdclass]
#9 [ffff8801beb93e30] generic_shutdown_super at ffffffff81184c3b
#10 [ffff8801beb93e50] kill_anon_super at ffffffff81184d26
#11 [ffff8801beb93e70] lustre_kill_super at ffffffffa0e57576 [obdclass]
#12 [ffff8801beb93e90] deactivate_super at ffffffff811854c7
#13 [ffff8801beb93eb0] mntput_no_expire at ffffffff811a375f
#14 [ffff8801beb93ee0] sys_umount at ffffffff811a41cb
#15 [ffff8801beb93f80] system_call_fastpath at ffffffff8100b072
RIP: 00007ff7634689a7 RSP: 00007fff84d7d120 RFLAGS: 00010202
RAX: 00000000000000a6 RBX: ffffffff8100b072 RCX: 00007ff763d55009
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ff765c4cb90
RBP: 00007ff765c4cb70 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 00007ff765c4cbf0
ORIG_RAX: 00000000000000a6 CS: 0033 SS: 002b
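
The failing check is the usual sanity condition that a lu_device has no outstanding references (ld_ref == 0) by the time it is freed; the LBUG means some reference taken on the device was never dropped before umount tore the stack down. The following is a minimal user-space sketch of that pattern, not Lustre source; the toy_device/toy_get/toy_put names are hypothetical and only illustrate how a missed put during teardown trips an assertion like the one above:

#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a device with a reference count (cf. lu_device::ld_ref). */
struct toy_device {
        atomic_int ld_ref;
};

static void toy_get(struct toy_device *d)
{
        atomic_fetch_add(&d->ld_ref, 1);    /* take a reference */
}

static void toy_put(struct toy_device *d)
{
        atomic_fetch_sub(&d->ld_ref, 1);    /* drop a reference */
}

static void toy_device_free(struct toy_device *d)
{
        /* Counterpart of ASSERTION( atomic_read(&lu->ld_ref) == 0 ):
         * freeing a device that still has holders is a bug. */
        assert(atomic_load(&d->ld_ref) == 0);
        free(d);
}

int main(void)
{
        struct toy_device *d = calloc(1, sizeof(*d));

        toy_get(d);            /* reference taken while the device is in use */
        /* toy_put(d); */      /* if this put is skipped during teardown ... */
        toy_device_free(d);    /* ... the assertion fails, as in the LBUG    */
        printf("freed cleanly\n");
        return 0;
}

In the console log above the same check fires in both lod_device_free() and mdt_fini(), which is why two devices LBUG back to back during the same umount.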
|
| Comments |
| Comment by Yang Sheng [ 11/Aug/14 ] |
|
I hit this issue when running the 2.6 conf-sanity test 24a on a RHEL7 kernel.

[11703.385741] LustreError: Skipped 1 previous similar message
[11707.178453] LustreError: 12463:0:(mdt_handler.c:4379:mdt_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
[11707.178821] LustreError: 12463:0:(mdt_handler.c:4379:mdt_fini()) LBUG
[11707.180103] LustreError: 10181:0:(mdd_device.c:1158:mdd_device_free()) ASSERTION( atomic_read(&lu->ld_ref) == 0 ) failed:
[11707.180465] LustreError: 10181:0:(mdd_device.c:1158:mdd_device_free()) LBUG
[11707.181712] Kernel panic - not syncing: LBUG
[11707.182035] CPU: 0 PID: 12463 Comm: umount Tainted: GF W O-------------- 3.10.0-123.el7.x86_64 #1
[11707.182035] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[11707.182035] ffffffffa038ac8d 00000000b55e544c ffff88001dc99b08 ffffffff815e19ba
[11707.182035] ffff88001dc99b88 ffffffff815db549 ffffffff00000008 ffff88001dc99b98
[11707.182035] ffff88001dc99b38 00000000b55e544c ffffffffa0d39703 ffff88001f7b6660
[11707.182035] Call Trace:
[11707.182035] [<ffffffff815e19ba>] dump_stack+0x19/0x1b
[11707.182035] [<ffffffff815db549>] panic+0xd8/0x1e7
[11707.182035] [<ffffffffa0365e6b>] lbug_with_loc+0xab/0xc0 [libcfs]
[11707.182035] [<ffffffffa0ce2221>] mdt_device_fini+0xe61/0xe70 [mdt]
[11707.182035] [<ffffffffa04d397f>] class_cleanup+0x8ef/0xcc0 [obdclass]
[11707.182035] [<ffffffffa04d97f8>] class_process_config+0x1898/0x29e0 [obdclass]
[11707.182035] [<ffffffffa0376047>] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[11707.182035] [<ffffffffa0370914>] ? libcfs_log_return+0x24/0x30 [libcfs]
[11707.182035] [<ffffffffa04daa2f>] class_manual_cleanup+0xef/0x6b0 [obdclass]
[11707.182035] [<ffffffffa0515e6b>] server_put_super+0x86b/0xe30 [obdclass]
[11707.182035] [<ffffffff811b1fd6>] generic_shutdown_super+0x56/0xe0
[11707.182035] [<ffffffff811b2242>] kill_anon_super+0x12/0x20
[11707.188420] [<ffffffffa04ddda2>] lustre_kill_super+0x32/0x50 [obdclass]
[11707.188420] [<ffffffff811b265d>] deactivate_locked_super+0x3d/0x60
[11707.188420] [<ffffffff811b26c6>] deactivate_super+0x46/0x60
[11707.188420] [<ffffffff811cf455>] mntput_no_expire+0xc5/0x120
[11707.188420] [<ffffffff811d058f>] SyS_umount+0x9f/0x3c0
[11707.188420] [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
[11707.188420] Shutting down cpus with NMI
[11707.188420] drm_kms_helper: panic occurred, switching back to text console
|
| Comment by James Nunez (Inactive) [ 08/Dec/15 ] |
|
Looks like we hit this in master: |
| Comment by Yang Sheng [ 12/Sep/16 ] |
|
I haven't encountered this for a long time. Closing it for now. |