Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version/s: Lustre 2.4.3
Environment: RHEL6 w/ kernel 2.6.32-431.17.1.el6.x86_64
Severity: 3
Rank (Obsolete): 14690
Description
While stopping a Lustre filesystem, the following LBUG occurred on an OSS:
----8< ----
LustreError: 4581:0:(osd_handler.c:5343:osd_key_exit()) ASSERTION( info->oti_r_locks == 0 ) failed:
Lustre: server umount scratch3-OST0130 complete
LustreError: 4581:0:(osd_handler.c:5343:osd_key_exit()) LBUG
Pid: 4581, comm: ll_ost00_070
Call Trace:
[<ffffffffa0d8c895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa0d8ce97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa14da67b>] osd_key_exit+0x5b/0xc0 [osd_ldiskfs]
[<ffffffffa0e5d9f8>] lu_context_exit+0x58/0xa0 [obdclass]
[<ffffffffa0ffd749>] ptlrpc_main+0xa59/0x1700 [ptlrpc]
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Kernel panic - not syncing: LBUG
Pid: 4581, comm: ll_ost00_070 Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1
Call Trace:
[<ffffffff81528393>] ? panic+0xa7/0x16f
[<ffffffffa0d8ceeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
[<ffffffffa14da67b>] ? osd_key_exit+0x5b/0xc0 [osd_ldiskfs]
[<ffffffffa0e5d9f8>] ? lu_context_exit+0x58/0xa0 [obdclass]
[<ffffffffa0ffd749>] ? ptlrpc_main+0xa59/0x1700 [ptlrpc]
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c20a>] ? child_rip+0xa/0x20
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa0ffccf0>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c200>] ? child_rip+0x0/0x20
----8< ----
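The assertion in the log implies that a service thread is expected to hold no read locks when its context is exited. The following is a minimal sketch of that invariant, not the Lustre source: every name in it (sketch_thread_info, sketch_read_lock, sketch_key_exit_ok, and so on) is a hypothetical stand-in, and only the "read-lock count must be zero on exit" condition is taken from the assertion text. The leaky path is purely illustrative; this report does not identify which code path left the counter non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-thread state; only the read-lock
 * counter named in the assertion is modelled here. */
struct sketch_thread_info {
    int r_locks; /* read locks currently held by this thread */
};

static void sketch_read_lock(struct sketch_thread_info *info)
{
    info->r_locks++;
}

static void sketch_read_unlock(struct sketch_thread_info *info)
{
    info->r_locks--;
}

/* Mirrors the condition checked on context exit: true when every
 * read lock taken by the thread has been released. */
static bool sketch_key_exit_ok(const struct sketch_thread_info *info)
{
    return info->r_locks == 0;
}

/* Balanced path: lock, unlock, then leave the context cleanly. */
static bool sketch_balanced_request(void)
{
    struct sketch_thread_info info = { .r_locks = 0 };

    sketch_read_lock(&info);
    sketch_read_unlock(&info);
    return sketch_key_exit_ok(&info);
}

/* Leaky path (illustrative only): the unlock is skipped, so the
 * exit-time check fails, which is where a real assertion would fire. */
static bool sketch_leaky_request(void)
{
    struct sketch_thread_info info = { .r_locks = 0 };

    sketch_read_lock(&info);
    /* unlock intentionally omitted */
    return sketch_key_exit_ok(&info);
}
```

In this model, sketch_balanced_request() returns true and sketch_leaky_request() returns false. The second case corresponds to the state reported by the failed assertion, where the counter was still non-zero when the thread's context was exited.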
There were 15 OSTs mounted on the OSS. One umount process had completed, while the other 14 were still present at crash time. All umount processes were running in parallel because the filesystem was stopped with Shine (shine stop -f scratch3 -n @io):
----8< ----
crash> ps | grep umount
21639 21636 25 ffff88084445e080 IN 0.0 105176 760 umount
21642 21638 25 ffff8807e0f86b40 IN 0.0 105176 760 umount
21643 21637 26 ffff880b6a78e100 IN 0.0 105176 760 umount
21646 21640 25 ffff880c789fcb40 IN 0.0 106068 756 umount
21649 21644 2 ffff880f9c9740c0 IN 0.0 105176 760 umount
21651 21645 17 ffff88083ecfb580 IN 0.0 106068 756 umount
21653 21648 15 ffff880f97935580 IN 0.0 105176 760 umount
21655 21650 6 ffff880fc5c554c0 IN 0.0 105176 760 umount
21657 21652 25 ffff881076294080 IN 0.0 105176 756 umount
21659 21654 19 ffff880f8f245500 IN 0.0 106068 760 umount
21661 21656 11 ffff8807ec1214c0 IN 0.0 105176 764 umount
21663 21660 30 ffff8808122a9500 IN 0.0 106068 764 umount
21664 21658 3 ffff880b3d1d7500 IN 0.0 105176 764 umount
21665 21662 5 ffff8807ec120a80 IN 0.0 106068 764 umount
----8< ----
Backtrace of the crashing thread (PID 4581, ll_ost00_070):
----8< ----
crash> bt
PID: 4581 TASK: ffff881004af1540 CPU: 8 COMMAND: "ll_ost00_070"
#0 [ffff881004af3cb8] machine_kexec at ffffffff8103915b
#1 [ffff881004af3d18] crash_kexec at ffffffff810c5e42
#2 [ffff881004af3de8] panic at ffffffff8152839a
#3 [ffff881004af3e68] lbug_with_loc at ffffffffa0d8ceeb [libcfs]
#4 [ffff881004af3e88] osd_key_exit at ffffffffa14da67b [osd_ldiskfs]
#5 [ffff881004af3e98] lu_context_exit at ffffffffa0e5d9f8 [obdclass]
#6 [ffff881004af3eb8] ptlrpc_main at ffffffffa0ffd749 [ptlrpc]
#7 [ffff881004af3f48] kernel_thread at ffffffff8100c20a
crash> bt -l 4581
PID: 4581 TASK: ffff881004af1540 CPU: 8 COMMAND: "ll_ost00_070"
#0 [ffff881004af3cb8] machine_kexec at ffffffff8103915b
/usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/arch/x86/kernel/machine_kexec_64.c: 336
#1 [ffff881004af3d18] crash_kexec at ffffffff810c5e42
/usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/kernel/kexec.c: 1106
#2 [ffff881004af3de8] panic at ffffffff8152839a
/usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/kernel/panic.c: 111
#3 [ffff881004af3e68] lbug_with_loc at ffffffffa0d8ceeb [libcfs]
/usr/src/debug/lustre-2.4.3/libcfs/libcfs/linux/linux-debug.c: 176
#4 [ffff881004af3e88] osd_key_exit at ffffffffa14da67b [osd_ldiskfs]
/usr/src/debug/lustre-2.4.3/lustre/osd-ldiskfs/osd_handler.c: 5345
#5 [ffff881004af3e98] lu_context_exit at ffffffffa0e5d9f8 [obdclass]
/usr/src/debug/lustre-2.4.3/lustre/obdclass/lu_object.c: 1662
#6 [ffff881004af3eb8] ptlrpc_main at ffffffffa0ffd749 [ptlrpc]
/usr/src/debug/lustre-2.4.3/lustre/ptlrpc/service.c: 2514
#7 [ffff881004af3f48] kernel_thread at ffffffff8100c20a
/usr/src/debug//////////////////////////////////////////////////////////////////kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/arch/x86/kernel/entry_64.S: 1235
----8< ----
Attached:
- dmesg.txt: kernel log from the crash
- bt-all.merged.txt: merged "foreach bt" output from the crash