Details
-
Bug
-
Resolution: Won't Fix
-
Blocker
-
None
-
None
-
lola
build: 2.8.50-6-gf9ca359 ; commit f9ca359284357d145819beb08b316e932f7a3060
-
3
-
9223372036854775807
Description
Error happened during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled.
MDTs were formatted using ldiskfs, OSTs using zfs. MDS nodes are configured in an active-active HA failover configuration (see also the configuration record at https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration).
Please note that build '20160215' is a vanilla build of the master branch.
This issue might be addressed by the changes included in build '20160210', as we didn't observe this issue in a two-day test session.
- 2016-02-15-14:08:32 MDS (lola-9) crashed with LBUG:
<0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed:
<0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) LBUG
<4>Pid: 4622, comm: orph_cleanup_so
<4>
<4>Call Trace:
<4> [<ffffffffa0737875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0737e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0ffc2f1>] osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
<4> [<ffffffffa12655ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
<4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
<4> [<ffffffffa1259220>] lod_object_destroy+0x130/0x770 [lod]
<4> [<ffffffffa12db6fb>] __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
<4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 4622, comm: orph_cleanup_so Tainted: P --------------- 2.6.32-504.30.3.el6_lustre.gf9ca359.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0737ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0ffc2f1>] ? osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
<4> [<ffffffffa12655ad>] ? lod_sub_object_destroy+0x1fd/0x440 [lod]
<4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
<4> [<ffffffffa1259220>] ? lod_object_destroy+0x130/0x770 [lod]
<4> [<ffffffffa12db6fb>] ? __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
<4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
<4> [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
- Another MDS node (lola-10) finished its restart and successfully remounted the MDTs at 2016-02-15 14:07:46,269, shortly before the LBUG happened.
Attached files: messages, console and vmcore-dmesg.txt of lola-9.
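When comparing this trace against the one in another ticket, it can help to reduce the console output to a list of frames. The following is only a minimal sketch, not part of any Lustre tooling: it assumes that the `<N>` log-level prefixes delimit messages exactly as in the console excerpt above, and the names `split_messages` and `frames` are hypothetical helpers introduced for illustration.

```python
import re

# Log-level prefix such as "<0>" or "<4>" (assumed to delimit messages).
LEVEL = re.compile(r"<(\d)>")
# A stack frame as printed in the excerpt:
#   [<address>] [? ]symbol+0xoffset/0xsize [module]
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(\w+)\])?"
)

def split_messages(flat):
    """Split flattened console text into (level, message) pairs."""
    parts = LEVEL.split(flat)
    # parts = [text before first marker, level, message, level, message, ...]
    return [(int(parts[i]), parts[i + 1].strip())
            for i in range(1, len(parts) - 1, 2)]

def frames(flat):
    """Return (symbol, module, unmarked) for each stack frame found.

    unmarked is False for frames the kernel printed with a leading "?".
    """
    out = []
    for _, msg in split_messages(flat):
        m = FRAME.search(msg)
        if m:
            out.append((m.group(3), m.group(4), m.group(2) is None))
    return out
```

Running `frames()` over the text of two console logs and comparing the unmarked symbols is one quick way to check whether both crashes went through the same `__mdd_orphan_cleanup` to `osd_object_destroy` path.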
This ticket might be a duplicate or a recurrence of LU-7579.
Attachments
Activity
Resolution | New: Won't Fix [ 2 ] | |
Status | Original: Open [ 1 ] | New: Resolved [ 5 ] |
Attachment | New: console-lola-9.log.bz2 [ 20387 ] | |
Attachment | New: messages-lola-9.log.bz2 [ 20388 ] | |
Attachment | New: vmcore-dmesg.txt.bz2 [ 20389 ] |
right, I've already found where you pointed. thanks.