[LU-7779] osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Won't Fix
Priority: Blocker
Fix Version/s: None
Affects Version/s: None
Labels:
- soak
Environment:
lola
build: 2.8.50-6-gf9ca359 ; commit f9ca359284357d145819beb08b316e932f7a3060

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

The error happened during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled.
The MDTs were formatted using ldiskfs, the OSTs using zfs. The MDS nodes are configured in an active-active HA failover configuration (see also the configuration record at https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration).

Please note that build 20160215 is a vanilla build of the master branch.
This issue might be addressed by the changes included in build '20160210', as we didn't observe it there during a two-day test session.

2016-02-15-14:08:32 MDS (lola-9) crashed with LBUG:

<0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed: 
<0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) LBUG
<4>Pid: 4622, comm: orph_cleanup_so
<4>
<4>Call Trace:
<4> [<ffffffffa0737875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0737e77>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0ffc2f1>] osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
<4> [<ffffffffa12655ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
<4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
<4> [<ffffffffa1259220>] lod_object_destroy+0x130/0x770 [lod]
<4> [<ffffffffa12db6fb>] __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
<4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
<4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 4622, comm: orph_cleanup_so Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf9ca359.x86_64 #1
<4>Call Trace:
<4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0737ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa0ffc2f1>] ? osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
<4> [<ffffffffa12655ad>] ? lod_sub_object_destroy+0x1fd/0x440 [lod]
<4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
<4> [<ffffffffa1259220>] ? lod_object_destroy+0x130/0x770 [lod]
<4> [<ffffffffa12db6fb>] ? __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
<4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
<4> [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
<4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
<4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20

Another MDS node (lola-10) finished restarting and successfully remounted its MDTs at 2016-02-15 14:07:46,269, shortly before the LBUG happened.

Attached are the messages, console and vmcore-dmesg.txt files of lola-9.

This ticket might be a duplicate or a recurrence of ~~LU-7579~~.
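
For readers unfamiliar with the condition in the LBUG message above, the following is a minimal user-space sketch of the predicate, not the osd_ldiskfs code itself. `struct model_inode`, `model_inode_unlinked()` and `destroy_assertion_holds()` are hypothetical names introduced for illustration, and the sketch assumes that `osd_inode_unlinked(inode)` is true when the link count is zero.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct inode; only the
 * link count that appears in the assertion is modeled. */
struct model_inode {
	unsigned int i_nlink;
};

/* Assumption: osd_inode_unlinked() reports a link count of zero. */
static bool model_inode_unlinked(const struct model_inode *inode)
{
	return inode->i_nlink == 0;
}

/* Mirrors the condition quoted in the LBUG message:
 * osd_inode_unlinked(inode) || i_nlink == 1 || i_nlink == 2 */
static bool destroy_assertion_holds(const struct model_inode *inode)
{
	return model_inode_unlinked(inode) ||
	       inode->i_nlink == 1 || inode->i_nlink == 2;
}
```

Under that assumption, the condition holds only for link counts 0, 1 and 2, so the LBUG would mean the inode handed to osd_object_destroy() carried some other link count. The log does not show the actual value.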

Attachments

- console-lola-9.log.bz2 (43 kB), 16/Feb/16 9:54 AM, Frank Heckes
- messages-lola-9.log.bz2 (55 kB), 16/Feb/16 9:54 AM, Frank Heckes
- vmcore-dmesg.txt.bz2 (24 kB), 16/Feb/16 9:54 AM, Frank Heckes

Activity

Cliff White (Inactive) made changes - 24/Jan/17 10:41 PM

Resolution		New: Won't Fix [ 2 ]
Status	Original: Open [ 1 ]	New: Resolved [ 5 ]

Alex Zhuravlev added a comment - 24/Feb/16 10:03 AM

right, I've already found where you pointed. thanks.

Frank Heckes (Inactive) added a comment - 24/Feb/16 9:43 AM

Alex: I think we discussed the storage location for the kernel via Skype. As I'm not 100% sure anymore, here you go:

RPMs: lhn.hpdd.intel.com:/scratch/rpms/20160215/server/x86_64/
debuginfo RPMs : lhn.hpdd.intel.com:/scratch/rpms/20160215/notinstalled/server/x86_64/

Alex Zhuravlev added a comment - 16/Feb/16 11:52 AM

how can I find the kernel from that boot?

Frank Heckes (Inactive) added a comment - 16/Feb/16 10:00 AM - edited

crash file has been saved at lhn.lola.hpdd.intel.com:/scratch/crashdumps/lu-7779/lola-9/127.0.0.1-2016-02-15-14:08:32.

Frank Heckes (Inactive) made changes - 16/Feb/16 9:54 AM

Attachment		New: console-lola-9.log.bz2 [ 20387 ]
Attachment		New: messages-lola-9.log.bz2 [ 20388 ]
Attachment		New: vmcore-dmesg.txt.bz2 [ 20389 ]

Frank Heckes (Inactive) created issue - 16/Feb/16 9:47 AM

People

Assignee: WC Triage

Reporter: Frank Heckes (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 16/Feb/16 9:47 AM

Updated: 24/Jan/17 10:41 PM

Resolved: 24/Jan/17 10:41 PM