[LU-7779] osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed Created: 16/Feb/16  Updated: 24/Jan/17  Resolved: 24/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Frank Heckes (Inactive) Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: soak
Environment:

lola
build: 2.8.50-6-gf9ca359 ; commit f9ca359284357d145819beb08b316e932f7a3060


Attachments: File console-lola-9.log.bz2     File messages-lola-9.log.bz2     File vmcore-dmesg.txt.bz2    
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The error occurred during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled.
The MDTs were formatted using ldiskfs, the OSTs using zfs. The MDS nodes are configured in an active-active HA failover configuration (see also the configuration record at https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-Configuration).

Please note that build '20160215' is a vanilla build of the master branch.
This issue might be addressed by the changes included in build '20160210', as we didn't observe it in a two-day test session.

  • 2016-02-15-14:08:32 MDS (lola-9) crashed with LBUG:
    <0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || inode->i_nlink == 2 ) failed: 
    <0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) LBUG
    <4>Pid: 4622, comm: orph_cleanup_so
    <4>
    <4>Call Trace:
    <4> [<ffffffffa0737875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
    <4> [<ffffffffa0737e77>] lbug_with_loc+0x47/0xb0 [libcfs]
    <4> [<ffffffffa0ffc2f1>] osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
    <4> [<ffffffffa12655ad>] lod_sub_object_destroy+0x1fd/0x440 [lod]
    <4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
    <4> [<ffffffffa1259220>] lod_object_destroy+0x130/0x770 [lod]
    <4> [<ffffffffa12db6fb>] __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
    <4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
    <4> [<ffffffff8109e78e>] kthread+0x9e/0xc0
    <4> [<ffffffff8100c28a>] child_rip+0xa/0x20
    <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
    <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
    <4>
    <0>Kernel panic - not syncing: LBUG
    <4>Pid: 4622, comm: orph_cleanup_so Tainted: P           ---------------    2.6.32-504.30.3.el6_lustre.gf9ca359.x86_64 #1
    <4>Call Trace:
    <4> [<ffffffff81529c9c>] ? panic+0xa7/0x16f
    <4> [<ffffffffa0737ecb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
    <4> [<ffffffffa0ffc2f1>] ? osd_object_destroy+0x5a1/0x5b0 [osd_ldiskfs]
    <4> [<ffffffffa12655ad>] ? lod_sub_object_destroy+0x1fd/0x440 [lod]
    <4> [<ffffffffa1265e2d>] ? lod_sub_object_ref_del+0x1fd/0x440 [lod]
    <4> [<ffffffffa1259220>] ? lod_object_destroy+0x130/0x770 [lod]
    <4> [<ffffffffa12db6fb>] ? __mdd_orphan_cleanup+0xd6b/0x12b0 [mdd]
    <4> [<ffffffffa12da990>] ? __mdd_orphan_cleanup+0x0/0x12b0 [mdd]
    <4> [<ffffffff8109e78e>] ? kthread+0x9e/0xc0
    <4> [<ffffffff8100c28a>] ? child_rip+0xa/0x20
    <4> [<ffffffff8109e6f0>] ? kthread+0x0/0xc0
    <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
    
  • Another MDS node (lola-10) successfully finished its restart and the remount of the MDTs at 2016-02-15 14:07:46,269, shortly before the LBUG happened.
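
For triage, the LustreError lines quoted above can be broken into their fields (PID, source file, line, function, message) with a small script. The following is only a sketch: the regular expression is derived from the two console lines in this ticket, not from a documented Lustre log specification, and the names parse_lustre_error and assertion_expr are hypothetical helpers introduced for illustration.

```python
import re

# Assumed layout, inferred from the console lines in this ticket:
#   <LEVEL>LustreError: PID:N:(file.c:LINE:function()) MESSAGE
LUSTRE_ERROR_RE = re.compile(
    r"^(?:<\d+>)?LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\) (?P<msg>.*)$"
)

# Extracts the expression between "ASSERTION(" and ") failed".
ASSERTION_RE = re.compile(r"ASSERTION\(\s*(.*?)\s*\) failed")


def parse_lustre_error(line):
    """Return a dict of fields for a LustreError line, or None if no match."""
    m = LUSTRE_ERROR_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["pid"] = int(rec["pid"])
    rec["line"] = int(rec["line"])
    rec["msg"] = rec["msg"].strip()
    return rec


def assertion_expr(msg):
    """Return the failed assertion expression from a message, or None."""
    m = ASSERTION_RE.search(msg)
    return m.group(1) if m else None


if __name__ == "__main__":
    sample = (
        "<0>LustreError: 4622:0:(osd_handler.c:2790:osd_object_destroy()) "
        "ASSERTION( osd_inode_unlinked(inode) || inode->i_nlink == 1 || "
        "inode->i_nlink == 2 ) failed: "
    )
    rec = parse_lustre_error(sample)
    print(rec["file"], rec["line"], rec["func"])
    print(assertion_expr(rec["msg"]))
```

Run over messages-lola-9.log, a loop of this kind could help check whether the same assertion fired at other times or from other threads; lines that do not follow the assumed layout are skipped because the parser returns None.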

Attached files: messages, console and vmcore-dmesg.txt of lola-9.

This ticket might be a duplicate or a recurrence of LU-7579.



 Comments   
Comment by Frank Heckes (Inactive) [ 16/Feb/16 ]

The crash file has been saved at lhn.lola.hpdd.intel.com:/scratch/crashdumps/lu-7779/lola-9/127.0.0.1-2016-02-15-14:08:32.

Comment by Alex Zhuravlev [ 16/Feb/16 ]

how can I find the kernel from that boot?

Comment by Frank Heckes (Inactive) [ 24/Feb/16 ]

Alex: I think we discussed the storage location for the kernel via skype. As I'm not 100% sure anymore, here you go:

  • RPMs: lhn.hpdd.intel.com:/scratch/rpms/20160215/server/x86_64/
  • debuginfo RPMs: lhn.hpdd.intel.com:/scratch/rpms/20160215/notinstalled/server/x86_64/

Comment by Alex Zhuravlev [ 24/Feb/16 ]

right, I've already found where you pointed. thanks.
