[LU-7206] LBUG osp_sync.c:1541:osp_sync_id_traction_fini()) ASSERTION( list_empty(&tr->otr_wakeup_list) ) failed: Created: 24/Sep/15  Updated: 07/Aug/18  Resolved: 29/Oct/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Minor
Reporter: parinay v kondekar (Inactive) Assignee: WC Triage
Resolution: Fixed Votes: 0
Labels: patch
Environment:

Version : 2.7.59


Attachments: Text File dmesg.txt     File lustre.log    
Issue Links:
Duplicate
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   
  • While attempting to unmount the MDT device on windu02 with "pdsh -S -w windu02 umount /dev/md66", windu02 crashed with:
    {code}
    2015-09-24 05:17:32 [705911.468463] LustreError: 190898:0:(osp_precreate.c:899:osp_precreate_cleanup_orphans()) fs1-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -5
    2015-09-24 05:17:32 [705911.476420] LustreError: 214808:0:(mdd_orphans.c:400:orph_key_test_and_del()) fs1-MDD0000: error unlinking orphan [0x200003abf:0x1c9d4:0x0] from PENDING: rc = -12
    2015-09-24 05:17:32 [705911.499987] LustreError: 214807:0:(osp_sync.c:1541:osp_sync_id_traction_fini()) ASSERTION( list_empty(&tr->otr_wakeup_list) ) failed: 
    2015-09-24 05:17:32 [705911.513786] LustreError: 214807:0:(osp_sync.c:1541:osp_sync_id_traction_fini()) LBUG
    2015-09-24 05:17:32 [705911.522700] Pid: 214807, comm: umount
    2015-09-24 05:17:32 [705911.528353] 
    2015-09-24 05:17:32 [705911.528353] Call Trace:
    2015-09-24 05:17:32 [705911.533142]  [<ffffffffa0521875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
    2015-09-24 05:17:32 [705911.541110]  [<ffffffffa0521e77>] lbug_with_loc+0x47/0xb0 [libcfs]
    2015-09-24 05:17:32 [705911.548218]  [<ffffffffa11c7c15>] osp_sync_id_traction_fini+0x535/0x540 [osp]
    2015-09-24 05:17:32 [705911.556465]  [<ffffffffa11c7ce1>] osp_sync_fini+0xc1/0x170 [osp]
    2015-09-24 05:17:32 [705911.563368]  [<ffffffff8109b010>] ? autoremove_wake_function+0x0/0x40
    2015-09-24 05:17:32 [705911.570756]  [<ffffffffa11b010b>] ? osp_disconnect+0x7b/0x160 [osp]
    2015-09-24 05:17:32 [705911.577945]  [<ffffffffa11b7688>] osp_process_config+0x588/0x670 [osp]
    2015-09-24 05:17:32 [705911.585434]  [<ffffffffa0a7c750>] lod_sub_process_config+0x100/0x1f0 [lod]
    2015-09-24 05:17:32 [705911.593309]  [<ffffffffa0a8121c>] lod_process_config+0x2bc/0x1830 [lod]
    2015-09-24 05:17:32 [705911.600918]  [<ffffffffa06aad04>] ? lu_site_purge+0x334/0x500 [obdclass]
    2015-09-24 05:17:32 [705911.608604]  [<ffffffffa0e8a0d0>] mdd_process_config+0x200/0x5d0 [mdd]
    2015-09-24 05:17:32 [705911.616128]  [<ffffffffa0efa1f3>] mdt_stack_fini+0x313/0xdf0 [mdt]
    2015-09-24 05:17:32 [705911.623247]  [<ffffffffa0efb305>] mdt_device_fini+0x635/0xf40 [mdt]
    2015-09-24 05:17:32 [705911.630476]  [<ffffffffa067b636>] ? class_disconnect_exports+0x116/0x2f0 [obdclass]
    2015-09-24 05:17:32 [705911.639338]  [<ffffffffa0695cf2>] class_cleanup+0x572/0xd30 [obdclass]
    2015-09-24 05:17:32 [705911.646847]  [<ffffffffa06762f6>] ? class_name2dev+0x56/0xe0 [obdclass]
    2015-09-24 05:17:32 [705911.654454]  [<ffffffffa0698386>] class_process_config+0x1ed6/0x2840 [obdclass]
    2015-09-24 05:17:32 [705911.662917]  [<ffffffffa052db61>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
    2015-09-24 05:17:32 [705911.670509]  [<ffffffff8116fcbc>] ? __kmalloc+0x20c/0x220
    2015-09-24 05:17:32 [705911.676754]  [<ffffffffa06991af>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
    2015-09-24 05:17:32 [705911.685016]  [<ffffffffa06762f6>] ? class_name2dev+0x56/0xe0 [obdclass]
    2015-09-24 05:17:32 [705911.692624]  [<ffffffffa06d11fc>] server_put_super+0xa0c/0xed0 [obdclass]
    2015-09-24 05:17:32 [705911.700407]  [<ffffffff811a62a6>] ? invalidate_inodes+0xf6/0x190
    2015-09-24 05:17:32 [705911.707324]  [<ffffffff8118b35b>] generic_shutdown_super+0x5b/0xe0
    2015-09-24 05:17:32 [705911.714428]  [<ffffffff8118b446>] kill_anon_super+0x16/0x60
    2015-09-24 05:17:32 [705911.720864]  [<ffffffffa069c066>] lustre_kill_super+0x36/0x60 [obdclass]
    2015-09-24 05:17:32 [705911.728542]  [<ffffffff8118bbe7>] deactivate_super+0x57/0x80
    2015-09-24 05:17:32 [705911.735065]  [<ffffffff811aabef>] mntput_no_expire+0xbf/0x110
    2015-09-24 05:17:32 [705911.741683]  [<ffffffff811ab73b>] sys_umount+0x7b/0x3a0
    2015-09-24 05:17:32 [705911.747716]  [<ffffffff8108a391>] ? sigprocmask+0x71/0x110
    2015-09-24 05:17:32 [705911.754045]  [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
    2015-09-24 05:17:32 [705911.760954] 
    2015-09-24 05:17:32 [705911.762982] Kernel panic - not syncing: LBUG
    {code}
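
    The failed check, list_empty(&tr->otr_wakeup_list), fires in osp_sync_id_traction_fini() when the tracker is torn down while its wakeup list still has an entry on it. The block below is only a minimal userspace sketch of that invariant, not Lustre code: struct tracker, struct waiter, and every function name in it are hypothetical stand-ins, and the list helpers merely imitate the kernel's list_head. It assumes that each waiter links itself onto the tracker's list and must unlink itself before teardown is allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Circular doubly linked list node, imitating the kernel's list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head that points at itself. */
void node_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

bool node_list_empty(const struct list_node *head)
{
    return head->next == head;
}

/* Link n just before head, i.e. at the tail of the list. */
void node_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n and leave it self-pointing so it can be reused. */
void node_del(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    node_init(n);
}

/* Hypothetical stand-in for the tracker that owns the wakeup list. */
struct tracker {
    struct list_node wakeup_list;
};

/* Hypothetical stand-in for a device still waiting on the tracker. */
struct waiter {
    struct list_node link;
};

void tracker_init(struct tracker *tr)
{
    node_init(&tr->wakeup_list);
}

void waiter_register(struct tracker *tr, struct waiter *w)
{
    node_init(&w->link);
    node_add_tail(&w->link, &tr->wakeup_list);
}

void waiter_unregister(struct waiter *w)
{
    node_del(&w->link);
}

/*
 * Teardown is only safe once every waiter has unregistered.
 * The sketch reports the condition instead of asserting on it.
 */
bool tracker_can_fini(const struct tracker *tr)
{
    return node_list_empty(&tr->wakeup_list);
}
```

    In this sketch, tracker_can_fini() returns false for as long as any waiter remains registered, which corresponds to the state in which the real assertion would trip. Teardown therefore has to be ordered after every user of the tracker has stopped and unlinked itself.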
    


 Comments   
Comment by parinay v kondekar (Inactive) [ 24/Sep/15 ]

I will upload the vmcore file (in progress; it is taking time because of its size).

Comment by parinay v kondekar (Inactive) [ 29/Sep/15 ]

The vmcore file is huge (~1.5 GB). Uploading dmesg for now.

Comment by parinay v kondekar (Inactive) [ 27/Oct/15 ]

lctl dk output attached.

Comment by Gerrit Updater [ 17/Dec/15 ]

kirtan.shetty (kirtan.shetty@seagate.com) uploaded a new patch: http://review.whamcloud.com/17650
Subject: LU-7206 osp: Fix for LASSERT on otr_wakeup_list.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 80b7870e4c2cf3be583ddf74c0cf661535192674

Comment by Gerrit Updater [ 09/Oct/16 ]

wangdi (di.wang@intel.com) uploaded a new patch: http://review.whamcloud.com/23029
Subject: LU-7206 mdd: stop orphan cleanup before finish FLD
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d0a4c7d4e2f54342377773a74531529e2ccbf2db

Comment by Gerrit Updater [ 28/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23029/
Subject: LU-7206 mdd: stop orphan cleanup before finish FLD
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 03a2459ed59ed61414ef6b221e777a3e221cbaef

Comment by Peter Jones [ 29/Oct/16 ]

Landed for 2.9
