[LU-3649] lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1 in lfsck_deregister Created: 26/Jul/13  Updated: 29/Aug/13  Resolved: 26/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.4.1, Lustre 2.5.0

Type: Bug Priority: Blocker
Reporter: Oleg Drokin Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9390

 Description   

After the recent lfsck landing, sanity test 71 now crashes like this:

<0>[82524.023928] LustreError: 8247:0:(lu_object.c:1198:lu_device_fini()) ASSERTION( cfs_atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 1
<0>[82524.024918] LustreError: 8247:0:(lu_object.c:1198:lu_device_fini()) LBUG
<4>[82524.025406] Pid: 8247, comm: umount
<4>[82524.025859] 
<4>[82524.025860] Call Trace:
<4>[82524.026587]  [<ffffffffa04b68a5>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[82524.027099]  [<ffffffffa04b6ea7>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[82524.027621]  [<ffffffffa05d66b8>] lu_device_fini+0xb8/0xc0 [obdclass]
<4>[82524.028162]  [<ffffffffa05bb797>] ls_device_put+0x87/0x1d0 [obdclass]
<4>[82524.028627]  [<ffffffffa05bba03>] local_oid_storage_fini+0x123/0x1d0 [obdclass]
<4>[82524.029370]  [<ffffffffa0af3267>] lfsck_instance_cleanup+0x137/0x360 [lfsck]
<4>[82524.029917]  [<ffffffffa0af5622>] lfsck_degister+0xa2/0xd0 [lfsck]
<4>[82524.030408]  [<ffffffffa0d6f8ff>] ofd_device_fini+0x4f/0x240 [ofd]
<4>[82524.030925]  [<ffffffffa05c8507>] class_cleanup+0x577/0xda0 [obdclass]
<4>[82524.031434]  [<ffffffffa059fabc>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[82524.031966]  [<ffffffffa05c9dec>] class_process_config+0x10bc/0x1c80 [obdclass]
<4>[82524.032937]  [<ffffffffa05c383c>] ? lustre_cfg_new+0x16c/0x6e0 [obdclass]
<4>[82524.033452]  [<ffffffffa05c39a3>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
<4>[82524.034018]  [<ffffffffa05cab29>] class_manual_cleanup+0x179/0x6e0 [obdclass]
<4>[82524.034549]  [<ffffffffa059fabc>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[82524.035084]  [<ffffffffa0604b84>] server_put_super+0x5c4/0xed0 [obdclass]
<4>[82524.035580]  [<ffffffff81183a4b>] generic_shutdown_super+0x5b/0xe0
<4>[82524.036065]  [<ffffffff81183b36>] kill_anon_super+0x16/0x60
<4>[82524.036552]  [<ffffffffa05cc976>] lustre_kill_super+0x36/0x60 [obdclass]
<4>[82524.037052]  [<ffffffff811842d7>] deactivate_super+0x57/0x80
<4>[82524.037536]  [<ffffffff811a237f>] mntput_no_expire+0xbf/0x110
<4>[82524.038026]  [<ffffffff811a2dfb>] sys_umount+0x7b/0x3a0
<4>[82524.038481]  [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
<4>[82524.039756] 
<0>[82524.041349] Kernel panic - not syncing: LBUG

Crashdump and modules are in /exports/crashdumps/192.168.10.219-2013-07-26-15\:21\:02/

tag in my source branch: master-20130726



 Comments   
Comment by Oleg Drokin [ 28/Jul/13 ]

I now seem to be hitting this very frequently on umount, across all sorts of test runs.

Comment by nasf (Inactive) [ 28/Jul/13 ]

This is the patch:

http://review.whamcloud.com/#/c/7153/

Comment by Oleg Drokin [ 29/Jul/13 ]

patch landed

Comment by nasf (Inactive) [ 31/Jul/13 ]

The patch is not enough; we need another fix:

http://review.whamcloud.com/#/c/7190/

Comment by nasf (Inactive) [ 21/Aug/13 ]

LU-3723 is another failure instance of this issue.

Comment by nasf (Inactive) [ 23/Aug/13 ]

The patch for b2_4:
http://review.whamcloud.com/#/c/7432/

Comment by Peter Jones [ 26/Aug/13 ]

Landed for 2.5
