[LU-5057] ofd_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Created: 13/May/14  Updated: 06/Apr/15  Resolved: 06/Apr/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Jinshan Xiong (Inactive)
Assignee: Mikhail Pershin
Resolution: Duplicate
Votes: 0
Labels: None

Attachments: File lustre-log.1427145635.30868.gz    
Issue Links:
Related
is related to LU-6434 Object reference is not zero when umo... Resolved
Severity: 3
Rank (Obsolete): 13968

 Description   

I keep hitting this assertion in the racer test. If anyone working on this issue needs more logs, please feel free to contact me.

LustreError: 24020:0:(ofd_dev.c:859:ofd_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed:
LustreError: 24020:0:(ofd_dev.c:859:ofd_fini()) LBUG
Pid: 24020, comm: umount

Call Trace:
[<ffffffffa03a3895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
[<ffffffffa03a3e97>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa0e16b2b>] ofd_device_fini+0x23b/0x240 [ofd]
[<ffffffffa051c5d3>] class_cleanup+0x573/0xd30 [obdclass]
[<ffffffffa04f30b6>] ? class_name2dev+0x56/0xe0 [obdclass]
[<ffffffffa051e2fa>] class_process_config+0x156a/0x1ad0 [obdclass]
[<ffffffffa0517453>] ? lustre_cfg_new+0x2d3/0x6e0 [obdclass]
[<ffffffffa051e9d9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
[<ffffffffa04f30b6>] ? class_name2dev+0x56/0xe0 [obdclass]
[<ffffffffa0552fbc>] server_put_super+0x5ec/0xf60 [obdclass]
[<ffffffff8118366b>] generic_shutdown_super+0x5b/0xe0
[<ffffffff81183756>] kill_anon_super+0x16/0x60
[<ffffffffa0520886>] lustre_kill_super+0x36/0x60 [obdclass]
[<ffffffff81183ef7>] deactivate_super+0x57/0x80
[<ffffffff811a21ef>] mntput_no_expire+0xbf/0x110
[<ffffffff811a2c5b>] sys_umount+0x7b/0x3a0
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b



 Comments   
Comment by Patrick Farrell (Inactive) [ 27/May/14 ]

Cray has seen this recently in our testing as well.

Comment by Andreas Dilger [ 28/May/14 ]

Mike, any idea on what might be causing this? Oleg has crash dumps from this situation as well, and it started a few months ago.

Comment by Patrick Farrell (Inactive) [ 28/May/14 ]

For what it's worth, we hit this in a fairly clean (i.e., close to Intel's) version of 2.5.1. If requested, I might be able to make the dump available, though only default debug was enabled.

Comment by Mikhail Pershin [ 31/May/14 ]

Well, errors like this mean the device still has references held on it, most likely by objects. Usually the cause is a missed object put in some rare code path.
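To illustrate the kind of leak described above, here is a minimal sketch in plain C. It is not Lustre code: the `demo_*` types and functions are hypothetical stand-ins, and `ref` only plays the role that `ld_ref` plays in the failed assertion. The assumption is simply that each live object pins its device with one reference, so an error path that returns without the put leaves the count non-zero at teardown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real Lustre structures. */
struct demo_device {
    atomic_int ref;            /* plays the role of ld_ref */
};

struct demo_object {
    struct demo_device *dev;   /* device pinned by this object */
};

/* Taking an object takes one reference on its device. */
static void demo_object_get(struct demo_object *o, struct demo_device *d)
{
    o->dev = d;
    atomic_fetch_add(&d->ref, 1);
}

/* Releasing an object drops that device reference again. */
static void demo_object_put(struct demo_object *o)
{
    assert(atomic_load(&o->dev->ref) > 0);
    atomic_fetch_sub(&o->dev->ref, 1);
    o->dev = NULL;
}

/* Buggy variant: the early return on the error path skips the put. */
static int demo_op_leaky(struct demo_device *d, bool fail)
{
    struct demo_object o;

    demo_object_get(&o, d);
    if (fail)
        return -1;             /* reference is leaked here */
    demo_object_put(&o);
    return 0;
}

/* Fixed variant: a single exit guarantees the put on every path. */
static int demo_op_fixed(struct demo_device *d, bool fail)
{
    struct demo_object o;
    int rc = 0;

    demo_object_get(&o, d);
    if (fail)
        rc = -1;
    demo_object_put(&o);
    return rc;
}

/* Teardown check analogous to the assertion in the trace. */
static bool demo_device_can_fini(struct demo_device *d)
{
    return atomic_load(&d->ref) == 0;
}
```

With the leaky variant, one failed operation is enough to make the teardown check fail, however many operations later succeed. That would be consistent with a bug that only shows up under a stress test such as racer, where rarely taken error paths get exercised.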

Comment by Andreas Dilger [ 14/Nov/14 ]

Jinshan, Patrick, are you still seeing this bug in your testing?

Comment by Patrick Farrell (Inactive) [ 14/Nov/14 ]

Andreas - No. We have not seen this again since my report above, and we have been running racer in the same configurations regularly since then. It seems very likely this was fixed, or is so rare as to not be a concern.

Comment by Jinshan Xiong (Inactive) [ 23/Mar/15 ]

I saw this issue again. Please check the attachment for the log.

Comment by Jinshan Xiong (Inactive) [ 06/Apr/15 ]

Duplicate of LU-6434.
