[LU-8412] Intel CAS testing umount triggers lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
Created: 18/Jul/16  Updated: 22/Sep/16  Resolved: 22/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Nathan Dauchy (Inactive)
Assignee: Bob Glossman (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

CentOS-6.8
kernel-2.6.32-573.26.1.el6.20160517.x86_64.lustre272
lustre-2.7.2-1nasS_mofed32v1_2.6.32_573.26.1.el6.20160517.x86_64.lustre272.x86_64


Issue Links:
Related
is related to LU-7038 obdfilter-survey test_3a: (lu_object.... Resolved
Severity: 3

 Description   

While testing OST caching with Intel CAS software, a stop of the targets (umount) triggered the following LBUG:

<4>Lustre: Failing over fscache-OST000b
<4>Lustre: Skipped 1 previous similar message
<3>LustreError: Skipped 3 previous similar messages
<0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
<0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) LBUG
<4>Pid: 57144, comm: umount
<4>
<4>Call Trace:
<4> [<ffffffffa05aa895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa05aae97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa070391b>] lu_device_fini+0xbb/0xc0 [obdclass]
<4> [<ffffffffa06e38bd>] ls_device_put+0x7d/0x2e0 [obdclass]
<4> [<ffffffffa06e3c92>] local_oid_storage_fini+0x172/0x410 [obdclass]
<4> [<ffffffffa13cb2af>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
<4> [<ffffffffa13cc0eb>] lfsck_degister+0x4b/0x60 [lfsck]
<4> [<ffffffffa1493e67>] ofd_device_fini+0x87/0x260 [ofd]
<4> [<ffffffffa06f4122>] class_cleanup+0x562/0xd20 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa06f5e4a>] class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
<4> [<ffffffffa06f6525>] class_manual_cleanup+0x175/0x4c0 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0735597>] server_put_super+0xcf7/0x1060 [obdclass]
<4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff8119127b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81191366>] kill_anon_super+0x16/0x60
<4> [<ffffffffa06f80b6>] lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191b07>] deactivate_super+0x57/0x80
<4> [<ffffffff811b1acf>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b261b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 57144, comm: umount Tainted: G        W  -- ------------  T 2.6.32-573.26.1.el6.20160517.x86_64.lustre272 #1
<4>Call Trace:
<4> [<ffffffff8157370a>] ? panic+0xa7/0x190
<4> [<ffffffffa05aaeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa070391b>] ? lu_device_fini+0xbb/0xc0 [obdclass]
<4> [<ffffffffa06e38bd>] ? ls_device_put+0x7d/0x2e0 [obdclass]
<4> [<ffffffffa06e3c92>] ? local_oid_storage_fini+0x172/0x410 [obdclass]
<4> [<ffffffffa13cb2af>] ? lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
<4> [<ffffffffa13cc0eb>] ? lfsck_degister+0x4b/0x60 [lfsck]
<4> [<ffffffffa1493e67>] ? ofd_device_fini+0x87/0x260 [ofd]
<4> [<ffffffffa06f4122>] ? class_cleanup+0x562/0xd20 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa06f5e4a>] ? class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
<4> [<ffffffffa06f6525>] ? class_manual_cleanup+0x175/0x4c0 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0735597>] ? server_put_super+0xcf7/0x1060 [obdclass]
<4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff8119127b>] ? generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81191366>] ? kill_anon_super+0x16/0x60
<4> [<ffffffffa06f80b6>] ? lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191b07>] ? deactivate_super+0x57/0x80
<4> [<ffffffff811b1acf>] ? mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b261b>] ? sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
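
The assertion in the log fires when a device is finalized while its reference count is still nonzero. The following is a minimal userspace sketch of that pattern, assuming C11 atomics. The `model_device` type and the `model_device_*` functions are hypothetical names used for illustration and are not the Lustre implementation. Which code path held the three outstanding references is not established in this ticket.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; NOT the Lustre lu_device structure. */
struct model_device {
    atomic_int ref; /* plays the role of ld_ref in the log above */
};

static void model_device_init(struct model_device *d)
{
    atomic_init(&d->ref, 0);
}

/* Take one reference on the device. */
static void model_device_get(struct model_device *d)
{
    atomic_fetch_add(&d->ref, 1);
}

/* Drop one reference; dropping more than were taken is a caller bug. */
static void model_device_put(struct model_device *d)
{
    int prev = atomic_fetch_sub(&d->ref, 1);
    assert(prev > 0);
}

/*
 * Teardown check. The kernel code in the log asserts that the count is
 * zero and calls LBUG otherwise; this model returns the number of leaked
 * references instead, so that a caller can inspect it (0 means clean).
 */
static int model_device_fini(struct model_device *d)
{
    return atomic_load(&d->ref);
}
```

In this model, three `model_device_get()` calls without matching `model_device_put()` calls make `model_device_fini()` return 3, which corresponds to the "Refcount is 3" condition reported above.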


 Comments   
Comment by Nathan Dauchy (Inactive) [ 18/Jul/16 ]

This looks very similar to LU-7038, but I opened a new ticket for NAS tracking. We may just need a backport of the fix to 2.7.2.

Comment by Peter Jones [ 18/Jul/16 ]

Bob

Triage agrees that this looks like LU-7038. Could you please port that fix to the 2.7 FE branch?

Thanks

Peter

Comment by Bob Glossman (Inactive) [ 18/Jul/16 ]

port in flight:
http://review.whamcloud.com/21402

Comment by Nathan Dauchy (Inactive) [ 21/Sep/16 ]

Is the port of this fix to the 2.7 FE branch just stuck waiting on one more code review?

Comment by Peter Jones [ 22/Sep/16 ]

Nathan

It looks like it. I'll follow up.

Peter
