[LU-8412] Intel CAS testing umount triggers lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
Created: 18/Jul/16 Updated: 22/Sep/16 Resolved: 22/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Nathan Dauchy (Inactive) | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS-6.8 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||
| Description |
|
While testing OST caching with Intel CAS software, a stop of the targets (umount) triggered the following LBUG:

<4>Lustre: Failing over fscache-OST000b
<4>Lustre: Skipped 1 previous similar message
<3>LustreError: Skipped 3 previous similar messages
<0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
<0>LustreError: 57144:0:(lu_object.c:1224:lu_device_fini()) LBUG
<4>Pid: 57144, comm: umount
<4>
<4>Call Trace:
<4> [<ffffffffa05aa895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa05aae97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa070391b>] lu_device_fini+0xbb/0xc0 [obdclass]
<4> [<ffffffffa06e38bd>] ls_device_put+0x7d/0x2e0 [obdclass]
<4> [<ffffffffa06e3c92>] local_oid_storage_fini+0x172/0x410 [obdclass]
<4> [<ffffffffa13cb2af>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
<4> [<ffffffffa13cc0eb>] lfsck_degister+0x4b/0x60 [lfsck]
<4> [<ffffffffa1493e67>] ofd_device_fini+0x87/0x260 [ofd]
<4> [<ffffffffa06f4122>] class_cleanup+0x562/0xd20 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa06f5e4a>] class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
<4> [<ffffffffa06f6525>] class_manual_cleanup+0x175/0x4c0 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0735597>] server_put_super+0xcf7/0x1060 [obdclass]
<4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff8119127b>] generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81191366>] kill_anon_super+0x16/0x60
<4> [<ffffffffa06f80b6>] lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191b07>] deactivate_super+0x57/0x80
<4> [<ffffffff811b1acf>] mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b261b>] sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 57144, comm: umount Tainted: G W -- ------------ T 2.6.32-573.26.1.el6.20160517.x86_64.lustre272 #1
<4>Call Trace:
<4> [<ffffffff8157370a>] ? panic+0xa7/0x190
<4> [<ffffffffa05aaeeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa070391b>] ? lu_device_fini+0xbb/0xc0 [obdclass]
<4> [<ffffffffa06e38bd>] ? ls_device_put+0x7d/0x2e0 [obdclass]
<4> [<ffffffffa06e3c92>] ? local_oid_storage_fini+0x172/0x410 [obdclass]
<4> [<ffffffffa13cb2af>] ? lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
<4> [<ffffffffa13cc0eb>] ? lfsck_degister+0x4b/0x60 [lfsck]
<4> [<ffffffffa1493e67>] ? ofd_device_fini+0x87/0x260 [ofd]
<4> [<ffffffffa06f4122>] ? class_cleanup+0x562/0xd20 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa06f5e4a>] ? class_process_config+0x156a/0x1ad0 [obdclass]
<4> [<ffffffffa06ee205>] ? lustre_cfg_new+0x435/0x630 [obdclass]
<4> [<ffffffffa06f6525>] ? class_manual_cleanup+0x175/0x4c0 [obdclass]
<4> [<ffffffffa06d1216>] ? class_name2dev+0x56/0xe0 [obdclass]
<4> [<ffffffffa0735597>] ? server_put_super+0xcf7/0x1060 [obdclass]
<4> [<ffffffff811ad166>] ? invalidate_inodes+0xf6/0x190
<4> [<ffffffff8119127b>] ? generic_shutdown_super+0x5b/0xe0
<4> [<ffffffff81191366>] ? kill_anon_super+0x16/0x60
<4> [<ffffffffa06f80b6>] ? lustre_kill_super+0x36/0x60 [obdclass]
<4> [<ffffffff81191b07>] ? deactivate_super+0x57/0x80
<4> [<ffffffff811b1acf>] ? mntput_no_expire+0xbf/0x110
<4> [<ffffffff811b261b>] ? sys_umount+0x7b/0x3a0
<4> [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b |
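For context on what the failed assertion checks: the message says the device still had three outstanding references (ld_ref == 3) when lu_device_fini() ran during umount. The following is a minimal user-space sketch of that kind of teardown check, not Lustre code. Every name in it (demo_device, demo_device_get, demo_device_put, demo_device_fini) is hypothetical, and it returns false where the kernel code would LBUG.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a refcounted device; NOT the Lustre lu_device. */
struct demo_device {
    atomic_int ref; /* number of references currently held by users */
};

static void demo_device_init(struct demo_device *d)
{
    atomic_init(&d->ref, 0);
}

/* A user takes a reference before using the device. */
static void demo_device_get(struct demo_device *d)
{
    atomic_fetch_add(&d->ref, 1);
}

/* A user drops its reference when done. */
static void demo_device_put(struct demo_device *d)
{
    atomic_fetch_sub(&d->ref, 1);
}

/*
 * Teardown check: finalizing is only safe once every reference has been
 * dropped. Returns false (and reports the leaked count) instead of
 * asserting, so the caller can observe the failure.
 */
static bool demo_device_fini(struct demo_device *d)
{
    int ref = atomic_load(&d->ref);

    if (ref != 0) {
        fprintf(stderr, "fini with refcount %d still held\n", ref);
        return false;
    }
    return true;
}
```

In this sketch, taking three references and calling demo_device_fini() without the matching puts reproduces the shape of the failure above: teardown is reached while references are still held. Which holder leaked the references in the real stack is not established by this ticket.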
| Comments |
| Comment by Nathan Dauchy (Inactive) [ 18/Jul/16 ] |
|
This looks very similar to |
| Comment by Peter Jones [ 18/Jul/16 ] |
|
Bob Triage agrees that this looks like Thanks Peter |
| Comment by Bob Glossman (Inactive) [ 18/Jul/16 ] |
|
port in flight: |
| Comment by Nathan Dauchy (Inactive) [ 21/Sep/16 ] |
|
Is the port of this fix to the 2.7 FE branch just stuck waiting on one more code review? |
| Comment by Peter Jones [ 22/Sep/16 ] |
|
Nathan,
It looks like it. I'll follow up.
Peter |