[LU-7038] obdfilter-survey test_3a: (lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
Created: 24/Aug/15  Updated: 19/Mar/19  Resolved: 22/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.9.0

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 3142 RHEL6.6 DNE


Issue Links:
Related
is related to LU-7221 replay-ost-single test_3: ASSERTION( ... Resolved
is related to LU-6365 Eliminate unnecessary loop in lu_cach... Resolved
is related to LU-8412 Intel CAS testing umount triggers lu_... Resolved
is related to LU-7326 ost-pools hangs on OST unmount Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for sarah_lw <wei3.liu@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/72f11210-46d3-11e5-90a5-5254006e85c2.

The sub-test test_3a failed with the following error:

test failed to respond and timed out

ost console:

12:55:26:Lustre: DEBUG MARKER: == obdfilter-survey test 3a: Network survey == 05:48:19 (1439988499)
12:55:28:LustreError: 11-0: lustre-MDT0000-lwp-OST0000: operation obd_ping to node 10.2.4.221@tcp failed: rc = -107
12:55:30:LustreError: Skipped 7 previous similar messages
12:55:31:Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 10.2.4.221@tcp) was lost; in progress operations using this service will wait for recovery to complete
12:55:31:Lustre: Skipped 7 previous similar messages
12:55:32:Lustre: 6155:0:(client.c:2014:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1439988511/real 1439988511]  req@ffff880014660980 x1509869039556728/t0(0) o400->MGC10.2.4.221@tcp@10.2.4.221@tcp:26/25 lens 224/224 e 0 to 1 dl 1439988518 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
12:55:32:Lustre: 6155:0:(client.c:2014:ptlrpc_expire_one_request()) Skipped 10 previous similar messages
12:55:32:LustreError: 166-1: MGC10.2.4.221@tcp: Connection to MGS (at 10.2.4.221@tcp) was lost; in progress operations using this service will fail
12:55:32:Lustre: DEBUG MARKER: grep -c /mnt/ost1' ' /proc/mounts
12:55:34:Lustre: DEBUG MARKER: umount -d -f /mnt/ost1
12:55:34:Lustre: server umount lustre-OST0000 complete
12:55:34:Lustre: Skipped 1 previous similar message
12:55:34:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:34:Lustre: DEBUG MARKER: grep -c /mnt/ost2' ' /proc/mounts
12:55:34:Lustre: DEBUG MARKER: umount -d -f /mnt/ost2
12:55:34:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:35:Lustre: DEBUG MARKER: grep -c /mnt/ost3' ' /proc/mounts
12:55:35:Lustre: DEBUG MARKER: umount -d -f /mnt/ost3
12:55:35:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:35:Lustre: DEBUG MARKER: grep -c /mnt/ost4' ' /proc/mounts
12:55:35:Lustre: DEBUG MARKER: umount -d -f /mnt/ost4
12:55:35:Lustre: server umount lustre-OST0003 complete
12:55:35:Lustre: Skipped 2 previous similar messages
12:55:35:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:36:Lustre: DEBUG MARKER: grep -c /mnt/ost5' ' /proc/mounts
12:55:36:Lustre: DEBUG MARKER: umount -d -f /mnt/ost5
12:55:36:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:36:Lustre: DEBUG MARKER: grep -c /mnt/ost6' ' /proc/mounts
12:55:36:Lustre: DEBUG MARKER: umount -d -f /mnt/ost6
12:55:36:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:37:Lustre: DEBUG MARKER: grep -c /mnt/ost7' ' /proc/mounts
12:55:37:Lustre: DEBUG MARKER: umount -d -f /mnt/ost7
12:55:37:Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null && lctl dl | grep ' ST '
12:55:37:Lustre: DEBUG MARKER: grep -c /mnt/ost8' ' /proc/mounts
12:55:37:Lustre: DEBUG MARKER: umount -d -f /mnt/ost8
12:55:37:LustreError: 8532:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
12:55:37:LustreError: 8532:0:(lu_object.c:1224:lu_device_fini()) LBUG
12:55:37:Pid: 8532, comm: umount
12:55:38:
12:55:38:Call Trace:
12:55:38: [<ffffffffa049b875>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
12:55:38: [<ffffffffa049be77>] lbug_with_loc+0x47/0xb0 [libcfs]
12:55:38: [<ffffffffa05f229b>] lu_device_fini+0xbb/0xc0 [obdclass]
12:55:38: [<ffffffffa05d328d>] ls_device_put+0x7d/0x2e0 [obdclass]
12:55:39: [<ffffffffa05d3662>] local_oid_storage_fini+0x172/0x410 [obdclass]
12:55:40: [<ffffffffa0dc476f>] lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
12:55:40: [<ffffffffa0dc6f7b>] lfsck_degister+0x4b/0x60 [lfsck]
12:55:40: [<ffffffffa0e8f597>] ofd_device_fini+0x87/0x250 [ofd]
12:55:40: [<ffffffffa05e1802>] class_cleanup+0x572/0xd30 [obdclass]
12:55:40: [<ffffffffa05c1776>] ? class_name2dev+0x56/0xe0 [obdclass]
12:55:41: [<ffffffffa05e3e56>] class_process_config+0x1e96/0x2800 [obdclass]
12:55:41: [<ffffffffa04a7c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
12:55:41: [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
12:55:41: [<ffffffffa05e4c7f>] class_manual_cleanup+0x4bf/0x8e0 [obdclass]
12:55:41: [<ffffffffa05c1776>] ? class_name2dev+0x56/0xe0 [obdclass]
12:55:41: [<ffffffffa061e102>] server_put_super+0x9e2/0xeb0 [obdclass]
12:55:41: [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
12:55:41: [<ffffffff81190b7b>] generic_shutdown_super+0x5b/0xe0
12:55:41: [<ffffffff81190c66>] kill_anon_super+0x16/0x60
12:55:41: [<ffffffffa05e7b36>] lustre_kill_super+0x36/0x60 [obdclass]
12:55:42: [<ffffffff81191407>] deactivate_super+0x57/0x80
12:55:42: [<ffffffff811b10df>] mntput_no_expire+0xbf/0x110
12:55:42: [<ffffffff811b1c2b>] sys_umount+0x7b/0x3a0
12:55:42: [<ffffffff8100b0d2>] system_call_fastpath+0x16/0x1b
12:55:42:
12:55:42:Kernel panic - not syncing: LBUG
12:55:42:Pid: 8532, comm: umount Not tainted 2.6.32-504.30.3.el6_lustre.x86_64 #1
12:55:42:Call Trace:
12:55:43: [<ffffffff81529c9c>] ? panic+0xa7/0x16f
12:55:43: [<ffffffffa049becb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
12:55:43: [<ffffffffa05f229b>] ? lu_device_fini+0xbb/0xc0 [obdclass]
12:55:43: [<ffffffffa05d328d>] ? ls_device_put+0x7d/0x2e0 [obdclass]
12:55:43: [<ffffffffa05d3662>] ? local_oid_storage_fini+0x172/0x410 [obdclass]
12:55:43: [<ffffffffa0dc476f>] ? lfsck_instance_cleanup+0x20f/0x7e0 [lfsck]
12:55:43: [<ffffffffa0dc6f7b>] ? lfsck_degister+0x4b/0x60 [lfsck]
12:55:43: [<ffffffffa0e8f597>] ? ofd_device_fini+0x87/0x250 [ofd]
12:55:43: [<ffffffffa05e1802>] ? class_cleanup+0x572/0xd30 [obdclass]
12:55:43: [<ffffffffa05c1776>] ? class_name2dev+0x56/0xe0 [obdclass]
12:55:45: [<ffffffffa05e3e56>] ? class_process_config+0x1e96/0x2800 [obdclass]
12:55:45: [<ffffffffa04a7c01>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
12:55:45: [<ffffffff8117523c>] ? __kmalloc+0x21c/0x230
12:55:46: [<ffffffffa05e4c7f>] ? class_manual_cleanup+0x4bf/0x8e0 [obdclass]
12:55:46: [<ffffffffa05c1776>] ? class_name2dev+0x56/0xe0 [obdclass]
12:55:46: [<ffffffffa061e102>] ? server_put_super+0x9e2/0xeb0 [obdclass]
12:55:46: [<ffffffff811ac776>] ? invalidate_inodes+0xf6/0x190
12:55:46: [<ffffffff81190b7b>] ? generic_shutdown_super+0x5b/0xe0
12:55:46: [<ffffffff81190c66>] ? kill_anon_super+0x16/0x60
12:55:47: [<ffffffffa05e7b36>] ? lustre_kill_super+0x36/0x60 [obdclass]
12:55:47: [<ffffffff81191407>] ? deactivate_super+0x57/0x80
12:55:47: [<ffffffff811b10df>] ? mntput_no_expire+0xbf/0x110
12:55:48: [<ffffffff811b1c2b>] ? sys_umount+0x7b/0x3a0
12:55:49: [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
12:55:50:Initializing cgroup subsys cpuset


 Comments   
Comment by Oleg Drokin [ 14/Oct/15 ]

I hit this now after a sanity run in cleanup.

<3>[36131.936383] LustreError: Skipped 1 previous similar message
<4>[36135.163704] Lustre: server umount lustre-MDT0000 complete
<0>[36141.992203] LustreError: 26669:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
<0>[36141.993278] LustreError: 26669:0:(lu_object.c:1224:lu_device_fini()) LBUG
<4>[36141.993812] Pid: 26669, comm: umount
<4>[36141.994278] 
<4>[36141.994279] Call Trace:
<4>[36141.995167]  [<ffffffffa079b885>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4>[36141.995966]  [<ffffffffa079be87>] lbug_with_loc+0x47/0xb0 [libcfs]
<4>[36141.996537]  [<ffffffffa102cfd8>] lu_device_fini+0xb8/0xc0 [obdclass]
<4>[36141.997073]  [<ffffffffa100f0ad>] ls_device_put+0x8d/0x2d0 [obdclass]
<4>[36141.997624]  [<ffffffffa100f3c5>] local_oid_storage_fini+0xd5/0x2e0 [obdclass]
<4>[36141.998363]  [<ffffffffa05cc32f>] lfsck_instance_cleanup+0x22f/0x790 [lfsck]
<4>[36141.998770]  [<ffffffffa05ce9ab>] lfsck_degister+0x4b/0x60 [lfsck]
<4>[36141.999128]  [<ffffffffa0c0e0cb>] ofd_device_fini+0xab/0x260 [ofd]
<4>[36141.999564]  [<ffffffffa101c142>] class_cleanup+0x572/0xd20 [obdclass]
<4>[36141.999933]  [<ffffffffa0ffe0cc>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[36142.000310]  [<ffffffffa101e666>] class_process_config+0x1d76/0x26d0 [obdclass]
<4>[36142.001033]  [<ffffffff8117757a>] ? cache_alloc_debugcheck_after+0x14a/0x210
<4>[36142.001493]  [<ffffffff81179a55>] ? __kmalloc+0x1c5/0x2b0
<4>[36142.001926]  [<ffffffffa101f218>] ? class_manual_cleanup+0x258/0xe10 [obdclass]
<4>[36142.002690]  [<ffffffffa101f47f>] class_manual_cleanup+0x4bf/0xe10 [obdclass]
<4>[36142.003090]  [<ffffffffa0ffe0cc>] ? class_name2dev+0x7c/0xe0 [obdclass]
<4>[36142.003556]  [<ffffffffa105357c>] server_put_super+0x9bc/0xe80 [obdclass]
<4>[36142.003987]  [<ffffffff811b141a>] ? invalidate_inodes+0xfa/0x180
<4>[36142.004383]  [<ffffffff8119564b>] generic_shutdown_super+0x5b/0xe0
<4>[36142.004796]  [<ffffffff81195736>] kill_anon_super+0x16/0x60
<4>[36142.005165]  [<ffffffffa1022b76>] lustre_kill_super+0x36/0x60 [obdclass]
<4>[36142.005792]  [<ffffffff81195ed7>] deactivate_super+0x57/0x80
<4>[36142.006244]  [<ffffffff811b5e2f>] mntput_no_expire+0xbf/0x110
<4>[36142.006926]  [<ffffffff811b699b>] sys_umount+0x7b/0x3a0
<4>[36142.007516]  [<ffffffff8100b112>] system_call_fastpath+0x16/0x1b
<4>[36142.008049] 
<0>[36142.013691] Kernel panic - not syncing: LBUG

Crashdump is in /exports/crashdumps/192.168.10.224-2015-10-14-11\:14\:17

Comment by James Nunez (Inactive) [ 22/Oct/15 ]

We hit this while unmounting an OST at the end of ost-pools; LU-7326. Logs are at https://testing.hpdd.intel.com/test_sets/ea392e2a-776b-11e5-a00c-5254006e85c2.

Comment by Sarah Liu [ 16/Dec/15 ]

Hit this when unmounting an OST after upgrading the system from 2.5.5 RHEL6.6 ZFS to master/#3264 RHEL7 ZFS. It looks like it can be reproduced in this scenario.

[ 3306.094757] Lustre: DEBUG MARKER: == upgrade-downgrade test completed at: Wed Dec 16 15:06:55 PST 2015 == 15:06:55 (1450307215)
[ 3312.969766] LustreError: 11-0: lustre-MDT0000-lwp-OST0000: operation obd_ping to node 10.2.4.47@tcp failed: rc = -107
[ 3312.981749] Lustre: lustre-MDT0000-lwp-OST0000: Connection to lustre-MDT0000 (at 10.2.4.47@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 3324.975299] Lustre: 13357:0:(client.c:1994:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1450307228/real 1450307228]  req@ffff8807ffa3aa00 x1520756527892904/t0(0) o400->MGC10.2.4.47@tcp@10.2.4.47@tcp:26/25 lens 224/224 e 0 to 1 dl 1450307235 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 3325.006351] LustreError: 166-1: MGC10.2.4.47@tcp: Connection to MGS (at 10.2.4.47@tcp) was lost; in progress operations using this service will fail
[ 3329.514798] LustreError: 14661:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3
[ 3329.528240] LustreError: 14661:0:(lu_object.c:1224:lu_device_fini()) LBUG
[ 3329.535888] Pid: 14661, comm: umount
[ 3329.539917] 
[ 3329.539917] Call Trace:
[ 3329.549151]  [<ffffffffa07457d3>] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
[ 3329.559376]  [<ffffffffa0745d75>] lbug_with_loc+0x45/0xc0 [libcfs]
[ 3329.568633]  [<ffffffffa0b65148>] lu_device_fini+0xb8/0xc0 [obdclass]
[ 3329.577952]  [<ffffffffa0b47cbd>] ls_device_put+0x7d/0x420 [obdclass]
[ 3329.588351]  [<ffffffffa0b48161>] local_oid_storage_fini+0x101/0x340 [obdclass]
[ 3329.599608]  [<ffffffffa11ae37e>] lfsck_instance_cleanup+0x20e/0xa50 [lfsck]
[ 3329.610569]  [<ffffffffa11b10f3>] lfsck_degister+0x43/0x50 [lfsck]
[ 3329.620541]  [<ffffffffa127936a>] ofd_device_fini+0xba/0x2a0 [ofd]
[ 3329.630555]  [<ffffffffa0b534e4>] class_cleanup+0x734/0xcc0 [obdclass]
[ 3329.639396]  [<ffffffffa0b55d83>] class_process_config+0x1bf3/0x2cf0 [obdclass]
[ 3329.649155]  [<ffffffff811acf53>] ? __kmalloc+0x1f3/0x230
[ 3329.656694]  [<ffffffffa0b500fb>] ? lustre_cfg_new+0x8b/0x400 [obdclass]
[ 3329.665770]  [<ffffffffa0b56f6f>] class_manual_cleanup+0xef/0xba0 [obdclass]
[ 3329.675170]  [<ffffffffa0b8e40e>] server_put_super+0x84e/0xea0 [obdclass]
[ 3329.684307]  [<ffffffff811c9426>] generic_shutdown_super+0x56/0xe0
[ 3329.692720]  [<ffffffff811c9692>] kill_anon_super+0x12/0x20
[ 3329.700560]  [<ffffffffa0b5ac42>] lustre_kill_super+0x32/0x50 [obdclass]
[ 3329.709596]  [<ffffffff811c9a3d>] deactivate_locked_super+0x3d/0x60
[ 3329.718066]  [<ffffffff811ca046>] deactivate_super+0x46/0x60
[ 3329.725517]  [<ffffffff811e6f35>] mntput_no_expire+0xc5/0x120
[ 3329.733057]  [<ffffffff811e806f>] SyS_umount+0x9f/0x3c0
[ 3329.740000]  [<ffffffff81615309>] system_call_fastpath+0x16/0x1b
[ 3329.747792] 

Message from syslogd@onyx-26 at Dec 16 15:07:19 ...
 kernel:LustreError: 14661:0:(lu_object.c:1224:lu_device_fini()) ASSERTION( atomic_read(&d->ld_ref) == 0 ) failed: Refcount is 3

Message from syslogd@onyx-26 at Dec 16 15:07:19 ...
 kernel:LustreError: 14661:0:(lu_object.c:1224:lu_device_fini()) LBUG

[ 3329.750842] Kernel panic - not syncing: LBUG
[ 3329.757928] CPU: 18 PID: 14661 Comm: umount Tainted: PF         IO--------------   3.10.0-229.20.1.el7_lustre.x86_64 #1
[ 3329.772340] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.99.99.x045.022820121209 02/28/2012
[ 3329.786192]  ffffffffa0762eaf 000000001b1f107a ffff8807fad17a88 ffffffff816053da
[ 3329.796912]  ffff8807fad17b08 ffffffff815fec4e ffffffff00000008 ffff8807fad17b18
[ 3329.807636]  ffff8807fad17ab8 000000001b1f107a ffffffffa0b9d255 0000000000000246
[ 3329.818348] Call Trace:
[ 3329.823426]  [<ffffffff816053da>] dump_stack+0x19/0x1b
[ 3329.831478]  [<ffffffff815fec4e>] panic+0xd8/0x1e7
[ 3329.839083]  [<ffffffffa0745ddb>] lbug_with_loc+0xab/0xc0 [libcfs]
[ 3329.848223]  [<ffffffffa0b65148>] lu_device_fini+0xb8/0xc0 [obdclass]
[ 3329.857598]  [<ffffffffa0b47cbd>] ls_device_put+0x7d/0x420 [obdclass]
[ 3329.866934]  [<ffffffffa0b48161>] local_oid_storage_fini+0x101/0x340 [obdclass]
[ 3329.877178]  [<ffffffffa11ae37e>] lfsck_instance_cleanup+0x20e/0xa50 [lfsck]
[ 3329.887074]  [<ffffffffa11b10f3>] lfsck_degister+0x43/0x50 [lfsck]
[ 3329.895943]  [<ffffffffa127936a>] ofd_device_fini+0xba/0x2a0 [ofd]
[ 3329.904788]  [<ffffffffa0b534e4>] class_cleanup+0x734/0xcc0 [obdclass]
[ 3329.913967]  [<ffffffffa0b55d83>] class_process_config+0x1bf3/0x2cf0 [obdclass]
[ 3329.923943]  [<ffffffff811acf53>] ? __kmalloc+0x1f3/0x230
[ 3329.931759]  [<ffffffffa0b500fb>] ? lustre_cfg_new+0x8b/0x400 [obdclass]
[ 3329.940992]  [<ffffffffa0b56f6f>] class_manual_cleanup+0xef/0xba0 [obdclass]
[ 3329.950604]  [<ffffffffa0b8e40e>] server_put_super+0x84e/0xea0 [obdclass]
[ 3329.959882]  [<ffffffff811c9426>] generic_shutdown_super+0x56/0xe0
[ 3329.968491]  [<ffffffff811c9692>] kill_anon_super+0x12/0x20
[ 3329.976463]  [<ffffffffa0b5ac42>] lustre_kill_super+0x32/0x50 [obdclass]
[ 3329.985661]  [<ffffffff811c9a3d>] deactivate_locked_super+0x3d/0x60
[ 3329.994376]  [<ffffffff811ca046>] deactivate_super+0x46/0x60
[ 3330.002400]  [<ffffffff811e6f35>] mntput_no_expire+0xc5/0x120
[ 3330.010524]  [<ffffffff811e806f>] SyS_umount+0x9f/0x3c0
[ 3330.018051]  [<ffffffff81615309>] system_call_fastpath+0x16/0x1b
[ 3330.100294] drm_kms_helper: panic occurred, switching back to text console

Comment by James Nunez (Inactive) [ 05/Jan/16 ]

We've hit this with the full test group on tag 2.7.64 with the lnet-selftest test suite. Logs at
2015-12-22 17:22:29 - https://testing.hpdd.intel.com/test_sets/e03f0150-a912-11e5-9286-5254006e85c2

Although there are no logs for the following test session failures, they all hang on umount of ost7, like the one above, and are probably due to the same issue:
2015-12-18 15:26:56 - https://testing.hpdd.intel.com/test_sets/100925aa-a5e4-11e5-a028-5254006e85c2
2015-12-18 19:59:00 - https://testing.hpdd.intel.com/test_sets/53e5f6ba-a5ec-11e5-9f01-5254006e85c2

Comment by Saurabh Tandan (Inactive) [ 06/Jan/16 ]

master, build# 3264, 2.7.64 tag
Full test group :EL6.7 Server/EL6.7 Client
https://testing.hpdd.intel.com/test_sets/6c6a9940-9f0a-11e5-ba94-5254006e85c2

Comment by Alex Zhuravlev [ 16/Jan/16 ]

I'm hitting this quite often locally, mostly using sanity-benchmark

Comment by Alex Zhuravlev [ 19/Jan/16 ]

Something is wrong with lu_site_purge(): after a call to it, I still find non-referenced objects in the cache:
[ 3278.750967] LustreError: 11754:0:(local_storage.c:193:ls_device_put()) header@ffff8800d61b7180[0x0, 0, [0x200000003:0x6:0x0] hash lru exist]{
[ 3278.752782] LustreError: 11754:0:(local_storage.c:193:ls_device_put()) ....local_storage@ffff8800d61b71d0

One more call to lu_site_purge() releases all of them. Or this is a race.
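
To make the observation above concrete, here is a minimal toy model in C. It is not Lustre code: every toy_* name and the skip_once flag are hypothetical, and only the field name ld_ref is borrowed from the assertion message. It assumes each cached object pins one device reference and that a single purge pass can miss some objects for a transient reason (the actual cause, possibly a race, is not established in this ticket).

```c
#include <assert.h>

#define TOY_NR_OBJS 5

/* Hypothetical stand-in for a device whose refcount is pinned by cached objects. */
struct toy_device {
	int ld_ref;                  /* name borrowed from the assertion message */
	int cached[TOY_NR_OBJS];     /* 1 while the object still sits in the cache */
	int skip_once[TOY_NR_OBJS];  /* 1 if the next purge pass will miss the object */
};

static void toy_device_init(struct toy_device *d)
{
	d->ld_ref = 0;
	for (int i = 0; i < TOY_NR_OBJS; i++) {
		d->cached[i] = 1;
		d->skip_once[i] = 0;
		d->ld_ref++;         /* each cached object holds one reference */
	}
}

/* One purge pass: frees every cached object it does not skip; returns the count freed. */
static int toy_purge_pass(struct toy_device *d)
{
	int freed = 0;

	for (int i = 0; i < TOY_NR_OBJS; i++) {
		if (!d->cached[i])
			continue;
		if (d->skip_once[i]) {
			d->skip_once[i] = 0;  /* transient condition clears for the next pass */
			continue;
		}
		d->cached[i] = 0;
		d->ld_ref--;
		freed++;
	}
	return freed;
}

/* Purge-all in this toy: keep making passes until no reference is left. */
static void toy_purge_all(struct toy_device *d)
{
	while (d->ld_ref > 0)
		toy_purge_pass(d);
}
```

In this toy, initializing a device, setting skip_once on three of the five objects, and calling toy_purge_pass() once leaves ld_ref at 3, so a teardown check equivalent to the assertion would fail. Calling toy_purge_all() afterwards drives ld_ref to 0, which mirrors the remark that one more call releases the remaining objects.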

Comment by Sarah Liu [ 19/Jan/16 ]

Hit this on master DNE mode
https://testing.hpdd.intel.com/test_sets/de009266-bbfd-11e5-8506-5254006e85c2
client and server: lustre-master # 3305 RHEL6.7 ldiskfs

Comment by Gerrit Updater [ 17/Feb/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/18484
Subject: LU-7038 debug: print objects if device is still busy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 62492126228310a2dc0c52d90d18423347829525

Comment by Gerrit Updater [ 18/Feb/16 ]

Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/18505
Subject: LU-7038 obdclass: lu_site_purge() to handle purge-all
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 213ac02d3a6aef2f136457608ff01db9279bf1ab

Comment by Alex Zhuravlev [ 19/Feb/16 ]

I think http://review.whamcloud.com/#/c/17415/ is OK, http://review.whamcloud.com/18505 should be a proper fix.

Comment by Saurabh Tandan (Inactive) [ 24/Feb/16 ]

Another instance found for interop - EL7 Server/2.7.1 Client, tag 2.7.90.
https://testing.hpdd.intel.com/test_sessions/495aabae-d306-11e5-be5c-5254006e85c2

Comment by Gerrit Updater [ 13/Mar/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18505/
Subject: LU-7038 obdclass: lu_site_purge() to handle purge-all
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: bcbcd5873589c71a5d1028c14e74f8897fc3ffc0
