[LU-1813] Test failure on test suite racer, subtest test_1, osc_lock_unuse()) ASSERTION( !ols->ols_hold ) failed Created: 31/Aug/12  Updated: 21/Sep/12  Resolved: 10/Sep/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0, Lustre 2.4.0

Type: Bug
Priority: Blocker
Reporter: Maloo
Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 4301

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7f3e7af0-f2c4-11e1-807d-52540035b04c.

The sub-test test_1 failed with the following error:

test failed to respond and timed out

17:16:19:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-23vm1,client-23vm2.lab.whamcloud.com DURATION=900 == 17:16:17 (1346199377)
17:16:19:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
17:16:21:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer 
17:16:21:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer 
17:31:26:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
17:31:29:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer racer.sh test complete, duration 1591 sec == 17:31:28 \(1346200288\)
17:31:31:Lustre: DEBUG MARKER: == racer racer.sh test complete, duration 1591 sec == 17:31:28 (1346200288)
17:31:33:Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre' ' /proc/mounts);
17:31:33:if [ $running -ne 0 ] ; then
17:31:34:echo Stopping client $(hostname) /mnt/lustre opts:-f;
17:31:34:lsof /mnt/lustre || need_kill=no;
17:31:34:if [ x-f != x -a x$need_kill != xno ]; then
17:31:34:    pids=$(lsof -t /mnt/lustre | sort -u);
17:31:34:   
17:31:35:LustreError: 23870:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115205255730212
17:31:35:LustreError: 23874:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115205255730212
17:31:35:Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
17:31:35:LustreError: 23005:0:(file.c:155:ll_close_inode_openhandle()) inode 144115205255731283 mdc close failed: rc = -108
17:31:35:Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
17:31:35:LustreError: 28247:0:(osc_lock.c:205:osc_lock_unuse()) ASSERTION( !ols->ols_hold ) failed: 
17:31:36:LustreError: 28247:0:(osc_lock.c:205:osc_lock_unuse()) LBUG
17:31:36:Pid: 28247, comm: ptlrpcd_1
17:31:36:
17:31:36:Call Trace:
17:31:36: [<ffffffffa0451905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
17:31:36: [<ffffffffa0451f17>] lbug_with_loc+0x47/0xb0 [libcfs]
17:31:36: [<ffffffffa09540cf>] osc_lock_unuse+0x1bf/0x240 [osc]
17:31:36: [<ffffffffa0658035>] cl_unuse_try_internal+0x55/0x100 [obdclass]
17:31:36: [<ffffffffa065b9e9>] cl_unuse_try+0x199/0x340 [obdclass]
17:31:37: [<ffffffffa0956a21>] osc_lock_upcall+0x171/0x610 [osc]
17:31:37: [<ffffffffa09568b0>] ? osc_lock_upcall+0x0/0x610 [osc]
17:31:38: [<ffffffffa093815e>] osc_enqueue_fini+0xfe/0x240 [osc]
17:31:38: [<ffffffffa093cd82>] osc_enqueue_interpret+0xe2/0x1f0 [osc]
17:31:38: [<ffffffffa079ac2f>] ptlrpc_check_set+0x29f/0x1ae0 [ptlrpc]
17:31:38: [<ffffffffa07cd53b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
17:31:38: [<ffffffffa07cda6b>] ptlrpcd+0x22b/0x3a0 [ptlrpc]
17:31:38: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffff8100c14a>] child_rip+0xa/0x20
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffff8100c140>] ? child_rip+0x0/0x20
17:31:39:
17:31:39:Kernel panic - not syncing: LBUG
17:31:39:Pid: 28247, comm: ptlrpcd_1 Not tainted 2.6.32-279.5.1.el6.x86_64 #1
17:31:39:Call Trace:
17:31:39: [<ffffffff814fd24a>] ? panic+0xa0/0x168
17:31:39: [<ffffffffa0451f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
17:31:39: [<ffffffffa09540cf>] ? osc_lock_unuse+0x1bf/0x240 [osc]
17:31:39: [<ffffffffa0658035>] ? cl_unuse_try_internal+0x55/0x100 [obdclass]
17:31:40: [<ffffffffa065b9e9>] ? cl_unuse_try+0x199/0x340 [obdclass]
17:31:40: [<ffffffffa0956a21>] ? osc_lock_upcall+0x171/0x610 [osc]
17:31:40: [<ffffffffa09568b0>] ? osc_lock_upcall+0x0/0x610 [osc]
17:31:41: [<ffffffffa093815e>] ? osc_enqueue_fini+0xfe/0x240 [osc]
17:31:41: [<ffffffffa093cd82>] ? osc_enqueue_interpret+0xe2/0x1f0 [osc]
17:31:41: [<ffffffffa079ac2f>] ? ptlrpc_check_set+0x29f/0x1ae0 [ptlrpc]
17:31:41: [<ffffffffa07cd53b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
17:31:41: [<ffffffffa07cda6b>] ? ptlrpcd+0x22b/0x3a0 [ptlrpc]
17:31:41: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:41: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:42: [<ffffffff8100c140>] ? child_rip+0x0/0x20
17:31:42:Initializing cgroup subsys cpuset
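
The two negative return codes in the log above (-116 and -108) are Linux errno values. As a reading aid, here is a minimal sketch that pulls them out of the LustreError lines and maps them to their symbolic names. The lookup table is hard-coded from the Linux errno definitions and covers only the codes seen in this log. The helper name `decode_return_codes` and the regular expression are illustrative assumptions, not part of Lustre or its test framework.

```python
import re

# Linux errno values for the two codes that appear in the log excerpt.
# This table is hard-coded for illustration and is deliberately incomplete.
LINUX_ERRNO = {
    108: ("ESHUTDOWN", "Cannot send after transport endpoint shutdown"),
    116: ("ESTALE", "Stale file handle"),
}

# Matches "failure -116" and "rc = -108" as they appear in the LustreError lines.
RC_PATTERN = re.compile(r"(?:failure|rc =) -(\d+)")


def decode_return_codes(log_text):
    """Return (code, name, description) for each negative return code found."""
    decoded = []
    for match in RC_PATTERN.finditer(log_text):
        code = int(match.group(1))
        name, description = LINUX_ERRNO.get(code, ("UNKNOWN", "not in table"))
        decoded.append((-code, name, description))
    return decoded


# Two lines copied from the log above, with the timestamp prefix dropped.
SAMPLE = """\
LustreError: 23870:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115205255730212
LustreError: 23005:0:(file.c:155:ll_close_inode_openhandle()) inode 144115205255731283 mdc close failed: rc = -108
"""

if __name__ == "__main__":
    for code, name, description in decode_return_codes(SAMPLE):
        print(f"{code}: {name} ({description})")
```

Both codes are consistent with a client being force-unmounted (`opts:-f`) while racer files are still in use. They are logged before the assertion in osc_lock_unuse() and are not by themselves the cause of the LBUG.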


 Comments   
Comment by Peter Jones [ 02/Sep/12 ]

Is this the same issue as LU-1772?

Comment by Jinshan Xiong (Inactive) [ 04/Sep/12 ]

No, it's not. This is a race between lock upcall and lock cancel from llite.

Comment by Peter Jones [ 04/Sep/12 ]

Lai

Could you please look into this one?

Thanks

Peter

Comment by Jinshan Xiong (Inactive) [ 06/Sep/12 ]

I can take a look at this problem.

Comment by Jinshan Xiong (Inactive) [ 06/Sep/12 ]

A patch is at: http://review.whamcloud.com/3895

Comment by Peter Jones [ 10/Sep/12 ]

Jinshan worked on this

Comment by Peter Jones [ 10/Sep/12 ]

Landed for 2.3 and 2.4

Comment by ETHz Support (Inactive) [ 21/Sep/12 ]

This patch works for us.

Comment by Peter Jones [ 21/Sep/12 ]

Thanks for letting us know!
