[LU-1813] Test failure on test suite racer, subtest test_1, osc_lock_unuse()) ASSERTION( !ols->ols_hold ) failed Created: 31/Aug/12 Updated: 21/Sep/12 Resolved: 10/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.3.0 |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.4.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 4301 |
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: https://maloo.whamcloud.com/test_sets/7f3e7af0-f2c4-11e1-807d-52540035b04c. The sub-test test_1 failed with the following error:
17:16:19:Lustre: DEBUG MARKER: == racer test 1: racer on clients: client-23vm1,client-23vm2.lab.whamcloud.com DURATION=900 == 17:16:17 (1346199377)
17:16:19:Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
17:16:21:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre2/racer
17:16:21:Lustre: DEBUG MARKER: DURATION=900 /usr/lib64/lustre/tests/racer/racer.sh /mnt/lustre/racer
17:31:26:Lustre: DEBUG MARKER: lctl set_param -n fail_loc=0 2>/dev/null || true
17:31:29:Lustre: DEBUG MARKER: /usr/sbin/lctl mark == racer racer.sh test complete, duration 1591 sec == 17:31:28 \(1346200288\)
17:31:31:Lustre: DEBUG MARKER: == racer racer.sh test complete, duration 1591 sec == 17:31:28 (1346200288)
17:31:33:Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre' ' /proc/mounts);
17:31:33:if [ $running -ne 0 ] ; then
17:31:34:echo Stopping client $(hostname) /mnt/lustre opts:-f;
17:31:34:lsof /mnt/lustre || need_kill=no;
17:31:34:if [ x-f != x -a x$need_kill != xno ]; then
17:31:34: pids=$(lsof -t /mnt/lustre | sort -u);
17:31:34:
17:31:35:LustreError: 23870:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115205255730212
17:31:35:LustreError: 23874:0:(file.c:2328:ll_inode_revalidate_fini()) failure -116 inode 144115205255730212
17:31:35:Lustre: setting import lustre-MDT0000_UUID INACTIVE by administrator request
17:31:35:LustreError: 23005:0:(file.c:155:ll_close_inode_openhandle()) inode 144115205255731283 mdc close failed: rc = -108
17:31:35:Lustre: setting import lustre-OST0000_UUID INACTIVE by administrator request
17:31:35:LustreError: 28247:0:(osc_lock.c:205:osc_lock_unuse()) ASSERTION( !ols->ols_hold ) failed:
17:31:36:LustreError: 28247:0:(osc_lock.c:205:osc_lock_unuse()) LBUG
17:31:36:Pid: 28247, comm: ptlrpcd_1
17:31:36:
17:31:36:Call Trace:
17:31:36: [<ffffffffa0451905>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
17:31:36: [<ffffffffa0451f17>] lbug_with_loc+0x47/0xb0 [libcfs]
17:31:36: [<ffffffffa09540cf>] osc_lock_unuse+0x1bf/0x240 [osc]
17:31:36: [<ffffffffa0658035>] cl_unuse_try_internal+0x55/0x100 [obdclass]
17:31:36: [<ffffffffa065b9e9>] cl_unuse_try+0x199/0x340 [obdclass]
17:31:37: [<ffffffffa0956a21>] osc_lock_upcall+0x171/0x610 [osc]
17:31:37: [<ffffffffa09568b0>] ? osc_lock_upcall+0x0/0x610 [osc]
17:31:38: [<ffffffffa093815e>] osc_enqueue_fini+0xfe/0x240 [osc]
17:31:38: [<ffffffffa093cd82>] osc_enqueue_interpret+0xe2/0x1f0 [osc]
17:31:38: [<ffffffffa079ac2f>] ptlrpc_check_set+0x29f/0x1ae0 [ptlrpc]
17:31:38: [<ffffffffa07cd53b>] ptlrpcd_check+0x53b/0x560 [ptlrpc]
17:31:38: [<ffffffffa07cda6b>] ptlrpcd+0x22b/0x3a0 [ptlrpc]
17:31:38: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffff8100c14a>] child_rip+0xa/0x20
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:38: [<ffffffff8100c140>] ? child_rip+0x0/0x20
17:31:39:
17:31:39:Kernel panic - not syncing: LBUG
17:31:39:Pid: 28247, comm: ptlrpcd_1 Not tainted 2.6.32-279.5.1.el6.x86_64 #1
17:31:39:Call Trace:
17:31:39: [<ffffffff814fd24a>] ? panic+0xa0/0x168
17:31:39: [<ffffffffa0451f6b>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
17:31:39: [<ffffffffa09540cf>] ? osc_lock_unuse+0x1bf/0x240 [osc]
17:31:39: [<ffffffffa0658035>] ? cl_unuse_try_internal+0x55/0x100 [obdclass]
17:31:40: [<ffffffffa065b9e9>] ? cl_unuse_try+0x199/0x340 [obdclass]
17:31:40: [<ffffffffa0956a21>] ? osc_lock_upcall+0x171/0x610 [osc]
17:31:40: [<ffffffffa09568b0>] ? osc_lock_upcall+0x0/0x610 [osc]
17:31:41: [<ffffffffa093815e>] ? osc_enqueue_fini+0xfe/0x240 [osc]
17:31:41: [<ffffffffa093cd82>] ? osc_enqueue_interpret+0xe2/0x1f0 [osc]
17:31:41: [<ffffffffa079ac2f>] ? ptlrpc_check_set+0x29f/0x1ae0 [ptlrpc]
17:31:41: [<ffffffffa07cd53b>] ? ptlrpcd_check+0x53b/0x560 [ptlrpc]
17:31:41: [<ffffffffa07cda6b>] ? ptlrpcd+0x22b/0x3a0 [ptlrpc]
17:31:41: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:41: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:41: [<ffffffffa07cd840>] ? ptlrpcd+0x0/0x3a0 [ptlrpc]
17:31:42: [<ffffffff8100c140>] ? child_rip+0x0/0x20
17:31:42:Initializing cgroup subsys cpuset |
| Comments |
| Comment by Peter Jones [ 02/Sep/12 ] |
|
Is this the same issue as |
| Comment by Jinshan Xiong (Inactive) [ 04/Sep/12 ] |
|
No, it's not. This is a race between lock upcall and lock cancel from llite. |
| Comment by Peter Jones [ 04/Sep/12 ] |
|
Lai, could you please look into this one? Thanks, Peter |
| Comment by Jinshan Xiong (Inactive) [ 06/Sep/12 ] |
|
I can take a look at this problem. |
| Comment by Jinshan Xiong (Inactive) [ 06/Sep/12 ] |
|
A patch is at: http://review.whamcloud.com/3895 |
| Comment by Peter Jones [ 10/Sep/12 ] |
|
Jinshan worked on this |
| Comment by Peter Jones [ 10/Sep/12 ] |
|
Landed for 2.3 and 2.4 |
| Comment by ETHz Support (Inactive) [ 21/Sep/12 ] |
|
This patch works for us. |
| Comment by Peter Jones [ 21/Sep/12 ] |
|
Thanks for letting us know! |