[LU-4443] Failure on test suite parallel-scale test_write_disjoint: ASSERTION( ols->ols_state == OLS_NEW ) failed
Created: 06/Jan/14  Updated: 14/Jul/15  Resolved: 05/Mar/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Critical
Reporter: Maloo
Assignee: Zhenyu Xu
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

client and server: lustre-master build # 1823 RHEL6 ldiskfs


Issue Links:
Duplicate
duplicates LU-4692 (osc_lock.c:1204:osc_lock_enqueue()) ... Resolved
Severity: 3
Rank (Obsolete): 12188

Description

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/f9a26d34-74e0-11e3-96b0-52540035b04c.

The sub-test test_write_disjoint failed with the following error:

test failed to respond and timed out

client 1 console:

07:13:31:Lustre: DEBUG MARKER: == parallel-scale test write_disjoint: write_disjoint == 07:03:00 (1388761380)
07:13:31:LustreError: 24289:0:(osc_lock.c:1204:osc_lock_enqueue()) ASSERTION( ols->ols_state == OLS_NEW ) failed: Impossible state: 6
07:13:31:LustreError: 24289:0:(osc_lock.c:1204:osc_lock_enqueue()) LBUG
07:13:31:Pid: 24289, comm: write_disjoint
07:13:31:
07:13:31:Call Trace:
07:13:31: [<ffffffffa0bb4895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
07:13:31: [<ffffffffa0bb4e97>] lbug_with_loc+0x47/0xb0 [libcfs]
07:13:31: [<ffffffffa1112310>] ? osc_lock_enqueue+0x0/0x890 [osc]
07:13:31: [<ffffffffa1112a88>] osc_lock_enqueue+0x778/0x890 [osc]
07:13:31: [<ffffffffa0d19327>] ? cl_lock_state_signal+0x87/0x160 [obdclass]
07:13:31: [<ffffffffa0d1ccec>] cl_enqueue_try+0xfc/0x300 [obdclass]
07:13:31: [<ffffffffa11803fa>] lov_lock_enqueue+0x22a/0x850 [lov]
07:13:31: [<ffffffffa0d1ccec>] cl_enqueue_try+0xfc/0x300 [obdclass]
07:13:31: [<ffffffffa0d1df3f>] cl_enqueue_locked+0x6f/0x1f0 [obdclass]
07:13:31: [<ffffffffa0d1eb8e>] cl_lock_request+0x7e/0x270 [obdclass]
07:13:31: [<ffffffffa0d23b1c>] cl_io_lock+0x3cc/0x560 [obdclass]
07:13:31: [<ffffffffa0d23d52>] cl_io_loop+0xa2/0x1b0 [obdclass]
07:13:31: [<ffffffffa1220456>] ll_file_io_generic+0x2b6/0x710 [lustre]
07:13:31: [<ffffffffa0d13b39>] ? cl_env_get+0x29/0x350 [obdclass]
07:13:31: [<ffffffffa1221122>] ll_file_aio_write+0x142/0x2c0 [lustre]
07:13:31: [<ffffffffa122140c>] ll_file_write+0x16c/0x2a0 [lustre]
07:13:31: [<ffffffff81181398>] vfs_write+0xb8/0x1a0
07:13:31: [<ffffffff81181c91>] sys_write+0x51/0x90
07:13:31: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
07:13:31: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
07:13:31:
07:13:31:Kernel panic - not syncing: LBUG
07:13:31:Pid: 24289, comm: write_disjoint Not tainted 2.6.32-358.23.2.el6.x86_64 #1
07:13:31:Call Trace:
07:13:31: [<ffffffff8150daac>] ? panic+0xa7/0x16f
07:13:31: [<ffffffffa0bb4eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
07:13:31: [<ffffffffa1112310>] ? osc_lock_enqueue+0x0/0x890 [osc]
07:13:31: [<ffffffffa1112a88>] ? osc_lock_enqueue+0x778/0x890 [osc]
07:13:31: [<ffffffffa0d19327>] ? cl_lock_state_signal+0x87/0x160 [obdclass]
07:13:31: [<ffffffffa0d1ccec>] ? cl_enqueue_try+0xfc/0x300 [obdclass]
07:13:31: [<ffffffffa11803fa>] ? lov_lock_enqueue+0x22a/0x850 [lov]
07:13:31: [<ffffffffa0d1ccec>] ? cl_enqueue_try+0xfc/0x300 [obdclass]
07:13:31: [<ffffffffa0d1df3f>] ? cl_enqueue_locked+0x6f/0x1f0 [obdclass]
07:13:31: [<ffffffffa0d1eb8e>] ? cl_lock_request+0x7e/0x270 [obdclass]
07:13:31: [<ffffffffa0d23b1c>] ? cl_io_lock+0x3cc/0x560 [obdclass]
07:13:31: [<ffffffffa0d23d52>] ? cl_io_loop+0xa2/0x1b0 [obdclass]
07:13:31: [<ffffffffa1220456>] ? ll_file_io_generic+0x2b6/0x710 [lustre]
07:13:31: [<ffffffffa0d13b39>] ? cl_env_get+0x29/0x350 [obdclass]
07:13:31: [<ffffffffa1221122>] ? ll_file_aio_write+0x142/0x2c0 [lustre]
07:13:31: [<ffffffffa122140c>] ? ll_file_write+0x16c/0x2a0 [lustre]
07:13:31: [<ffffffff81181398>] ? vfs_write+0xb8/0x1a0
07:13:31: [<ffffffff81181c91>] ? sys_write+0x51/0x90
07:13:32: [<ffffffff810dc685>] ? __audit_syscall_exit+0x265/0x290
07:13:32: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
07:13:32:Initializing cgroup subsys cpuset


Comments
Comment by Peter Jones [ 09/Jan/14 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Zhenyu Xu [ 10/Jan/14 ]

Jinshan,

I think this issue relates to LU-3889, which avoids unusing a lock in the ENQUEUED state while it is being canceled, so the osc lock state is still OLS_CANCELLED when the lock is picked up to be re-enqueued. What do you think?
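
For reference, the "Impossible state: 6" in the console log can be lined up with the OLS_CANCELLED state mentioned above. The following is a minimal standalone sketch, not Lustre source: the enum ordering is an assumption recalled from the osc headers of that era and should be checked against the tree of the build under test, and ols_state_name() and ols_can_enqueue() are hypothetical helpers introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * ASSUMPTION: this ordering mirrors enum osc_lock_state in the osc
 * headers of the build under test. Verify against the actual tree
 * before relying on the numeric values.
 */
enum osc_lock_state {
	OLS_NEW,             /* 0: the only state the assertion accepts */
	OLS_ENQUEUED,        /* 1 */
	OLS_UPCALL_RECEIVED, /* 2 */
	OLS_GRANTED,         /* 3 */
	OLS_RELEASED,        /* 4 */
	OLS_BLOCKED,         /* 5 */
	OLS_CANCELLED        /* 6 under the assumed ordering */
};

/* Hypothetical helper: map a numeric state from a log line to its name. */
static const char *ols_state_name(int state)
{
	static const char *const names[] = {
		"OLS_NEW", "OLS_ENQUEUED", "OLS_UPCALL_RECEIVED",
		"OLS_GRANTED", "OLS_RELEASED", "OLS_BLOCKED", "OLS_CANCELLED",
	};
	const size_t count = sizeof(names) / sizeof(names[0]);

	if (state < 0 || (size_t)state >= count)
		return "UNKNOWN";
	return names[state];
}

/*
 * Hypothetical helper: the precondition expressed by
 * ASSERTION( ols->ols_state == OLS_NEW ) in osc_lock_enqueue().
 */
static bool ols_can_enqueue(int state)
{
	return state == OLS_NEW;
}
```

Under that assumed ordering, ols_state_name(6) gives "OLS_CANCELLED" and ols_can_enqueue(6) is false, which would be consistent with a cancelled lock being handed back to osc_lock_enqueue().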

Comment by Zhenyu Xu [ 10/Jan/14 ]

Hmm, this test build is based on ef1d121c2e68e368dae72a5994d3d1fd3b35c2b3, which does not yet contain the LU-3889 patch, so this issue is not related to LU-3889.
