Details
Bug
Resolution: Fixed
Critical
Lustre 2.7.0
CentOS 6.5 servers & clients, current master (tag 2.6.94).
3
17428
Description
Spawning multiple copies of diotest1 from LTP in the same directory causes the assertion from LU-3192 to reappear.
I was able to reproduce this with full debug enabled and will make the dump and KO files available shortly. I'll also include the diotest1 binary, though it is unchanged from LTP.
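A minimal sketch of the reproduction step, assuming diotest1 operates on files relative to its working directory, so that copies started in the same directory contend for the same files. The helper name run_copies, the mount path, and the copy count in the usage comment are hypothetical and not taken from the original report.

```python
import subprocess
import sys


def run_copies(cmd, copies, workdir):
    """Start `copies` instances of `cmd` concurrently in `workdir`.

    Returns the exit codes in launch order. Every process is started
    before any of them is waited on, so the copies overlap in time.
    """
    procs = [subprocess.Popen(cmd, cwd=workdir) for _ in range(copies)]
    return [p.wait() for p in procs]


# Hypothetical usage (paths and count are assumptions, not from the report):
#   codes = run_copies(["/opt/ltp/testcases/bin/diotest1"], 8, "/mnt/lustre/dir")
#   print("exit codes:", codes)
```

Passing an absolute path for the binary avoids any ambiguity about how a relative path is resolved when a working directory is also given.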
Stack trace:
<0>LustreError: 7700:0:(osc_request.c:1219:osc_brw_prep_request()) ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 3 p_c 10 pg ffffea00017a5208 [pri 0 ind 2771] off 16384 prev_pg ffffea00017a51d0 [pri 0 ind 2256] off 16384
<0>LustreError: 7700:0:(osc_request.c:1219:osc_brw_prep_request()) LBUG
<4>Pid: 7700, comm: diotest1
<4>
<4>Call Trace:
<4> [<ffffffffa0302895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0302e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa0b14dd1>] osc_brw_prep_request+0xba1/0x10b0 [osc]
<4> [<ffffffffa0b15b40>] osc_build_rpc+0x860/0x15c0 [osc]
<4> [<ffffffffa0b30ab4>] osc_io_unplug0+0xe64/0x1b30 [osc]
<4> [<ffffffffa03131c1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
<4> [<ffffffffa0b33d21>] osc_io_unplug+0x11/0x20 [osc]
<4> [<ffffffffa0b38986>] osc_cache_writeback_range+0xda6/0x1280 [osc]
<4> [<ffffffffa0b25d30>] osc_io_fsync_start+0x90/0x360 [osc]
<4> [<ffffffffa047cb40>] ? cl_io_start+0x0/0x140 [obdclass]
<4> [<ffffffffa047cbaa>] cl_io_start+0x6a/0x140 [obdclass]
<4> [<ffffffffa047cb40>] ? cl_io_start+0x0/0x140 [obdclass]
<4> [<ffffffffa09790fe>] lov_io_call+0x8e/0x130 [lov]
<4> [<ffffffffa097cd8c>] lov_io_start+0xcc/0x180 [lov]
<4> [<ffffffffa047cbaa>] cl_io_start+0x6a/0x140 [obdclass]
<4> [<ffffffffa04808b4>] cl_io_loop+0xb4/0x1b0 [obdclass]
<4> [<ffffffffa09f283b>] cl_sync_file_range+0x31b/0x500 [lustre]
<4> [<ffffffffa0a1e9cc>] ll_writepages+0x9c/0x220 [lustre]
<4> [<ffffffff81134eb1>] do_writepages+0x21/0x40
<4> [<ffffffff8112031b>] __filemap_fdatawrite_range+0x5b/0x60
<4> [<ffffffff8112037a>] filemap_write_and_wait_range+0x5a/0x90
<4> [<ffffffff81121728>] generic_file_aio_read+0x418/0x700
<4> [<ffffffff81078fd7>] ? current_fs_time+0x27/0x30
<4> [<ffffffff811a5ef1>] ? touch_atime+0x71/0x1a0
<4> [<ffffffffa0a4f053>] vvp_io_read_start+0x233/0x460 [lustre]
<4> [<ffffffffa047cbaa>] cl_io_start+0x6a/0x140 [obdclass]
<4> [<ffffffffa04808b4>] cl_io_loop+0xb4/0x1b0 [obdclass]
<4> [<ffffffffa09efef1>] ll_file_io_generic+0x461/0xa40 [lustre]
<4> [<ffffffffa09f0600>] ll_file_aio_read+0x130/0x2b0 [lustre]
<4> [<ffffffffa09f0aa9>] ll_file_read+0x159/0x290 [lustre]
<4> [<ffffffff81189a75>] vfs_read+0xb5/0x1a0
<4> [<ffffffff81189bb1>] sys_read+0x51/0x90
<4> [<ffffffff810e202e>] ? __audit_syscall_exit+0x25e/0x290
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
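The assertion text in the trace, i == 0 || pg->off > pg_prev->off, reads as requiring each page in the request to have an offset greater than the one before it. The logged values show it failing at i 3 with both the current and the previous page at offset 16384. The sketch below illustrates that ordering check only; the function name and the first two offsets in the usage note are illustrative assumptions, and it is not the osc code.

```python
def first_unordered(offsets):
    """Return the index of the first offset that is not strictly greater
    than its predecessor, or None if the sequence is strictly increasing.

    Mirrors the shape of the check `i == 0 || off[i] > off[i - 1]`.
    """
    for i in range(1, len(offsets)):
        if offsets[i] <= offsets[i - 1]:
            return i
    return None
```

For example, first_unordered([0, 4096, 16384, 16384]) returns 3, because the fourth entry repeats the offset of the third, which matches the i 3 / off 16384 / prev off 16384 values in the log.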
Since this problem has recurred, it seems sensible to add a regression test for it to the test suite.