Details
- Bug
- Resolution: Fixed
- Major
- None
- 3
- 7788
Description
This happens on Lustre 2.3 running with SP1, but we have also seen it with SP2 and on 2.2 (or even on 2.1 with a different stack trace, see LELUS-41):
LustreError: 3821:0:(osc_request.c:819:osc_announce_cached()) dirty 361 - 362 > system dirty_max 2113536
LustreError: 3818:0:(osc_request.c:1308:osc_brw_prep_request()) ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 1 p_c 151 pg ffffea0004129ce8 [pri 18446612138517887360 ind 30208] off 123731968 prev_pg ffffea000dbc0e18 [pri 0 ind 263316] off 123731968
LustreError: 3818:0:(osc_request.c:1308:osc_brw_prep_request()) LBUG
Pid: 3818, comm: doio_mpi
Call Trace:
[<ffffffff81007e59>] try_stack_unwind+0x1a9/0x200
[<ffffffff81006625>] dump_trace+0x95/0x300
[<ffffffffa016b8d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs]
[<ffffffffa016be27>] lbug_with_loc+0x47/0xb0 [libcfs]
[<ffffffffa06969c9>] osc_brw_prep_request+0x959/0xe60 [osc]
[<ffffffffa06983bc>] osc_build_rpc+0x90c/0x1180 [osc]
[<ffffffffa06ad617>] osc_send_oap_rpc+0x3c7/0xc20 [osc]
[<ffffffffa06ae24f>] osc_io_unplug+0x3df/0x730 [osc]
[<ffffffffa06a834a>] osc_io_submit+0x1da/0x520 [osc]
[<ffffffffa02e13e8>] cl_io_submit_rw+0x78/0x190 [obdclass]
[<ffffffffa07334cd>] lov_io_submit+0x27d/0xc00 [lov]
[<ffffffffa02e13e8>] cl_io_submit_rw+0x78/0x190 [obdclass]
[<ffffffffa02e35c1>] cl_io_read_page+0xd1/0x190 [obdclass]
[<ffffffffa07ee254>] ll_readpage+0x184/0x210 [lustre]
[<ffffffff810d3ff0>] generic_file_aio_read+0x230/0x640
[<ffffffffa0819846>] vvp_io_read_start+0x1f6/0x3d0 [lustre]
[<ffffffffa02e16ca>] cl_io_start+0x6a/0x130 [obdclass]
[<ffffffffa02e597c>] cl_io_loop+0xac/0x1a0 [obdclass]
[<ffffffffa07c5c23>] ll_file_io_generic+0x353/0x530 [lustre]
[<ffffffffa07c62c8>] ll_file_aio_read+0x238/0x290 [lustre]
[<ffffffffa07c69bf>] ll_file_read+0x20f/0x2b0 [lustre]
[<ffffffff81117fe8>] vfs_read+0xc8/0x1a0
[<ffffffff811181b5>] sys_read+0x55/0x90
[<ffffffff8100305b>] system_call_fastpath+0x16/0x1b
[<0000000020139f90>] 0x20139f90
Kernel panic - not syncing: LBUG
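The failed assertion text, `i == 0 || pg->off > pg_prev->off`, requires each page in the request to have a strictly greater offset than the one before it. A minimal Python sketch of that condition follows; it is an illustration, not the Lustre code, and `first_order_violation` is a hypothetical helper name. The two offsets are the `pg` and `prev_pg` values from the message above (`i 1`, both `off 123731968`).

```python
def first_order_violation(offsets):
    """Return the first index i where offsets[i] <= offsets[i - 1], else None.

    Mirrors the condition in the assertion text:
    i == 0 || pg->off > pg_prev->off
    """
    for i in range(1, len(offsets)):
        if not offsets[i] > offsets[i - 1]:
            return i
    return None


# Offsets of prev_pg (index 0) and pg (index 1) from the failing request.
failing = [123731968, 123731968]
print(first_order_violation(failing))  # 1

# Strictly increasing offsets satisfy the condition.
print(first_order_violation([0, 4096, 8192]))  # None
```

An equal offset is enough to trip the check, which matches the log: the two pages are distinct (`ffffea0004129ce8` and `ffffea000dbc0e18`) but carry the same `off`.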
Here are some other instances, all of which seem to hit the same page for the RPC:
Lustre 2.3.0-trunk-1.0000.40706.58.4-abuild-trunk 27
First hit Cname # hits Apid/Roles
LUS: LBUG-ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 1 p_c 256 pg ffffea000c65a268 [pri 18446612137055183040 ind 26880] off 110100480 prev_pg ffffea000e0e73f0 [pri 0 ind 263338] off 110100480
13/02/06 01:43:15 c0-0c0s1n2 1 854758
854758 doio_mpi
LUS: LBUG-ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 1 p_c 256 pg ffffea00179fdfa0 [pri 18446612160482967680 ind 46909] off 192139264 prev_pg ffffea000bd34598 [pri 0 ind 263425] off 192139264
13/02/06 02:22:40 c0-0c0s4n3 1 855115
855115 doio_mpi
LUS: LBUG-ASSERTION( i == 0 || pg->off > pg_prev->off ) failed: i 32 p_c 155 pg ffffea001bcb95e0 [pri 0 ind 264145] off 14807040 prev_pg ffffea001bff2bd0 [pri 18446612166806628160 ind 3615] off 14807040
13/02/06 00:09:44 c0-0c0s7n2 1 854062
854062 doio_mpi
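In all four assertion messages in this report, `pg` and `prev_pg` carry the same `off`. As a quick arithmetic check, assuming 4096-byte pages (the report does not state the page size), the `ind` of the page with a non-zero `pri` equals `off / 4096`, while the `ind` of the page with `pri 0` does not. The sketch below only reproduces that arithmetic from values copied out of the messages; the tuple layout and variable names are illustrative.

```python
PAGE_SIZE = 4096  # assumed; not stated in the report

# (off, ind of the page with non-zero pri, ind of the page with pri 0),
# copied from the four assertion messages in this report.
instances = [
    (123731968, 30208, 263316),
    (110100480, 26880, 263338),
    (192139264, 46909, 263425),
    (14807040, 3615, 264145),
]

for off, ind_pri, ind_nopri in instances:
    expected = off // PAGE_SIZE
    # Print whether each page's ind matches the index implied by off.
    print(off, expected, ind_pri == expected, ind_nopri == expected)
```

This says nothing about the cause by itself; it only narrows down which of the two pages in each pair has an index inconsistent with its offset.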
Issue Links
- is related to LU-247 Lustre client slow performance on BG/P IONs: unaligned DIRECT_IO (Resolved)