Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.6.0
-
SLES11SP3 clients, master (v2_5_58_0-105-gb7f1952), Cray stress testing.
-
3
-
13981
Description
Cray has recently begun stress testing master clients in preparation for 2.6.0.
We hit this bug on several nodes during a stress run last night:
2014-05-13T23:00:06.485598-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) page@ffff880221781c00[4 ffff880222618600 3 0 1 (null) ffff88022328d740 0x0] 2014-05-13T23:00:06.485630-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) vvp-page@ffff880221781ca0(0:0:0) vm@ffffea000e1502a0 200000000000801 4:0 ffff880221781c00 0 lru 2014-05-13T23:00:06.485637-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) lov-page@ffff880221781cf8, raid0 2014-05-13T23:00:06.485672-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) osc-page@ffff880221781d60 0: 1< 0x845fed 1 0 + + > 2< 0 0 4096 0x7 0x8 | (null) ffff88044ad844f8 ffff88044d7a59e8 > 3< + ffff88044e055080 1 16 0 > 4< 0 0 8 35586048 - | - - - + > 5< - - - + | 0 - | 0 - -> 2014-05-13T23:00:06.485681-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) end page@ffff880221781c00 2014-05-13T23:00:06.485689-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:824:osc_req_attr_set()) dump uncover page! 2014-05-13T23:00:06.485700-05:00 c1-0c0s2n1 Pid: 14298, comm: nsystst 2014-05-13T23:00:06.485710-05:00 c1-0c0s2n1 Call Trace: 2014-05-13T23:00:06.485724-05:00 c1-0c0s2n1 [<ffffffff81005eb9>] try_stack_unwind+0x169/0x1b0 2014-05-13T23:00:06.485746-05:00 c1-0c0s2n1 [<ffffffff81004919>] dump_trace+0x89/0x450 2014-05-13T23:00:06.485757-05:00 c1-0c0s2n1 [<ffffffffa023b8d7>] libcfs_debug_dumpstack+0x57/0x80 [libcfs] 2014-05-13T23:00:06.485784-05:00 c1-0c0s2n1 [<ffffffffa092f348>] osc_req_attr_set+0x6b8/0x740 [osc] 2014-05-13T23:00:06.485796-05:00 c1-0c0s2n1 [<ffffffffa038dbf1>] cl_req_attr_set+0xd1/0x220 [obdclass] 2014-05-13T23:00:06.485807-05:00 c1-0c0s2n1 [<ffffffffa091a6da>] osc_build_rpc+0x4fa/0x15d0 [osc] 2014-05-13T23:00:06.485817-05:00 c1-0c0s2n1 [<ffffffffa093c199>] osc_io_unplug0+0x1419/0x2020 [osc] 2014-05-13T23:00:06.485837-05:00 c1-0c0s2n1 [<ffffffffa093ddb1>] osc_io_unplug+0x11/0x20 [osc] 2014-05-13T23:00:06.485844-05:00 c1-0c0s2n1 [<ffffffffa093dfa1>] osc_queue_sync_pages+0x1e1/0x380 [osc] 2014-05-13T23:00:06.485852-05:00 c1-0c0s2n1 [<ffffffffa092f90d>] osc_io_submit+0x2dd/0x4e0 [osc] 2014-05-13T23:00:06.485871-05:00 c1-0c0s2n1 [<ffffffffa038d40e>] cl_io_submit_rw+0x6e/0x170 [obdclass] 2014-05-13T23:00:06.485881-05:00 c1-0c0s2n1 [<ffffffffa076123e>] lov_io_submit+0x2ee/0x4d0 [lov] 2014-05-13T23:00:06.485894-05:00 c1-0c0s2n1 [<ffffffffa038d40e>] cl_io_submit_rw+0x6e/0x170 [obdclass] 2014-05-13T23:00:06.485900-05:00 c1-0c0s2n1 [<ffffffffa038f3fa>] cl_io_read_page+0x19a/0x1b0 [obdclass] 2014-05-13T23:00:06.485907-05:00 c1-0c0s2n1 [<ffffffffa081bf7c>] ll_readpage+0x16c/0x1b0 [lustre] 2014-05-13T23:00:06.485920-05:00 c1-0c0s2n1 [<ffffffff810fdc68>] generic_file_aio_read+0x268/0x740 2014-05-13T23:00:06.485932-05:00 c1-0c0s2n1 [<ffffffffa084e0b0>] vvp_io_read_start+0x2a0/0x430 [lustre] 2014-05-13T23:00:06.485938-05:00 c1-0c0s2n1 [<ffffffffa038d6b2>] cl_io_start+0x72/0x140 [obdclass] 2014-05-13T23:00:06.485947-05:00 c1-0c0s2n1 [<ffffffffa0391234>] cl_io_loop+0xb4/0x1b0 [obdclass] 2014-05-13T23:00:06.485959-05:00 c1-0c0s2n1 [<ffffffffa07e9ccc>] ll_file_io_generic+0x58c/0x8a0 [lustre] 2014-05-13T23:00:06.485967-05:00 c1-0c0s2n1 [<ffffffffa07ea49c>] ll_file_aio_read+0x22c/0x290 [lustre] 2014-05-13T23:00:06.485994-05:00 c1-0c0s2n1 [<ffffffff8115517b>] do_sync_readv_writev+0xdb/0x120 2014-05-13T23:00:06.486000-05:00 c1-0c0s2n1 [<ffffffff81156084>] do_readv_writev+0xd4/0x1e0 2014-05-13T23:00:06.486010-05:00 c1-0c0s2n1 [<ffffffff811563c5>] vfs_readv+0x45/0x60 2014-05-13T23:00:06.486016-05:00 c1-0c0s2n1 [<ffffffff81156505>] sys_readv+0x55/0xc0 2014-05-13T23:00:06.486023-05:00 c1-0c0s2n1 [<ffffffff8142286b>] system_call_fastpath+0x16/0x1b 2014-05-13T23:00:06.486031-05:00 c1-0c0s2n1 [<000000002006d41b>] 0x2006d41b 2014-05-13T23:00:06.510785-05:00 c1-0c0s2n1 LustreError: 14298:0:(osc_io.c:826:osc_req_attr_set()) LBUG
I'll make the dump+logs available shortly.
I note this bug was seen by Oleg during patch review for the readahead changes in LU-3321, see his comments here:
http://review.whamcloud.com/#/c/8523/
I also note a previous instance of this assertion was fixed in LU-1650:
https://jira.hpdd.intel.com/browse/LU-1650
Attachments
Issue Links
- is related to
-
LU-5685 (cl_lock.c:1128:cl_use_try()) ASSERTION( result != -38 )
- Closed