[LU-2099] osd_init_iobuf()) ASSERTION( iobuf->dr_elapsed_valid == 0 ) sanity test 63a Created: 07/Oct/12 Updated: 14/May/14 Resolved: 14/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.2 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
single node, 4 cpu cores. lustre files in /tmp that is tmpfs |
||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 4386 | ||||
| Description |
|
Just hit this when testing with USE_OFD=yes [ 929.726136] Lustre: DEBUG MARKER: == sanity test 63a: Verify oig_wait interruption does not crash ========= 03:34:59 (1349595299) [ 964.438007] LustreError: 28767:0:(osd_io.c:107:osd_init_iobuf()) ASSERTION( iobuf->dr_elapsed_valid == 0 ) failed: [ 964.439525] LustreError: 28767:0:(osd_io.c:107:osd_init_iobuf()) LBUG [ 964.441022] Pid: 28767, comm: ll_ost_io00_004 [ 964.442579] [ 964.442580] Call Trace: [ 964.445464] [<ffffffffa03f4915>] libcfs_debug_dumpstack+0x55/0x80 [libcfs] [ 964.446900] [<ffffffffa03f4f27>] lbug_with_loc+0x47/0xb0 [libcfs] [ 964.448333] [<ffffffffa0c41a82>] osd_init_iobuf+0xd2/0xe0 [osd_ldiskfs] [ 964.449770] [<ffffffffa0c44b0b>] osd_write_prep+0x17b/0x420 [osd_ldiskfs] [ 964.451208] [<ffffffffa0c43976>] ? osd_bufs_get+0x206/0x390 [osd_ldiskfs] [ 964.452627] [<ffffffffa0e26daa>] ofd_preprw+0x7da/0x11c0 [ofd] [ 964.454077] [<ffffffffa0781cf4>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc] [ 964.455474] [<ffffffffa0d557fc>] obd_preprw+0x12c/0x3d0 [ost] [ 964.456890] [<ffffffffa0d5c932>] ost_brw_write+0x882/0x15f0 [ost] [ 964.458284] [<ffffffff8127a646>] ? vsnprintf+0x2b6/0x5f0 [ 964.459696] [<ffffffffa0754cbc>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc] [ 964.461128] [<ffffffffa0754e18>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc] [ 964.462513] [<ffffffffa0d621f0>] ost_handle+0x3120/0x4550 [ost] [ 964.463899] [<ffffffffa0401464>] ? libcfs_id2str+0x74/0xb0 [libcfs] [ 964.465329] [<ffffffffa0762883>] ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc] [ 964.466745] [<ffffffffa03f566e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [ 964.468163] [<ffffffffa075b571>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [ 964.469568] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [ 964.470993] [<ffffffffa076541a>] ptlrpc_main+0xb9a/0x1960 [ptlrpc] [ 964.472399] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.473806] [<ffffffff8100c14a>] child_rip+0xa/0x20 [ 964.475201] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.476610] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.477937] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [ 964.479240] [ 964.481682] Kernel panic - not syncing: LBUG [ 964.482986] Pid: 28767, comm: ll_ost_io00_004 Not tainted 2.6.32-debug #6 [ 964.484296] Call Trace: [ 964.485627] [<ffffffff814f75e4>] ? panic+0xa0/0x168 [ 964.486964] [<ffffffffa03f4f7b>] ? lbug_with_loc+0x9b/0xb0 [libcfs] [ 964.488261] [<ffffffffa0c41a82>] ? osd_init_iobuf+0xd2/0xe0 [osd_ldiskfs] [ 964.489572] [<ffffffffa0c44b0b>] ? osd_write_prep+0x17b/0x420 [osd_ldiskfs] [ 964.490866] [<ffffffffa0c43976>] ? osd_bufs_get+0x206/0x390 [osd_ldiskfs] [ 964.492151] [<ffffffffa0e26daa>] ? ofd_preprw+0x7da/0x11c0 [ofd] [ 964.493512] [<ffffffffa0781cf4>] ? sptlrpc_svc_alloc_rs+0x74/0x2d0 [ptlrpc] [ 964.494817] [<ffffffffa0d557fc>] ? obd_preprw+0x12c/0x3d0 [ost] [ 964.496116] [<ffffffffa0d5c932>] ? ost_brw_write+0x882/0x15f0 [ost] [ 964.497432] [<ffffffff8127a646>] ? vsnprintf+0x2b6/0x5f0 [ 964.498774] [<ffffffffa0754cbc>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc] [ 964.500109] [<ffffffffa0754e18>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc] [ 964.501424] [<ffffffffa0d621f0>] ? ost_handle+0x3120/0x4550 [ost] [ 964.502716] [<ffffffffa0401464>] ? libcfs_id2str+0x74/0xb0 [libcfs] [ 964.504014] [<ffffffffa0762883>] ? ptlrpc_server_handle_request+0x463/0xe70 [ptlrpc] [ 964.505319] [<ffffffffa03f566e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [ 964.506593] [<ffffffffa075b571>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [ 964.507817] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [ 964.509104] [<ffffffffa076541a>] ? ptlrpc_main+0xb9a/0x1960 [ptlrpc] [ 964.510345] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.511510] [<ffffffff8100c14a>] ? child_rip+0xa/0x20 [ 964.512795] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.513993] [<ffffffffa0764880>] ? ptlrpc_main+0x0/0x1960 [ptlrpc] [ 964.515105] [<ffffffff8100c140>] ? child_rip+0x0/0x20 I have a crashdump for this instance too in case you need something from there. |
| Comments |
| Comment by Oleg Drokin [ 07/Oct/12 ] |
|
Seems to be mostly the sane as ORI-258 |
| Comment by Oleg Drokin [ 07/Oct/12 ] |
|
I just hit this one two more times on two different nodes, so this one is certainly a very real problem. |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
|
do you have crash dumps for the case? |
| Comment by Alex Zhuravlev [ 08/Oct/12 ] |
| Comment by Oleg Drokin [ 08/Oct/12 ] |
|
I've yet to hit it again with the patch applied, though there was one strange problem hit in the same path once. |
| Comment by Alex Zhuravlev [ 10/Oct/12 ] |
|
from what I heard last time, no more hits happened with the patch. |
| Comment by Jinshan Xiong (Inactive) [ 18/Apr/14 ] |
|
Seeing this issue again on latest master. loop: Write error at byte offset 243236864, length 4096. Call Trace: |
| Comment by Jinshan Xiong (Inactive) [ 07/May/14 ] |
|
I can't continue my testing since I'm keeping seeing this issue on 2.5 and 2.6 |
| Comment by Alex Zhuravlev [ 12/May/14 ] |
|
could you grab full debug log for the case please? with the following patch: — a/lustre/osd-ldiskfs/osd_io.c
if (unlikely(rc != 0)) { |
| Comment by Jinshan Xiong (Inactive) [ 12/May/14 ] |
|
Log and dmesg. I have a crash dump please let me know if you want it |
| Comment by Alex Zhuravlev [ 13/May/14 ] |
|
please, try with http://review.whamcloud.com/#/c/10305/ |
| Comment by Jinshan Xiong (Inactive) [ 13/May/14 ] |
|
it looks good in the initial testing |
| Comment by Peter Jones [ 14/May/14 ] |
|
Landed for 2.6 |