[LU-6983] LBUG on osc_extent_find() ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511] Created: 11/Aug/15 Updated: 08/Feb/18 Resolved: 08/Feb/18 |
|
| Status: | Closed |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Antoine Percher | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL7 lustre client with 2.5.3 lustre server |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
LBUG on osc_extent_find() ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]
As the
[794894.288763] Lustre: store0-OST0045-osc-ffff88201fcb5800: Connection restored to store0-OST0045 (at QQ.P.BBO.FB@o2ib2)
[794896.511870] Lustre: store0-OST01f3-osc-ffff88201fcb5800: Connection restored to store0-OST01f3 (at QQ.P.BBO.II@o2ib2)
...
[794898.170269] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]
[794898.170280] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) LBUG
[794898.170287] Pid: 40201, comm: testsApiC++-gcc
[794898.170287]
and the stack of the Lbug thread was
crash> bt
PID: 40201 TASK: ffff880e6f474440 CPU: 6 COMMAND: "testsApiC++-gcc"
#0 [ffff880eeff93638] machine_kexec at ffffffff8104c4cb
#1 [ffff880eeff93698] crash_kexec at ffffffff810e1fe2
#2 [ffff880eeff93768] panic at ffffffff815fd7e1
#3 [ffff880eeff937e8] lbug_with_loc at ffffffffa0473e5b [libcfs]
#4 [ffff880eeff93808] osc_extent_find at ffffffffa0becdf2 [osc]
#5 [ffff880eeff93990] osc_queue_async_io at ffffffffa0be4bf0 [osc]
#6 [ffff880eeff93ad8] osc_page_cache_add at ffffffffa0bd2463 [osc]
#7 [ffff880eeff93b00] osc_io_commit_async at ffffffffa0bd9162 [osc]
#8 [ffff880eeff93b60] cl_io_commit_async at ffffffffa06f4007 [obdclass]
#9 [ffff880eeff93ba8] lov_io_commit_async at ffffffffa09ecbea [lov]
#10 [ffff880eeff93c08] cl_io_commit_async at ffffffffa06f4007 [obdclass]
#11 [ffff880eeff93c50] vvp_io_write_commit at ffffffffa0b0007a [lustre]
#12 [ffff880eeff93cb0] vvp_io_write_start at ffffffffa0b00aa6 [lustre]
#13 [ffff880eeff93d00] cl_io_start at ffffffffa06f3875 [obdclass]
#14 [ffff880eeff93d28] cl_io_loop at ffffffffa06f6c95 [obdclass]
#15 [ffff880eeff93d58] ll_file_io_generic at ffffffffa0a9f85c [lustre]
#16 [ffff880eeff93e60] ll_file_aio_write at ffffffffa0aa00ce [lustre]
#17 [ffff880eeff93ea8] ll_file_write at ffffffffa0aa02b2 [lustre]
#18 [ffff880eeff93ef8] vfs_write at ffffffff811c65dd
#19 [ffff880eeff93f38] sys_write at ffffffff811c7028
#20 [ffff880eeff93f80] system_call_fastpath at ffffffff81613da9
RIP: 00007f8d6bbc39fd RSP: 00007fff791cd238 RFLAGS: 00010216
RAX: 0000000000000001 RBX: ffffffff81613da9 RCX: 000000000000003f
RDX: 0000000005c00000 RSI: 00007f8bce395038 RDI: 0000000000000020
RBP: 00007f8bce395038 R8: 00000000003ffffe R9: 00000000003ffff4
R10: 00000000003ffff5 R11: 0000000000000293 R12: 0000000005c00000
R13: 0000000005c00000 R14: 0000000006f656c0 R15: 0000000005c00000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
In this case, many application threads hit the same LBUG at the same time. Question: is the |
| Comments |
| Comment by Joseph Gmitter (Inactive) [ 11/Aug/15 ] |
|
Jinshan, |
| Comment by Jinshan Xiong (Inactive) [ 11/Aug/15 ] |
|
If you still have the vmcore available, please dump the contents of the client_obd in question. |
| Comment by Antoine Percher [ 12/Aug/15 ] |
|
Hi Jinshan, |
| Comment by Antoine Percher [ 21/Sep/15 ] |
|
Hi Jinshan, crash> p ((struct osc_lock *)0xffff880036883648).ols_cl.cls_lock.cll_descr
$11 = {
cld_obj = 0xffff880ec1f66798,
cld_start = 0x0,
cld_end = 0x1ff,
cld_gid = 0x0,
cld_mode = CLM_WRITE,
cld_enq_flags = 0x0
}
This data does not match the I/Os in progress, and it explains the 511 (0x1ff) in the LBUG message: |
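The link between the lock descriptor and the assertion message can be sketched in user-space C. This is a simplified illustration, not the code from osc_cache.c: it assumes the three numbers in `[35840 -> 511/511]` are the extent start, end, and maximum end, that the maximum end is clamped to the lock's `cld_end`, and that page indices are unsigned. The names `extent_sketch`, `extent_bounds`, and `assertion_holds` are hypothetical, and the values of 256 pages per RPC and 256 pages per chunk are assumptions that do not come from the ticket.

```c
/* Hypothetical stand-in for the three values printed in the LBUG message. */
struct extent_sketch {
	unsigned long start;   /* first value printed:  35840 */
	unsigned long end;     /* second value printed: 511   */
	unsigned long max_end; /* third value printed:  511   */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Assumed bound computation: align the extent to the RPC and chunk
 * size around the page index, then clamp it to the lock range
 * [lock_start, lock_end]. chunk_pages must be a power of two.
 */
static struct extent_sketch extent_bounds(unsigned long index,
					  unsigned long max_pages,
					  unsigned long chunk_pages,
					  unsigned long lock_start,
					  unsigned long lock_end)
{
	struct extent_sketch e;
	unsigned long chunk_mask = ~(chunk_pages - 1);

	e.max_end = index - (index % max_pages) + max_pages - 1;
	e.max_end = min_ul(e.max_end, lock_end);  /* clamp to cld_end */

	e.start = index & chunk_mask;
	e.end = ((index + chunk_pages) & chunk_mask) - 1;
	if (e.start < lock_start)
		e.start = lock_start;
	if (e.end > e.max_end)
		e.end = e.max_end;
	return e;
}

/* Same comparison as the failed assertion, on unsigned values. */
static int assertion_holds(const struct extent_sketch *e,
			   unsigned long max_pages)
{
	/* If start > max_end, the subtraction wraps to a huge value. */
	return (e->max_end - e->start) < max_pages;
}
```

Under these assumptions, calling `extent_bounds(35840, 256, 256, 0, 0x1ff)` yields start 35840, end 511, and max_end 511, which is the triple in the LBUG message, and `assertion_holds()` returns 0 because `511 - 35840` wraps around as an unsigned value. A page index inside the lock range, such as 300, gives start 256 and max_end 511, and the check passes. This is consistent with a write at page 35840 being matched against a lock that covers only pages 0 to 511.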
| Comment by Antoine Percher [ 21/Sep/15 ] |
|
Added attachment file. |