Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Minor
Affects Version/s: Lustre 2.7.0
Environment: RHEL7 lustre client with 2.5.3 lustre server
Severity: 3
Description
LBUG on osc_extent_find() ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]
As in LU-6271, after some OST evictions and reconnections under heavy I/O load, the client hits an LBUG like this:
[794894.288763] Lustre: store0-OST0045-osc-ffff88201fcb5800: Connection restored to store0-OST0045 (at QQ.P.BBO.FB@o2ib2)
[794896.511870] Lustre: store0-OST01f3-osc-ffff88201fcb5800: Connection restored to store0-OST01f3 (at QQ.P.BBO.II@o2ib2)
...
[794898.170269] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]
[794898.170280] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) LBUG
[794898.170287] Pid: 40201, comm: testsApiC++-gcc
[794898.170287]
The stack of the LBUG thread was:
crash> bt
PID: 40201 TASK: ffff880e6f474440 CPU: 6 COMMAND: "testsApiC++-gcc"
#0 [ffff880eeff93638] machine_kexec at ffffffff8104c4cb
#1 [ffff880eeff93698] crash_kexec at ffffffff810e1fe2
#2 [ffff880eeff93768] panic at ffffffff815fd7e1
#3 [ffff880eeff937e8] lbug_with_loc at ffffffffa0473e5b [libcfs]
#4 [ffff880eeff93808] osc_extent_find at ffffffffa0becdf2 [osc]
#5 [ffff880eeff93990] osc_queue_async_io at ffffffffa0be4bf0 [osc]
#6 [ffff880eeff93ad8] osc_page_cache_add at ffffffffa0bd2463 [osc]
#7 [ffff880eeff93b00] osc_io_commit_async at ffffffffa0bd9162 [osc]
#8 [ffff880eeff93b60] cl_io_commit_async at ffffffffa06f4007 [obdclass]
#9 [ffff880eeff93ba8] lov_io_commit_async at ffffffffa09ecbea [lov]
#10 [ffff880eeff93c08] cl_io_commit_async at ffffffffa06f4007 [obdclass]
#11 [ffff880eeff93c50] vvp_io_write_commit at ffffffffa0b0007a [lustre]
#12 [ffff880eeff93cb0] vvp_io_write_start at ffffffffa0b00aa6 [lustre]
#13 [ffff880eeff93d00] cl_io_start at ffffffffa06f3875 [obdclass]
#14 [ffff880eeff93d28] cl_io_loop at ffffffffa06f6c95 [obdclass]
#15 [ffff880eeff93d58] ll_file_io_generic at ffffffffa0a9f85c [lustre]
#16 [ffff880eeff93e60] ll_file_aio_write at ffffffffa0aa00ce [lustre]
#17 [ffff880eeff93ea8] ll_file_write at ffffffffa0aa02b2 [lustre]
#18 [ffff880eeff93ef8] vfs_write at ffffffff811c65dd
#19 [ffff880eeff93f38] sys_write at ffffffff811c7028
#20 [ffff880eeff93f80] system_call_fastpath at ffffffff81613da9
RIP: 00007f8d6bbc39fd RSP: 00007fff791cd238 RFLAGS: 00010216
RAX: 0000000000000001 RBX: ffffffff81613da9 RCX: 000000000000003f
RDX: 0000000005c00000 RSI: 00007f8bce395038 RDI: 0000000000000020
RBP: 00007f8bce395038 R8: 00000000003ffffe R9: 00000000003ffff4
R10: 00000000003ffff5 R11: 0000000000000293 R12: 0000000005c00000
R13: 0000000005c00000 R14: 0000000006f656c0 R15: 0000000005c00000
ORIG_RAX: 0000000000000001 CS: 0033 SS: 002b
In this case, many threads of the user application hit the same LBUG at the same time.
Question: could the LU-6271 fix (http://review.whamcloud.com/#/c/14915/) help with this issue?