Lustre / LU-6983

LBUG on osc_extent_find() ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Minor
    • None
    • Lustre 2.7.0
    • None
    • RHEL7 lustre client with 2.5.3 lustre server
    • 3

    Description

      LBUG on osc_extent_find() ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]

      As in LU-6271, after some OST evictions and reconnections under a heavy I/O load,
      the client hits an LBUG like this:

      [794894.288763] Lustre: store0-OST0045-osc-ffff88201fcb5800: Connection restored to store0-OST0045 (at QQ.P.BBO.FB@o2ib2)
      [794896.511870] Lustre: store0-OST01f3-osc-ffff88201fcb5800: Connection restored to store0-OST01f3 (at QQ.P.BBO.II@o2ib2)
      ...
      [794898.170269] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: [35840 -> 511/511]
      [794898.170280] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) LBUG
      [794898.170287] Pid: 40201, comm: testsApiC++-gcc
      [794898.170287]
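
      A note on reading the failed assertion, as a minimal sketch rather than a statement about the actual osc code: assuming the bracketed triple is printed as [start -> end/max_end] in pages, and assuming the fields are 64-bit unsigned values, max_end (511) is smaller than the start (35840), so the subtraction in the assertion would wrap around to a huge value and could never be below max_pages. The helper names below are hypothetical and only illustrate the arithmetic.

```python
import re

# Assumption: the triple in the log is [start -> end/max_end], in pages.
EXTENT_RE = re.compile(r"\[(\d+) -> (\d+)/(\d+)\]")

# Assumption: the fields are unsigned long, 64-bit on this x86_64 client.
ULONG_BITS = 64


def parse_extent(line):
    """Return (start, end, max_end) parsed from an assertion log line."""
    match = EXTENT_RE.search(line)
    if match is None:
        raise ValueError("no extent triple found in line")
    return tuple(int(group) for group in match.groups())


def unsigned_diff(a, b, bits=ULONG_BITS):
    """Emulate C unsigned subtraction a - b with wraparound."""
    return (a - b) % (1 << bits)


line = (
    "[794898.170269] LustreError: 40201:0:(osc_cache.c:662:osc_extent_find()) "
    "ASSERTION( (max_end - cur->oe_start) < max_pages ) failed: "
    "[35840 -> 511/511]"
)

start, end, max_end = parse_extent(line)
span = unsigned_diff(max_end, start)

print("start=%d end=%d max_end=%d" % (start, end, max_end))
print("max_end < start:", max_end < start)
print("(max_end - start) as unsigned 64-bit:", span)
```

      If that reading is right, the extent being created starts far beyond the maximum end it was given, which would point at the extent bounds being inconsistent rather than at an extent that is merely slightly too large.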
      

      The stack of the thread that hit the LBUG was:

      crash>  bt
      PID: 40201  TASK: ffff880e6f474440  CPU: 6   COMMAND: "testsApiC++-gcc"
       #0 [ffff880eeff93638] machine_kexec at ffffffff8104c4cb
       #1 [ffff880eeff93698] crash_kexec at ffffffff810e1fe2
       #2 [ffff880eeff93768] panic at ffffffff815fd7e1
       #3 [ffff880eeff937e8] lbug_with_loc at ffffffffa0473e5b [libcfs]
       #4 [ffff880eeff93808] osc_extent_find at ffffffffa0becdf2 [osc]
       #5 [ffff880eeff93990] osc_queue_async_io at ffffffffa0be4bf0 [osc]
       #6 [ffff880eeff93ad8] osc_page_cache_add at ffffffffa0bd2463 [osc]
       #7 [ffff880eeff93b00] osc_io_commit_async at ffffffffa0bd9162 [osc]
       #8 [ffff880eeff93b60] cl_io_commit_async at ffffffffa06f4007 [obdclass]
       #9 [ffff880eeff93ba8] lov_io_commit_async at ffffffffa09ecbea [lov]
      #10 [ffff880eeff93c08] cl_io_commit_async at ffffffffa06f4007 [obdclass]
      #11 [ffff880eeff93c50] vvp_io_write_commit at ffffffffa0b0007a [lustre]
      #12 [ffff880eeff93cb0] vvp_io_write_start at ffffffffa0b00aa6 [lustre]
      #13 [ffff880eeff93d00] cl_io_start at ffffffffa06f3875 [obdclass]
      #14 [ffff880eeff93d28] cl_io_loop at ffffffffa06f6c95 [obdclass]
      #15 [ffff880eeff93d58] ll_file_io_generic at ffffffffa0a9f85c [lustre]
      #16 [ffff880eeff93e60] ll_file_aio_write at ffffffffa0aa00ce [lustre]
      #17 [ffff880eeff93ea8] ll_file_write at ffffffffa0aa02b2 [lustre]
      #18 [ffff880eeff93ef8] vfs_write at ffffffff811c65dd
      #19 [ffff880eeff93f38] sys_write at ffffffff811c7028
      #20 [ffff880eeff93f80] system_call_fastpath at ffffffff81613da9
          RIP: 00007f8d6bbc39fd  RSP: 00007fff791cd238  RFLAGS: 00010216
          RAX: 0000000000000001  RBX: ffffffff81613da9  RCX: 000000000000003f
          RDX: 0000000005c00000  RSI: 00007f8bce395038  RDI: 0000000000000020
          RBP: 00007f8bce395038   R8: 00000000003ffffe   R9: 00000000003ffff4
          R10: 00000000003ffff5  R11: 0000000000000293  R12: 0000000005c00000
          R13: 0000000005c00000  R14: 0000000006f656c0  R15: 0000000005c00000
          ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      

      In this case, many user application threads hit the same LBUG at the same time.

      Question: could the LU-6271 fix (http://review.whamcloud.com/#/c/14915/) help with this issue?

          People

            jay Jinshan Xiong (Inactive)
            apercher Antoine Percher