[LU-4581] ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:

Details

    • Bug
    • Resolution: Fixed
    • Major
    • None
    • Lustre 2.4.1
    • None
    • client: slessp2 2.4.1-3nas
    • 3
    • 12522

    Description

      [3891131.052019] LustreError: 27617:0:(cl_lock.c:1964:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:
      [3891131.067131] LustreError: 27617:0:(cl_lock.c:1964:discard_cb()) LBUG
      [3891131.073568] Pid: 27617, comm: geos.xco2.osse.
      
      
      PID: 27617  TASK: ffff8809be7a04c0  CPU: 21  COMMAND: "geos.xco2.osse."
       #0 [ffff88103321f620] panic at ffffffff81450800
       #1 [ffff88103321f6a0] lbug_with_loc at ffffffffa04c2dc3 [libcfs]
       #2 [ffff88103321f6c0] discard_cb at ffffffffa0633774 [obdclass]
       #3 [ffff88103321f6f0] cl_page_gang_lookup at ffffffffa06309d6 [obdclass]
       #4 [ffff88103321f780] cl_lock_discard_pages at ffffffffa063346a [obdclass]
       #5 [ffff88103321f7c0] osc_lock_flush at ffffffffa08fe272 [osc]
       #6 [ffff88103321f810] osc_lock_cancel at ffffffffa08fe4d9 [osc]
       #7 [ffff88103321f850] cl_lock_cancel0 at ffffffffa0631155 [obdclass]
       #8 [ffff88103321f870] cl_lock_cancel at ffffffffa0631eab [obdclass]
       #9 [ffff88103321f890] osc_lock_blocking at ffffffffa08ff01d [osc]
      #10 [ffff88103321f8c0] osc_dlm_blocking_ast0 at ffffffffa08ff8c9 [osc]
      #11 [ffff88103321f900] osc_ldlm_blocking_ast at ffffffffa08ffa2c [osc]
      #12 [ffff88103321f940] ldlm_cancel_callback at ffffffffa0742e0f [ptlrpc]
      #13 [ffff88103321f950] ldlm_cli_cancel_local at ffffffffa075121f [ptlrpc]
      #14 [ffff88103321f970] ldlm_cli_cancel_list_local at ffffffffa07545b2 [ptlrpc]
      #15 [ffff88103321f9d0] ldlm_prep_elc_req at ffffffffa07563bf [ptlrpc]
      #16 [ffff88103321fa40] ldlm_prep_enqueue_req at ffffffffa075648f [ptlrpc]
      #17 [ffff88103321fa50] osc_enqueue_base at ffffffffa08e525f [osc]
      #18 [ffff88103321faf0] osc_lock_enqueue at ffffffffa08ff440 [osc]
      #19 [ffff88103321fb60] cl_enqueue_kick at ffffffffa0632652 [obdclass]
      #20 [ffff88103321fb90] cl_enqueue_try at ffffffffa0635961 [obdclass]
      #21 [ffff88103321fbc0] lov_lock_enqueue_one at ffffffffa0995d8d [lov]
      #22 [ffff88103321fbf0] lov_lock_enqueue at ffffffffa09984bb [lov]
      #23 [ffff88103321fc60] cl_enqueue_kick at ffffffffa0632652 [obdclass]
      #24 [ffff88103321fc90] cl_enqueue_try at ffffffffa0635961 [obdclass]
      #25 [ffff88103321fcc0] cl_enqueue_locked at ffffffffa0636717 [obdclass]
      #26 [ffff88103321fcf0] cl_lock_request at ffffffffa06373e9 [obdclass]
      #27 [ffff88103321fd40] cl_glimpse_lock at ffffffffa0a6ae5d [lustre]
      #28 [ffff88103321fda0] cl_glimpse_size0 at ffffffffa0a6b31f [lustre]
      #29 [ffff88103321fdf0] ll_glimpse_size at ffffffffa0a19695 [lustre]
      #30 [ffff88103321fe10] ll_inode_revalidate_it at ffffffffa0a1eb68 [lustre]
      #31 [ffff88103321fe40] ll_getattr_it at ffffffffa0a1ebae [lustre]
      #32 [ffff88103321fe70] ll_getattr at ffffffffa0a1ecff [lustre]
      #33 [ffff88103321fed0] vfs_fstat at ffffffff81155d27
      #34 [ffff88103321fef0] sys_newfstat at ffffffff81155d6f
      #35 [ffff88103321ff80] system_call_fastpath at ffffffff8145b412
          RIP: 00002aaaacfe9394  RSP: 00007fffffff5a20  RFLAGS: 00000202
          RAX: 0000000000000005  RBX: ffffffff8145b412  RCX: 0000000000000001
          RDX: 00007fffffdafbf8  RSI: 00007fffffdafbf8  RDI: 0000000000000009
          RBP: 00007fffffdb1cc0   R8: 0000000000000000   R9: 000000000000000a
          R10: 00007fffffdac960  R11: 0000000000000246  R12: 0000000000000001
          R13: 00007fffffdb1dc0  R14: 0000000000000000  R15: 00007fffffdb1cc0
          ORIG_RAX: 0000000000000005  CS: 0033  SS: 002b
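
The failed assertion has the form `!A || !B`, which reads as "if the page is cacheable, it must not be under writeback". A minimal C sketch of that boolean shape follows. The `model_*` types and functions are hypothetical stand-ins introduced only for illustration; they are not the Lustre definitions, and the sketch says nothing about why the page was still under writeback in this crash.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; NOT the real Lustre types. */
enum model_page_type { MODEL_CPT_CACHEABLE, MODEL_CPT_TRANSIENT };

struct model_page {
    enum model_page_type type; /* models page->cp_type */
    bool writeback;            /* models PageWriteback(cl_page_vmpage(env, page)) */
};

/* Same boolean shape as the assertion text: !(cacheable) || !(writeback). */
static bool discard_invariant_holds(const struct model_page *pg)
{
    return !(pg->type == MODEL_CPT_CACHEABLE) || !pg->writeback;
}

/* Models the failure path: abort when the invariant is violated. */
static void model_discard(const struct model_page *pg)
{
    assert(discard_invariant_holds(pg));
}
```

Under this model, only one combination fails: a cacheable page that is still under writeback when it is discarded. A non-cacheable page passes regardless of its writeback state.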
      
      

          Activity

            Looks like the patch we supplied has fixed this problem, as far as we can tell.
            ~ jfc.

            jfc John Fuchs-Chesney (Inactive) added a comment

            Hello Mahmoud,
            I see from the comments above that the patch we supplied has been merged.
            Do you want us to keep this ticket open?
            Thanks,
            ~ jfc.

            jfc John Fuchs-Chesney (Inactive) added a comment

            Mahmoud, I think at the moment we are okay.

            cliffw Cliff White (Inactive) added a comment

            With the patch we have not hit this lbug. But we only hit this a few times, even without the patch. I can send you the dump for the initial crash if you still need it.

            mhanafi Mahmoud Hanafi added a comment

            Mahmoud,
            As stated above, you can send crash dumps to me, cliff.white@intel.com. I am a US citizen and handled the last dump analysis. Please advise by email when you do this. We would very much like to get some further debugging information: logs from the clients with D_CACHE enabled would be good, and a crash dump with the 5419 patch applied would also be useful.

            cliffw Cliff White (Inactive) added a comment

            Hi Oz,

            Is it possible to enable D_CACHE on the client side and collect some logs?

            Jinshan

            jay Jinshan Xiong (Inactive) added a comment

            Hi, this isn't clear to me.

            The patch from LU-2779 (5419) has already been applied.

            This is the issue: the problem recurs even after the patch has been applied.

            This issue was initially reported in LU-4555, but LU-4555 was closed as a duplicate of this ticket.

            orentas Oz Rentas (Inactive) added a comment

            In that case, please apply patch 5419; we can probably get a new stack trace and then work from there.

            jay Jinshan Xiong (Inactive) added a comment

            From the customer -
            "Unfortunately, there was no crash dump associated with this crash.
            All I can find on the client is the error messages from the console output (sent previously):

            LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) ASSERTION( (!(page->cp_type == CPT_CACHEABLE) || (!PageWriteback(cl_page_vmpage(env, page)))) ) failed:
            LustreError: 17450:0:(cl_lock.c:1964:discard_cb()) LBUG Kernel panic - not syncing: LBUG
            Pid: 17450, comm: tar Tainted: GF

            Please confirm direction in the absence of a crash."

            orentas Oz Rentas (Inactive) added a comment

            You can provide it to me (cliff.white@intel.com), as you did with the previous dump. Can you get a dump with the LU-2779 patch applied? We believe that would give us further data to diagnose the issue.

            cliffw Cliff White (Inactive) added a comment - edited

            People

              jay Jinshan Xiong (Inactive)
              mhanafi Mahmoud Hanafi