Lustre / LU-19220

Hang in upcall_cache_put_entry/upcall_cache_get_entry

Details

    • Bug
    • Resolution: Unresolved
    • Medium
    • 3

    Description

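      A problem in the task wakeup logic causes the MCS spinlock wait queue on cache->uc_lock to keep growing until the MDS hangs. The two backtraces below show MDT service threads spinning in native_queued_spin_lock_slowpath: one entered through queued_write_lock_slowpath from upcall_cache_put_entry, the other through queued_read_lock_slowpath from upcall_cache_get_entry. Both report the same address in RDI (ff4b685cee36c810).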

       

      PID: 3758     TASK: ff4b68551ee58000  CPU: 9    COMMAND: "mdt04_001"
       #0 [fffffe000020ce48] crash_nmi_callback at ffffffff82a5e8d3
       #1 [fffffe000020ce50] nmi_handle at ffffffff82a2b393
       #2 [fffffe000020cea8] default_do_nmi at ffffffff833ee099
       #3 [fffffe000020cec8] do_nmi at ffffffff82a2b8ef
       #4 [fffffe000020cef0] end_repeat_nmi at ffffffff836015e8
          [exception RIP: native_queued_spin_lock_slowpath+324]
          RIP: ffffffff82b5b264  RSP: ff81546d74123ad8  RFLAGS: 00000246
          RAX: 0000000000000000  RBX: ff4b685cee36c80c  RCX: 0000000000000017
          RDX: ff4b6874f1873d40  RSI: 0000000000280000  RDI: ff4b685cee36c810
          RBP: ff4b685cee36c810   R8: 0000000000000033   R9: ff81546d74123988
          R10: 8080808080808080  R11: 0000000000000029  R12: ff4b68611aae3c48
          R13: 0000000000000006  R14: ff4b6852a713e110  R15: ff4b6852a713e110
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #5 [ff81546d74123ad8] native_queued_spin_lock_slowpath at ffffffff82b5b264
       #6 [ff81546d74123ad8] queued_write_lock_slowpath at ffffffff82b5b4e7
       #7 [ff81546d74123af0] upcall_cache_put_entry at ffffffffc10baad4 [obdclass]
       #8 [ff81546d74123b18] mdt_exit_ucred at ffffffffc1b20cd9 [mdt]
       #9 [ff81546d74123b38] mdt_reint_internal at ffffffffc1b0c4b8 [mdt]
      #10 [ff81546d74123b68] mdt_intent_open at ffffffffc1b17b08 [mdt]
      #11 [ff81546d74123ba8] mdt_intent_opc at ffffffffc1b10472 [mdt]
      #12 [ff81546d74123c00] mdt_intent_policy at ffffffffc1b15b1d [mdt]
      #13 [ff81546d74123c40] ldlm_lock_enqueue at ffffffffc13db2de [ptlrpc]
      #14 [ff81546d74123cb0] ldlm_handle_enqueue0 at ffffffffc140375a [ptlrpc]
      #15 [ff81546d74123d30] tgt_enqueue at ffffffffc1492de4 [ptlrpc]
      #16 [ff81546d74123d48] tgt_request_handle at ffffffffc149c1ac [ptlrpc]
      #17 [ff81546d74123dc8] ptlrpc_server_handle_request at ffffffffc1437d23 [ptlrpc]
      #18 [ff81546d74123e30] ptlrpc_main at ffffffffc143c60f [ptlrpc]
      #19 [ff81546d74123f10] kthread at ffffffff82b1d6a4
      #20 [ff81546d74123f50] ret_from_fork at ffffffff8360024f

      PID: 3793     TASK: ff4b68549075c000  CPU: 13   COMMAND: "mdt_rdpg06_000"
       #0 [fffffe00002f0e48] crash_nmi_callback at ffffffff82a5e8d3
       #1 [fffffe00002f0e50] nmi_handle at ffffffff82a2b393
       #2 [fffffe00002f0ea8] default_do_nmi at ffffffff833ee099
       #3 [fffffe00002f0ec8] do_nmi at ffffffff82a2b8ef
       #4 [fffffe00002f0ef0] end_repeat_nmi at ffffffff836015e8
          [exception RIP: native_queued_spin_lock_slowpath+324]
          RIP: ffffffff82b5b264  RSP: ff81546d7423bbb8  RFLAGS: 00000246
          RAX: 0000000000000000  RBX: ff4b685cee36c80c  RCX: 0000000000000009
          RDX: ff4b6874f1973d40  RSI: 0000000000380000  RDI: ff4b685cee36c810
          RBP: ff4b685cee36c810   R8: 0000000000000000   R9: 0000000000000000
          R10: 0000000000000001  R11: 00000000000000a3  R12: ff4b685c2a0343e0
          R13: ff4b685bae02f000  R14: ff4b685543406000  R15: 000000000000373e
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      --- <NMI exception stack> ---
       #5 [ff81546d7423bbb8] native_queued_spin_lock_slowpath at ffffffff82b5b264
       #6 [ff81546d7423bbb8] queued_read_lock_slowpath at ffffffff82b5b464
       #7 [ff81546d7423bbd0] upcall_cache_get_entry at ffffffffc10bc1a0 [obdclass]
       #8 [ff81546d7423bc88] mdt_identity_get at ffffffffc1b4760b [mdt]
       #9 [ff81546d7423bca0] old_init_ucred_common at ffffffffc1b1f96e [mdt]
      #10 [ff81546d7423bcd8] mdt_init_ucred_reint at ffffffffc1b21885 [mdt]
      #11 [ff81546d7423bd00] mdt_close at ffffffffc1b462cd [mdt]
      #12 [ff81546d7423bd48] tgt_request_handle at ffffffffc149c1ac [ptlrpc]
      #13 [ff81546d7423bdc8] ptlrpc_server_handle_request at ffffffffc1437d23 [ptlrpc]
      #14 [ff81546d7423be30] ptlrpc_main at ffffffffc143c60f [ptlrpc]
      #15 [ff81546d7423bf10] kthread at ffffffff82b1d6a4
      #16 [ff81546d7423bf50] ret_from_fork at ffffffff8360024f 
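
Both tasks above report RDI = ff4b685cee36c810, which suggests they are queued on the same spinlock. RBX (ff4b685cee36c80c) is 4 bytes lower, which would be consistent with RDI pointing at the wait_lock member of the rwlock. To check how many tasks in a larger dump are waiting on that address, a minimal Python sketch follows. It assumes crash-style backtrace text in the layout shown above, and it assumes RDI still holds the lock pointer at the sampled RIP, which depends on how the kernel was compiled. The names group_spinners and SAMPLE are illustrative only and not part of any tool; SAMPLE is a few lines excerpted from the backtraces above.

```python
import re
from collections import defaultdict

# A few lines excerpted from the two backtraces in this ticket.
SAMPLE = """\
PID: 3758     TASK: ff4b68551ee58000  CPU: 9    COMMAND: "mdt04_001"
    [exception RIP: native_queued_spin_lock_slowpath+324]
    RDX: ff4b6874f1873d40  RSI: 0000000000280000  RDI: ff4b685cee36c810
 #7 [ff81546d74123af0] upcall_cache_put_entry at ffffffffc10baad4 [obdclass]
PID: 3793     TASK: ff4b68549075c000  CPU: 13   COMMAND: "mdt_rdpg06_000"
    [exception RIP: native_queued_spin_lock_slowpath+324]
    RDX: ff4b6874f1973d40  RSI: 0000000000380000  RDI: ff4b685cee36c810
 #7 [ff81546d7423bbd0] upcall_cache_get_entry at ffffffffc10bc1a0 [obdclass]
"""

PID_RE = re.compile(r'^\s*PID:\s*(\d+)\s.*COMMAND:\s*"([^"]+)"')
RIP_RE = re.compile(r'\[exception RIP: (\w+)\+\d+\]')
RDI_RE = re.compile(r'\bRDI:\s*([0-9a-f]+)')


def group_spinners(text):
    """Map each RDI value to the (pid, command) pairs spinning on it.

    Only tasks whose exception RIP is inside
    native_queued_spin_lock_slowpath are counted. RDI is assumed
    (not guaranteed) to still hold the spinlock address there.
    """
    groups = defaultdict(list)
    pid = comm = None
    spinning = False
    for line in text.splitlines():
        m = PID_RE.match(line)
        if m:
            # A new task header starts a new backtrace.
            pid, comm = int(m.group(1)), m.group(2)
            spinning = False
            continue
        m = RIP_RE.search(line)
        if m:
            spinning = m.group(1) == "native_queued_spin_lock_slowpath"
            continue
        m = RDI_RE.search(line)
        if m and spinning and pid is not None:
            groups[m.group(1)].append((pid, comm))
            spinning = False  # record each task at most once
    return dict(groups)


if __name__ == "__main__":
    for lock, tasks in group_spinners(SAMPLE).items():
        print(lock, len(tasks), tasks)
```

Run over the full dump, a single address with many waiters would support the growing-queue description; the sketch only groups tasks and does not show which task, if any, holds the lock.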

       

          People

            skoyama Sohei Koyama