Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0, Lustre 2.5.4
    • Lustre 2.4.3
    • None
    • RHEL6 w/ patched kernel
    • 3
    • 14915

    Description

      We had to crash/dump one of our Lustre clients because of a deadlock issue in mdc_close(). The PID 5231 was waiting for a lock that it already owned. BTW, we had a lot of process waiting for this lock.

      In the backtrace of the process, we can see two calls to mdc_close(). The second is due to the system reclaiming memory.

      crash> bt 5231
      PID: 5231   TASK: ffff881518308b00  CPU: 2   COMMAND: "code2"
       #0 [ffff88171cb43188] schedule at ffffffff81528a52
       #1 [ffff88171cb43250] __mutex_lock_slowpath at ffffffff8152a20e
       #2 [ffff88171cb432c0] mutex_lock at ffffffff8152a0ab                  <=== Requires a new lock
       #3 [ffff88171cb432e0] mdc_close at ffffffffa09176db [mdc]
       #4 [ffff88171cb43330] lmv_close at ffffffffa0b9bcb8 [lmv]
       #5 [ffff88171cb43380] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre]
       #6 [ffff88171cb43400] ll_md_real_close at ffffffffa0a81afa [lustre]
       #7 [ffff88171cb43430] ll_clear_inode at ffffffffa0a92dee [lustre]
       #8 [ffff88171cb43470] clear_inode at ffffffff811a626c
       #9 [ffff88171cb43490] dispose_list at ffffffff811a6340
      #10 [ffff88171cb434d0] shrink_icache_memory at ffffffff811a6694
      #11 [ffff88171cb43530] shrink_slab at ffffffff81138b7a
      #12 [ffff88171cb43590] zone_reclaim at ffffffff8113b77e
      #13 [ffff88171cb436b0] get_page_from_freelist at ffffffff8112d8dc
      #14 [ffff88171cb437e0] __alloc_pages_nodemask at ffffffff8112f443
      #15 [ffff88171cb43920] alloc_pages_current at ffffffff811680ca
      #16 [ffff88171cb43950] __vmalloc_area_node at ffffffff81159696
      #17 [ffff88171cb439b0] __vmalloc_node at ffffffff8115953d
      #18 [ffff88171cb43a10] vmalloc at ffffffff8115985c
      #19 [ffff88171cb43a20] cfs_alloc_large at ffffffffa03b4b1e [libcfs]
      #20 [ffff88171cb43a30] null_alloc_repbuf at ffffffffa06c4961 [ptlrpc]
      #21 [ffff88171cb43a60] sptlrpc_cli_alloc_repbuf at ffffffffa06b2355 [ptlrpc]
      #22 [ffff88171cb43a90] ptl_send_rpc at ffffffffa068432c [ptlrpc]
      #23 [ffff88171cb43b50] ptlrpc_send_new_req at ffffffffa067879b [ptlrpc]
      #24 [ffff88171cb43bc0] ptlrpc_set_wait at ffffffffa067ddb6 [ptlrpc]
      #25 [ffff88171cb43c60] ptlrpc_queue_wait at ffffffffa067e0df [ptlrpc]   <=== PID has the lock
      #26 [ffff88171cb43c80] mdc_close at ffffffffa0917714 [mdc]
      #27 [ffff88171cb43cd0] lmv_close at ffffffffa0b9bcb8 [lmv]
      #28 [ffff88171cb43d20] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre]
      #29 [ffff88171cb43da0] ll_md_real_close at ffffffffa0a81afa [lustre]
      #30 [ffff88171cb43dd0] ll_md_close at ffffffffa0a81d8a [lustre]
      #31 [ffff88171cb43e80] ll_file_release at ffffffffa0a8233b [lustre]
      #32 [ffff88171cb43ec0] __fput at ffffffff8118ad55
      #33 [ffff88171cb43f10] fput at ffffffff8118ae95
      #34 [ffff88171cb43f20] filp_close at ffffffff811861bd
      #35 [ffff88171cb43f50] sys_close at ffffffff81186295
      #36 [ffff88171cb43f80] system_call_fastpath at ffffffff8100b072
          RIP: 00002adaacdf26d0  RSP: 00007fff9665e238  RFLAGS: 00010246
          RAX: 0000000000000003  RBX: ffffffff8100b072  RCX: 0000000000002261
          RDX: 00000000044a24b0  RSI: 0000000000000001  RDI: 0000000000000005
          RBP: 0000000000000000   R8: 00002adaad0ac560   R9: 0000000000000001
          R10: 00000000000004fd  R11: 0000000000000246  R12: 00000000000004fc
          R13: 00000000ffffffff  R14: 00000000044a23d0  R15: 00000000ffffffff
          ORIG_RAX: 0000000000000003  CS: 0033  SS: 002b
      

      We have a recursive locking here, which is not permitted.

      Attachments

        Activity

          People

            bfaccini Bruno Faccini (Inactive)
            bruno.travouillon Bruno Travouillon (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: