Lustre / LU-12522

Deadlock: ptlrpcd daemon blocked in osc_extent_wait

    Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Lustre 2.7.0
    • Fix Version/s: None
    • Labels: None
    • Severity: 3

      Description

      The application is flagged as hung because its tasks do not exit within 35 minutes of the job manager marking the application as completed.

      Server:
      Feb 17 04:13:43 snx11242n004 kernel: LustreError: 0:0:(ldlm_lockd.c:343:waiting_locks_callback()) ### lock callback timer expired after 260s: evicting client at 748@gni  ns: filter-snx11242-OST0000_UUID lock: ffff880bf596fb00/0xf1d1c2b2077a5cbd lrc: 3/0,0 mode: PW/PW res: [0xf44037d:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4194303) flags: 0x60000000020020 nid: 748@gni remote: 0xfeb3b0aabd5c0e98 cl: 4a576988-5c50-316d-4ee1-51202af4957d expref: 155 pid: 30836 timeout: 12436906645 lvb_type: 0
      Feb 17 04:13:43 snx11242n004 kernel: LustreError: 11235:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256141005/0
      Feb 17 04:13:43 snx11242n004 kernel: Lustre: snx11242-OST0000: Connection restored to 5ba334ef-089c-ccf0-265f-b8328dafb033 (at 748@gni)
      Feb 17 04:13:43 snx11242n004 kernel: Lustre: Skipped 785 previous similar messages
      Feb 17 04:13:44 snx11242n004 kernel: LustreError: 11330:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256136005/0
      Feb 17 04:13:44 snx11242n004 kernel: LustreError: 11330:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Skipped 7 previous similar messages
      Feb 17 04:13:48 snx11242n004 kernel: LustreError: 11298:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256122474/0
      Feb 17 04:25:00 snx11242n004 kernel: Lustre: snx11242-OST0000: haven't heard from client 4a576988-5c50-316d-4ee1-51202af4957d (at 748@gni) in 677 seconds. I think it's dead, and I am evicting it. exp ffff88029ca05c00, cur 1518863100 expire 1518862650 last 1518862423 
      
      obd_device          obd_name                   ip_address             client_obd          obd_import          imp_state    ish_time   index   conn_cnt
      =========================================================================================================================
      0xffff880f8e498188  snx11242-OST0004          LNET_NID_ANY               0xffff880f8e498668  0xffff880f814c3000  EVICTED      1518864015 4     2
      0xffff880f8e8801c8  snx11242-OST0000          LNET_NID_ANY               0xffff880f8e8806a8  0xffff880f823fb800  EVICTED      1518862423 4     2
      
      Sent RPCs: ptlrpc_request_set.set_requests->rq_set_chain
      
      thread         pid    ptlrpc_request      xid                nid                 opc  phase:flags    R:W  sent/deadline          ptlrpc_body
      ==================================================================================================================================================
      ptlrpcd_01_24: 5946   0xffff881f98f96640  1592609764500656   snx11242-OST0004    4    RPC:RE         0:1  1518862418/1518862513  0xffff881f9f4bd638
      ptlrpcd_01_24: 5946   0xffff881f9cbd0380  1592609764494945   snx11242-OST0000    4    NtrPrt:Ee      0:1  1518862417/1518862512  0xffff881f9f283838
      ==================================================================================================================================================
      
      crash_x86_64> bt 5946
      PID: 5946   TASK: ffff880f8a888100  CPU: 35  COMMAND: "ptlrpcd_01_24"
       #0 [ffff880f8e0734e8] schedule at ffffffff8151c7d7
       #1 [ffff880f8e073540] osc_extent_wait at ffffffffa0a1b6a0 [osc]
       #2 [ffff880f8e073660] osc_cache_wait_range at ffffffffa0a21f4c [osc]
       #3 [ffff880f8e073750] osc_io_fsync_end at ffffffffa0a0e7c6 [osc]
       #4 [ffff880f8e073788] cl_io_end at ffffffffa05c4b65 [obdclass]
       #5 [ffff880f8e0737b0] lov_io_end_wrapper at ffffffffa0837314 [lov]
       #6 [ffff880f8e0737d0] lov_io_fsync_end at ffffffffa083766e [lov]
       #7 [ffff880f8e073800] cl_io_end at ffffffffa05c4b65 [obdclass]
       #8 [ffff880f8e073828] cl_io_loop at ffffffffa05c817c [obdclass]
       #9 [ffff880f8e073858] cl_sync_file_range at ffffffffa08ecb75 [lustre]
      #10 [ffff880f8e0738b0] ll_delete_inode at ffffffffa09006af [lustre]
      #11 [ffff880f8e0738d8] evict at ffffffff811d0807
      #12 [ffff880f8e073900] iput at ffffffff811d1056
      #13 [ffff880f8e073930] __dentry_kill at ffffffff811cc605
      #14 [ffff880f8e073958] dput at ffffffff811cc833
      #15 [ffff880f8e073980] ll_dirty_page_discard_warn at ffffffffa090f033 [lustre]
      #16 [ffff880f8e0739f8] vvp_page_completion_write at ffffffffa0940f49 [lustre]
      #17 [ffff880f8e073a30] cl_page_completion at ffffffffa05c0a33 [obdclass]
      #18 [ffff880f8e073a78] osc_ap_completion at ffffffffa0a136ec [osc]
      #19 [ffff880f8e073ac0] osc_extent_finish at ffffffffa0a18db1 [osc]
      #20 [ffff880f8e073bc0] brw_interpret at ffffffffa09ff29b [osc]
      #21 [ffff880f8e073cb8] ptlrpc_check_set at ffffffffa0700233 [ptlrpc]
      #22 [ffff880f8e073d80] ptlrpcd_check at ffffffffa072cf1a [ptlrpc]
      #23 [ffff880f8e073dc8] ptlrpcd at ffffffffa072d3a5 [ptlrpc]
      #24 [ffff880f8e073ec8] kthread at ffffffff81078646
      #25 [ffff880f8e073f50] ret_from_fork at ffffffff81520e8f
      

      What's going on:

      The client is evicted by OST0000 while it has an outstanding bulk write RPC in progress. When the ptlrpcd daemon handles the bulk write reply, it sees the EIO (-5) error and issues the warning that dirty pages are being discarded (ll_dirty_page_discard_warn()). Lustre attempts to include the file name in the warning, but at the point the warning is issued the ptlrpcd thread has access only to the page and its inode. It must therefore do a path lookup, taking references on the dentry and inode in the process. While it holds these references, all other references are dropped, so when the ptlrpcd thread releases its own it becomes responsible for deleting the inode and flushing the inode's dirty pages. But those dirty pages are the very ones being written by the failed bulk RPC, so the ptlrpcd daemon deadlocks on itself: it ends up in osc_extent_wait() waiting for completion of the RPC it is itself processing, as the backtrace above shows.

      The cleanest fix is to remove the file name from the warning message. The message contains the fid, so an administrator can still identify the file. Moreover, in most real-life occurrences of this warning (including those that deadlock), the file name is never successfully retrieved anyway.

              People

              • Assignee: amk (Ann Koehler, Inactive)
              • Reporter: amk (Ann Koehler, Inactive)
