
Deadlock: ptlrpcd daemon blocked in osc_extent_wait

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.7.0

    Description

      The application is flagged as hung because its tasks have not exited 35 minutes after the job manager marks the application as completed.

      Server:
      Feb 17 04:13:43 snx11242n004 kernel: LustreError: 0:0:(ldlm_lockd.c:343:waiting_locks_callback()) ### lock callback timer expired after 260s: evicting client at 748@gni  ns: filter-snx11242-OST0000_UUID lock: ffff880bf596fb00/0xf1d1c2b2077a5cbd lrc: 3/0,0 mode: PW/PW res: [0xf44037d:0x0:0x0].0 rrc: 3 type: EXT [0->18446744073709551615] (req 0->4194303) flags: 0x60000000020020 nid: 748@gni remote: 0xfeb3b0aabd5c0e98 cl: 4a576988-5c50-316d-4ee1-51202af4957d expref: 155 pid: 30836 timeout: 12436906645 lvb_type: 0
      Feb 17 04:13:43 snx11242n004 kernel: LustreError: 11235:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256141005/0
      Feb 17 04:13:43 snx11242n004 kernel: Lustre: snx11242-OST0000: Connection restored to 5ba334ef-089c-ccf0-265f-b8328dafb033 (at 748@gni)
      Feb 17 04:13:43 snx11242n004 kernel: Lustre: Skipped 785 previous similar messages
      Feb 17 04:13:44 snx11242n004 kernel: LustreError: 11330:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256136005/0
      Feb 17 04:13:44 snx11242n004 kernel: LustreError: 11330:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Skipped 7 previous similar messages
      Feb 17 04:13:48 snx11242n004 kernel: LustreError: 11298:0:(ldlm_extent.c:733:ldlm_resource_prolong()) Failed to get resource for resid 256122474/0
      Feb 17 04:25:00 snx11242n004 kernel: Lustre: snx11242-OST0000: haven't heard from client 4a576988-5c50-316d-4ee1-51202af4957d (at 748@gni) in 677 seconds. I think it's dead, and I am evicting it. exp ffff88029ca05c00, cur 1518863100 expire 1518862650 last 1518862423 
      
      obd_device          obd_name                   ip_address             client_obd          obd_import          imp_state    ish_time   index   conn_cnt
      =========================================================================================================================
      0xffff880f8e498188  snx11242-OST0004          LNET_NID_ANY               0xffff880f8e498668  0xffff880f814c3000  EVICTED      1518864015 4     2
      0xffff880f8e8801c8  snx11242-OST0000          LNET_NID_ANY               0xffff880f8e8806a8  0xffff880f823fb800  EVICTED      1518862423 4     2
      
      Sent RPCS: ptlrpc_request_set.set_requests->rq_set_chain
      
      thread         pid    ptlrpc_request      xid                nid                 opc  phase:flags    R:W  sent/deadline          ptlrpc_body
      ==================================================================================================================================================
      ptlrpcd_01_24: 5946   0xffff881f98f96640  1592609764500656   snx11242-OST0004    4    RPC:RE         0:1  1518862418/1518862513  0xffff881f9f4bd638
      ptlrpcd_01_24: 5946   0xffff881f9cbd0380  1592609764494945   snx11242-OST0000    4    NtrPrt:Ee      0:1  1518862417/1518862512  0xffff881f9f283838
      ==================================================================================================================================================
      
      crash_x86_64> bt 5946
      PID: 5946   TASK: ffff880f8a888100  CPU: 35  COMMAND: "ptlrpcd_01_24"
       #0 [ffff880f8e0734e8] schedule at ffffffff8151c7d7
       #1 [ffff880f8e073540] osc_extent_wait at ffffffffa0a1b6a0 [osc]
       #2 [ffff880f8e073660] osc_cache_wait_range at ffffffffa0a21f4c [osc]
       #3 [ffff880f8e073750] osc_io_fsync_end at ffffffffa0a0e7c6 [osc]
       #4 [ffff880f8e073788] cl_io_end at ffffffffa05c4b65 [obdclass]
       #5 [ffff880f8e0737b0] lov_io_end_wrapper at ffffffffa0837314 [lov]
       #6 [ffff880f8e0737d0] lov_io_fsync_end at ffffffffa083766e [lov]
       #7 [ffff880f8e073800] cl_io_end at ffffffffa05c4b65 [obdclass]
       #8 [ffff880f8e073828] cl_io_loop at ffffffffa05c817c [obdclass]
       #9 [ffff880f8e073858] cl_sync_file_range at ffffffffa08ecb75 [lustre]
      #10 [ffff880f8e0738b0] ll_delete_inode at ffffffffa09006af [lustre]
      #11 [ffff880f8e0738d8] evict at ffffffff811d0807
      #12 [ffff880f8e073900] iput at ffffffff811d1056
      #13 [ffff880f8e073930] __dentry_kill at ffffffff811cc605
      #14 [ffff880f8e073958] dput at ffffffff811cc833
      #15 [ffff880f8e073980] ll_dirty_page_discard_warn at ffffffffa090f033 [lustre]
      #16 [ffff880f8e0739f8] vvp_page_completion_write at ffffffffa0940f49 [lustre]
      #17 [ffff880f8e073a30] cl_page_completion at ffffffffa05c0a33 [obdclass]
      #18 [ffff880f8e073a78] osc_ap_completion at ffffffffa0a136ec [osc]
      #19 [ffff880f8e073ac0] osc_extent_finish at ffffffffa0a18db1 [osc]
      #20 [ffff880f8e073bc0] brw_interpret at ffffffffa09ff29b [osc]
      #21 [ffff880f8e073cb8] ptlrpc_check_set at ffffffffa0700233 [ptlrpc]
      #22 [ffff880f8e073d80] ptlrpcd_check at ffffffffa072cf1a [ptlrpc]
      #23 [ffff880f8e073dc8] ptlrpcd at ffffffffa072d3a5 [ptlrpc]
      #24 [ffff880f8e073ec8] kthread at ffffffff81078646
      #25 [ffff880f8e073f50] ret_from_fork at ffffffff81520e8f
      

      What's going on:

      The client is evicted by OST0000 while it has an outstanding bulk write RPC in progress. When the ptlrpcd daemon handles the bulk write reply, it sees the EIO (-5) error and issues the warning that dirty pages are being discarded (ll_dirty_page_discard_warn()). Lustre attempts to include the file name in the warning, but at the point the warning is issued, the ptlrpcd thread only has access to the page and its inode. It must therefore do a path lookup, taking references on the dentry and inode in the process. While it holds these references, all other references are dropped, so when the ptlrpcd thread releases them it becomes responsible for deleting the inode and flushing the inode's dirty pages. But those dirty pages are the very ones being written by the failed bulk RPC, so the ptlrpcd daemon deadlocks on itself.
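      The circular reference-count dependency described above can be modeled in a few lines of C. This is a deliberately simplified sketch: `inode_model`, `inode_put`, and the `caller_is_ptlrpcd` flag are invented for illustration and are not Lustre APIs.

      ```c
      /* Toy model of the LU-12522 deadlock: the thread completing the bulk
       * write RPC must never hold the last inode reference, because the
       * final put runs evict -> ll_delete_inode -> osc_extent_wait, which
       * waits on the very RPC that thread is in the middle of completing. */
      #include <stdio.h>
      #include <stdbool.h>

      struct inode_model {
          int  refcount;       /* dentry/inode references */
          bool rpc_in_flight;  /* bulk write whose reply is being processed */
      };

      /* Returns true if dropping this reference would self-deadlock. */
      static bool inode_put(struct inode_model *ino, bool caller_is_ptlrpcd)
      {
          if (--ino->refcount > 0)
              return false;        /* not the last reference */
          if (ino->rpc_in_flight && caller_is_ptlrpcd)
              return true;         /* would wait on our own unfinished RPC */
          return false;            /* safe: flush dirty pages, delete inode */
      }

      int main(void)
      {
          struct inode_model ino = { .refcount = 1, .rpc_in_flight = true };

          /* ll_dirty_page_discard_warn(): path lookup takes an extra ref */
          ino.refcount++;

          /* the application exits and drops its reference meanwhile */
          printf("app put deadlocks:     %s\n",
                 inode_put(&ino, false) ? "yes" : "no");

          /* the ptlrpcd thread drops the lookup reference from
           * brw_interpret() context; it is now the last one */
          printf("ptlrpcd put deadlocks: %s\n",
                 inode_put(&ino, true) ? "yes" : "no");
          return 0;
      }
      ```

      Run as written, the application's put is harmless (another reference remains), but the ptlrpcd thread's put reports the self-deadlock, matching the backtrace above where dput() on the ptlrpcd thread descends into osc_extent_wait().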

      The cleanest fix is to remove the file name from the warning message. The message contains the FID, so an administrator can still identify the file. Furthermore, in most real life cases of this warning (even those that deadlock), the file name is never successfully retrieved.

      Attachments

        Issue Links

          Activity


            See LU-12915 for a different approach to reporting the names of files that have had dirty pages discarded.

            amk Ann Koehler (Inactive) added a comment

            I looked through a bunch of older logs that included ll_dirty_page_discard_warn(), and while there are several that have valid filenames, those are all with Lustre 2.5.x. All of the more recent messages show "[FID]:/" as the pathname, so I agree it isn't very useful as it stands.

            Being able to run "lfs fid2path" on the FID afterward is OK for people who know Lustre, but it doesn't help for many users, and will fail for files that no longer exist. In the latter case (file was already deleted), it isn't clear if there is much value to investigate some issue if the file no longer exists.

            adilger Andreas Dilger added a comment

            Another approach would be to acquire the file name further up the stack and stash it away for use in this message. Just seemed like overkill for info that doesn't seem to get set in most cases.

            Definitely it doesn't make sense to add work to the common case that is only useful for a very uncommon case.

            adilger Andreas Dilger added a comment

            Do you know what the log looks like in non-deadlock cases? I think the message probably works in those cases...?

            Not that I've seen. The warnings occur following evictions. I've been watching those and haven't seen file names even without the deadlock. Again not scientific, but informs my opinion.

            Given the pushback on removing the file name, I'm inclined to withdraw my patch and leave the deadlock as an open issue in the backlog.

            amk Ann Koehler (Inactive) added a comment

            amk, this further discussion is bringing back some temporarily buried memories here.  In the case of the deadlock, there is no file name because the deadlock is with the file being deleted.  Do you know what the log looks like in non-deadlock cases?  I think the message probably works in those cases...?

            pfarrell Patrick Farrell (Inactive) added a comment
            amk Ann Koehler (Inactive) added a comment - edited

            Furthermore, in most real life cases of this warning (even those that deadlock), the file name is never successfully retrieved.

            could you please explain that a bit further? Is that because nobody cares to read the console error messages, or because the node is rebooted shortly after being evicted, or something else?

            In hundreds of actual warning messages that I've seen in console logs, I can't remember seeing one in which the filename is not empty. I've been paying careful attention for the last year or so and in that time, I have not seen a single one of these warnings that had a nonempty filename. Not a scientific study by any means, but I do see a lot of log files. When this bug was first identified, about 1.5 years ago, I thought that the deadlock was rare enough and filename useful enough that we should just live with the bug. But since then, we've seen half a dozen or so occurrences of the deadlock and I've noted that filename is often/always null.

            And yes, there's likely a way to preserve the file name in the warning. Another approach would be to acquire the file name further up the stack and stash it away for use in this message. Just seemed like overkill for info that doesn't seem to get set in most cases.

            adilger Andreas Dilger added a comment - edited

            We were asked by a customer to add the filename to these messages in LU-921, to make it easier for the admin to determine which files are affected when a client is evicted. Is there some other way that we could detect the circular dependency (e.g. mark the thread or inode as already being processed by ll_dirty_page_discard_warn() and avoid the freeing, or get an extra dentry or inode reference in osc_build_rpc() so that the last reference is always put in brw_interpret() after ll_dirty_page_discard_warn() is finished)?

            Furthermore, in most real life cases of this warning (even those that deadlock), the file name is never successfully retrieved.

            could you please explain that a bit further? Is that because nobody cares to read the console error messages, or because the node is rebooted shortly after being evicted, or something else?


            Ann Koehler (amk@cray.com) uploaded a new patch: https://review.whamcloud.com/35443
            Subject: LU-12522 llite: Remove fname from dirty page discard warning
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: bcb448be2b42e94db4930f9614b532d5bb564fa2

            gerrit Gerrit Updater added a comment

            People

              Assignee: amk Ann Koehler (Inactive)
              Reporter: amk Ann Koehler (Inactive)
              Votes: 0
              Watchers: 5
