
import invalidation vs writeback deadlock

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • None
    • Upstream
    • None
    • 3

    Description

      racer hits this deadlock a few times a day:

      schedule,osc_extent_wait,osc_cache_wait_range,osc_cache_writeback_range,osc_io_fsync_start,cl_io_start,lov_io_call,cl_io_start,cl_io_loop,cl_sync_file_range,ll_delete_inode,evict,__dentry_kill,dentry_kill,dput,ll_dirty_page_discard_warn,vvp_page_completion_write,cl_page_completion,osc_ap_completion,osc_extent_finish,brw_interpret,ptlrpc_check_set,ptlrpcd
      	PIDs(1): "ptlrpcd_00_00":4889 
      
      schedule,osc_extent_wait,osc_cache_wait_range,osc_cache_writeback_range,osc_ldlm_blocking_ast,ldlm_cancel_callback,ldlm_cli_cancel_local,ldlm_cli_cancel,osc_ldlm_blocking_ast,ldlm_handle_bl_callback,ldlm_bl_thread_main
      	PIDs(1): "ldlm_bl_02":7759 
      
      schedule,ptlrpc_invalidate_import,ptlrpc_invalidate_import_thread
      	PIDs(1): "ll_imp_inval":293752 
      
      schedule,ptlrpc_invalidate_import,ptlrpc_set_import_active,osc_iocontrol,lov_iocontrol,ll_umount_begin,ksys_umount,__x64_sys_umount
      	PIDs(1): "umount":449648 
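
The four traces above can be grouped by the function each thread is sleeping in. The following is only a reading aid, not part of any fix: it assumes the frame lists are innermost-first (which `schedule` at the head of each list suggests), and the helper names `call_order`, `blocked_in` and `group_by_wait` are hypothetical.

```python
# Frame lists copied verbatim from the description; innermost frame first.
TRACES = {
    "ptlrpcd_00_00": "schedule,osc_extent_wait,osc_cache_wait_range,osc_cache_writeback_range,osc_io_fsync_start,cl_io_start,lov_io_call,cl_io_start,cl_io_loop,cl_sync_file_range,ll_delete_inode,evict,__dentry_kill,dentry_kill,dput,ll_dirty_page_discard_warn,vvp_page_completion_write,cl_page_completion,osc_ap_completion,osc_extent_finish,brw_interpret,ptlrpc_check_set,ptlrpcd",
    "ldlm_bl_02": "schedule,osc_extent_wait,osc_cache_wait_range,osc_cache_writeback_range,osc_ldlm_blocking_ast,ldlm_cancel_callback,ldlm_cli_cancel_local,ldlm_cli_cancel,osc_ldlm_blocking_ast,ldlm_handle_bl_callback,ldlm_bl_thread_main",
    "ll_imp_inval": "schedule,ptlrpc_invalidate_import,ptlrpc_invalidate_import_thread",
    "umount": "schedule,ptlrpc_invalidate_import,ptlrpc_set_import_active,osc_iocontrol,lov_iocontrol,ll_umount_begin,ksys_umount,__x64_sys_umount",
}


def call_order(trace):
    """Return the frames outermost-first, i.e. in the order they were called."""
    return [frame for frame in reversed(trace.split(",")) if frame]


def blocked_in(trace):
    """Return the frame that called schedule(): the wait the thread sleeps in."""
    frames = trace.split(",")
    if frames[0] == "schedule" and len(frames) > 1:
        return frames[1]
    return frames[0]


def group_by_wait(traces):
    """Map each waiting function to the list of threads sleeping in it."""
    groups = {}
    for thread, trace in traces.items():
        groups.setdefault(blocked_in(trace), []).append(thread)
    return groups


if __name__ == "__main__":
    for wait, threads in group_by_wait(TRACES).items():
        print(f"{wait}: {', '.join(threads)}")
    print(" -> ".join(call_order(TRACES["ptlrpcd_00_00"])))
```

Read in call order, the first trace shows the ptlrpcd thread reaching `osc_extent_wait` from inside its own `brw_interpret` completion path, while both `ll_imp_inval` and `umount` sleep in `ptlrpc_invalidate_import`.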
      

          Activity

            [LU-15127] import invalidation vs writeback deadlock
            pjones Peter Jones added a comment -

            ok thanks - then let's close out this ticket and track the landing of the second patch under a different Jira if James finds it useful...


            paf Patrick Farrell (Inactive) added a comment - edited

            Hey Peter,

            Yes, sure - we could move it.  To be clear, the cleanup patch isn't just code cleanup - it's an attempt at a fix for the second deadlock Alex found.  "Cleanup" here means cleaning up during import invalidation (I believe unmount and eviction).  But that's different from what this ticket was originally opened for, and that second issue hasn't come up since Alex reported it.  A hang on unmount is obnoxious but I think quite rare, so...

            So I'm sort of ambivalent about it - it's probably worth tracking this with a new ticket, but unless it turns out to be the fix for James's problem...  Yeah.

            simmonsja James A Simmons added a comment -

            I actually see a bug with my rhashtable ldlm patch in this area. Wonder if it would help?

            https://testing.whamcloud.com/test_logs/6b6788aa-0a86-4a60-8786-506f2d77cf3b/show_text

            pjones Peter Jones added a comment -

            Patrick

            A real blast from the past here. It looks like only the "clean up" patch - https://review.whamcloud.com/#/c/fs/lustre-release/+/45658/ - is still being tracked under this ticket. Is that still required? If so, a rebase and adding some reviewers would move things along. Otherwise I would say that we should close this as fixed in 2.16 and track any residual issues under a new ticket.

            Any objections to that approach?

            Peter


            paf0186 Patrick Farrell added a comment -

            There are a few fairly heroic guesses in that patch, but I think it's probably right...  Alex, if you can try it in your test rig...

            gerrit Gerrit Updater added a comment -

            "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45658
            Subject: LU-15127 osc: Resource cleanup in osc invalidate
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 7bad59f258269dd358d066bd023defeaec955f6d

            People

              paf Patrick Farrell (Inactive)
              bzzz Alex Zhuravlev
              Votes: 0
              Watchers: 9
