Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.12.0, Lustre 2.10.4
    • None
    • RHEL7 Server
      Lustre 2.7.x series
    • 3

    Description

      At one customer site, we hit the following deadlock:

          Thread 1:
          ofd_object_punch
           osd_punch
            ldiskfs_truncate
             ldiskfs_inode_attach_jinode
               ...
               do_try_to_free_pages
                lu_cache_shrink
                 mutex_lock --> tries to take @lu_sites_guard
      
          
      
          Thread 2 (kswapd):
          kthread
           shrink_slab
            lu_cache_shrink
              mutex_lock --> already holds @lu_sites_guard
               ...
               dqget
                ldiskfs_acquire_dquot
                 jbd2__journal_start --> blocked waiting for more credits
      
          
      
          Thread 3:
          kthread
           kjournald2
            jbd2_journal_commit_transaction --> blocked waiting for Thread 1 to finish,
                                       since Thread 1 added a handle to the transaction
      
          
      
          The deadlock happens because Thread 1 waits for Thread 2, Thread 2 waits
          for Thread 3, and Thread 3 waits for Thread 1.
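The circular wait described above can be checked mechanically. The following is a minimal sketch, not Lustre code: it models the three stacks as a wait-for graph and walks the edges until it either finds a cycle or reaches a thread that can run. The dictionary `WAITS_FOR` and the helper `find_cycle` are hypothetical names introduced only for this illustration.

```python
# Wait-for graph taken from the three stacks in the description.
# Each key is blocked until the thread it maps to makes progress.
WAITS_FOR = {
    "Thread 1": "Thread 2",  # wants lu_sites_guard, which kswapd holds
    "Thread 2": "Thread 3",  # wants journal credits, freed by a commit
    "Thread 3": "Thread 1",  # commit waits for Thread 1's open handle
}


def find_cycle(waits_for, start):
    """Follow wait edges from `start`; return the cycle as a list, or None."""
    path = []
    seen = set()
    node = start
    while node is not None and node not in seen:
        seen.add(node)
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a runnable thread: no deadlock
    return path[path.index(node):] + [node]


cycle = find_cycle(WAITS_FOR, "Thread 1")
print(" -> ".join(cycle) if cycle else "no deadlock")

# If Thread 1 never blocks on lu_sites_guard, its edge disappears
# and the remaining chain ends at a runnable thread.
without_thread1_edge = {k: v for k, v in WAITS_FOR.items() if k != "Thread 1"}
print(find_cycle(without_thread1_edge, "Thread 2"))
```

Removing any one edge breaks the cycle. The fix in this ticket targets the Thread 1 edge, so that the allocation in ldiskfs_inode_attach_jinode() no longer ends up waiting on @lu_sites_guard.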
      
          
      
          This problem still exists even though we have switched @lu_sites_guard
          to a read/write lock, since we hold the write lock in lu_cache_shrink().
      
          
      
          Fixed the problem by making ldiskfs_inode_attach_jinode() use GFP_NOFS.
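To illustrate why the allocation flag matters, here is a small user-space model, offered as a sketch under stated assumptions rather than as kernel code. It assumes that a filesystem shrinker such as lu_cache_shrink only takes its lock when the allocation mask permits filesystem reclaim. The flag values, the `MODEL_*` masks, and both helper functions are hypothetical and do not match the kernel's real definitions.

```python
# Illustrative flag bits only; these are NOT the kernel's real values.
ALLOW_RECLAIM = 0x1
ALLOW_IO = 0x2
ALLOW_FS = 0x4

# Modelled after the intent of GFP_KERNEL and GFP_NOFS: the only
# difference that matters here is whether filesystem reclaim is allowed.
MODEL_GFP_KERNEL = ALLOW_RECLAIM | ALLOW_IO | ALLOW_FS
MODEL_GFP_NOFS = ALLOW_RECLAIM | ALLOW_IO


def locks_wanted_by_reclaim(gfp_mask):
    """Return the locks a modelled filesystem shrinker would try to take."""
    if not gfp_mask & ALLOW_RECLAIM:
        return []  # no direct reclaim at all
    if not gfp_mask & ALLOW_FS:
        return []  # assumed: the shrinker bails out before taking its lock
    return ["lu_sites_guard"]


def allocation_would_block(gfp_mask, locks_held_elsewhere):
    """True if reclaim triggered by this allocation needs a lock held by another thread."""
    return any(lock in locks_held_elsewhere
               for lock in locks_wanted_by_reclaim(gfp_mask))


held_by_kswapd = {"lu_sites_guard"}
print(allocation_would_block(MODEL_GFP_KERNEL, held_by_kswapd))
print(allocation_would_block(MODEL_GFP_NOFS, held_by_kswapd))
```

In this model, an allocation made with the filesystem bit set can block on @lu_sites_guard while the caller already holds a journal handle, which is the Thread 1 situation. Clearing that bit keeps the allocating thread out of the shrinker's locking path.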
      
       
      
      

      Attachments

        Activity

          [LU-10859] Deadlock with heavy memory pressure
          hornc Chris Horn added a comment - edited

          Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/31825
          Subject: LU-10859 ldiskfs: extend previous fix to SLES
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c4956f2dff93c428b3040c1e03d08c14ec6232c8


           

          This patch was abandoned, but its changes were never rolled into the primary patch for this ticket (https://review.whamcloud.com/31806/). Should Bob's patch be revived?

          Edit: Never mind, I misread the patches.


          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/32058/
          Subject: LU-10859 ldiskfs: fix deadlock with heavy memory preassure
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set:
          Commit: 0595d92ad03ab9d975d599aad204d746aff991b3

          gerrit Gerrit Updater added a comment -

          Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32058
          Subject: LU-10859 ldiskfs: fix deadlock with heavy memory preassure
          Project: fs/lustre-release
          Branch: b2_10
          Current Patch Set: 1
          Commit: c5a9c83471aa5b6e0a593b7b99760e86c8311bee

          wangshilong Wang Shilong (Inactive) added a comment -

          FYI, we'd better include this fix in the b2_10 LTS branch.

          pjones Peter Jones added a comment -

          Landed for 2.12

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31806/
          Subject: LU-10859 ldiskfs: fix deadlock with heavy memory preassure
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 0506e1bd6a6d5fafe7fc5e558aa1b75e456c2642

          wangshilong Wang Shilong (Inactive) added a comment -

          Hello chunteraa,

          Thanks for the reminder. I refreshed the patch to fix that.

          bogl Bob Glossman (Inactive) added a comment -

          Bob, can you please add a follow-on patch for SLES, either using the same patch (if it applies cleanly) or new patches as needed, once this initial patch has passed review & testing.

          done

          gerrit Gerrit Updater added a comment -

          Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/31825
          Subject: LU-10859 ldiskfs: extend previous fix to SLES
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: c4956f2dff93c428b3040c1e03d08c14ec6232c8

          chunteraa Chris Hunter (Inactive) added a comment -

          Patch LU-9728 uses GFP_HIGHUSER for allocations instead of GFP_NOFS.

          The kernel_patch filename says "GPF_NOFS", but the allocation flag is GFP_NOFS.
          https://www.kernel.org/doc/gorman/html/understand/understand009.html

          People

            wangshilong Wang Shilong (Inactive)
            wangshilong Wang Shilong (Inactive)