[LU-10859] Deadlock with heavy memory pressure - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.12.0, Lustre 2.10.4
Affects Version/s: None
Labels:
- patch
Environment:
RHEL7 Server
Lustre 2.7.x series

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

At one customer site, we hit the following deadlock:

    Thread1:
    ofd_object_punch
     osd_punch
      ldiskfs_truncate
       ldiskfs_inode_attach_jinode
         ...
         do_try_to_free_pages
          lu_cache_shrink
           mutex_lock --> tries to take @lu_sites_guard

    Thread2 (kswapd):
    kthread
     shrink_slab
      lu_cache_shrink
        mutex_lock --> already holds @lu_sites_guard
         ...
         dqget
          ldiskfs_acquire_dquot
           jbd2__journal_start --> blocked waiting for more credits

    Thread3:
    kthread
     kjournald2
      jbd2_journal_commit_transaction --> blocked waiting for Thread1 to finish,
                                 since Thread1 added a handle to the transaction.

    The deadlock is a cycle: Thread1 waits for Thread2, Thread2 waits for
    Thread3, and Thread3 waits for Thread1.

    The problem persists even though @lu_sites_guard has been switched to a
    read/write lock, since lu_cache_shrink() takes the write lock.

    Fix the problem by making ldiskfs_inode_attach_jinode() allocate with
    GFP_NOFS.
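The cycle described above can be modeled as a small wait-for graph. The following is a minimal userspace sketch, not kernel code: the `MODEL_GFP_*` bits, the `wait_graph` type, and every function name are hypothetical and exist only for illustration. It assumes that the pre-fix allocation allowed filesystem reclaim (as `GFP_KERNEL` does) and that reclaim only reaches for `lu_sites_guard` when the allocation permits filesystem recursion.

```c
#include <stdbool.h>

/* Illustrative flag bits only; real kernel GFP values differ by version. */
#define MODEL_GFP_IO     0x1u
#define MODEL_GFP_FS     0x2u
#define MODEL_GFP_KERNEL (MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_IO)

enum { T1_TRUNCATE, T2_KSWAPD, T3_KJOURNALD, NTHREADS };

/* waits_on[a][b] is true when thread a cannot progress until b does. */
struct wait_graph {
    bool waits_on[NTHREADS][NTHREADS];
};

/* Assumption: reclaim may block on filesystem locks only if the FS bit is set. */
static bool reclaim_may_take_fs_locks(unsigned int gfp)
{
    return (gfp & MODEL_GFP_FS) != 0;
}

/* Build the wait-for graph for a given jinode allocation mask. */
struct wait_graph build_graph(unsigned int jinode_gfp)
{
    struct wait_graph g = {0};

    /* Thread2 holds lu_sites_guard and waits for journal credits,
     * which only become available once Thread3 commits. */
    g.waits_on[T2_KSWAPD][T3_KJOURNALD] = true;

    /* Thread3 cannot commit until Thread1 stops its open handle. */
    g.waits_on[T3_KJOURNALD][T1_TRUNCATE] = true;

    /* Thread1 blocks on lu_sites_guard only when its allocation
     * is allowed to recurse into filesystem reclaim. */
    if (reclaim_may_take_fs_locks(jinode_gfp))
        g.waits_on[T1_TRUNCATE][T2_KSWAPD] = true;

    return g;
}

/* Depth-first search; state: 0 = unvisited, 1 = on stack, 2 = done. */
static bool dfs(const struct wait_graph *g, int node, int *state)
{
    state[node] = 1;
    for (int next = 0; next < NTHREADS; next++) {
        if (!g->waits_on[node][next])
            continue;
        if (state[next] == 1)
            return true;            /* back edge: a wait cycle */
        if (state[next] == 0 && dfs(g, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* Returns true when the wait-for graph contains a cycle. */
bool has_deadlock(const struct wait_graph *g)
{
    int state[NTHREADS] = {0};

    for (int i = 0; i < NTHREADS; i++)
        if (state[i] == 0 && dfs(g, i, state))
            return true;
    return false;
}
```

In this model, `build_graph(MODEL_GFP_KERNEL)` contains all three edges and `has_deadlock()` reports the cycle. With `MODEL_GFP_NOFS`, the Thread1 to Thread2 edge is never added, so the remaining waits form a chain that can drain.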

Activity

Chris Horn added a comment - 31/May/18 3:01 PM - edited

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/31825
Subject: ~~LU-10859~~ ldiskfs: extend previous fix to SLES
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c4956f2dff93c428b3040c1e03d08c14ec6232c8

~~This patch was abandoned but its changes were never rolled into the primary patch for this ticket ( https://review.whamcloud.com/31806/). Should Bob's patch be revived?~~

Edit: Nevermind, I misread the patches

Gerrit Updater added a comment - 03/May/18 8:00 PM

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/32058/
Subject: ~~LU-10859~~ ldiskfs: fix deadlock with heavy memory preassure
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 0595d92ad03ab9d975d599aad204d746aff991b3

Gerrit Updater added a comment - 18/Apr/18 8:48 PM

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/32058
Subject: ~~LU-10859~~ ldiskfs: fix deadlock with heavy memory preassure
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: c5a9c83471aa5b6e0a593b7b99760e86c8311bee

Wang Shilong (Inactive) added a comment - 17/Apr/18 12:45 AM

FYI, we should include this fix in the b2_10 LTS branch.

Peter Jones added a comment - 09/Apr/18 9:04 PM

Landed for 2.12

Gerrit Updater added a comment - 09/Apr/18 7:51 PM

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31806/
Subject: ~~LU-10859~~ ldiskfs: fix deadlock with heavy memory preassure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0506e1bd6a6d5fafe7fc5e558aa1b75e456c2642

Wang Shilong (Inactive) added a comment - 30/Mar/18 1:40 AM

Hello chunteraa,

Thanks for the reminder; I refreshed the patch to fix that.

Bob Glossman (Inactive) added a comment - 29/Mar/18 5:16 PM

> Bob, can you please add a follow-on patch for SLES, either using the same patch (if it applies cleanly) or new patches as needed, once this initial patch has passed review & testing.

Done.

Gerrit Updater added a comment - 29/Mar/18 5:11 PM

Bob Glossman (bob.glossman@intel.com) uploaded a new patch: https://review.whamcloud.com/31825
Subject: ~~LU-10859~~ ldiskfs: extend previous fix to SLES
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c4956f2dff93c428b3040c1e03d08c14ec6232c8

Chris Hunter (Inactive) added a comment - 29/Mar/18 4:19 PM

patch ~~LU-9728~~ uses GFP_HIGHUSER for allocations instead of GFP_NOFS

The kernel_patch filename is spelled "GPF_NOFS", but the allocation flag is GFP_NOFS.
https://www.kernel.org/doc/gorman/html/understand/understand009.html

People

Assignee: Wang Shilong (Inactive)

Reporter: Wang Shilong (Inactive)

Votes: 0

Watchers: 10

Dates

Created: 28/Mar/18 1:42 AM

Updated: 11/May/20 5:56 PM

Resolved: 09/Apr/18 9:04 PM