[LU-15245] getxattr can lead to MDS thread exhaustion and deadlock

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.15.0

    Description

      When SELinux is enabled, a getxattr becomes part of lookup.

      So, this sequence occurs:

      --------
      Client A performs lookup on resource X, gets a lock - call it lock A - on resource X.
      Client A starts getxattr request on resource X, getxattr takes a reference on lock A.  Lock A now can't be cancelled until the getxattr is complete.
      [Pause client A here]

      Client B attempts a modifying operation on resource X, so it requests a conflicting lock on resource X.
      Call it lock B.
      Lock B is now waiting behind lock A.  The waiting is done by an MDT worker thread, so this thread is now 'consumed' - it's busy and can't handle any other requests.

      Now we have client C...  Client D... client E, F, G, H ... etc.
      And they also want locks on resource X.
      All of these locks (Lock C, lock D, etc...) wait behind Lock B.  (Some of them may be modifying locks, some may not be, it doesn't matter.)
      Each of these waiting locks 'consumes' an MDT worker thread, because the thread has to do the waiting.

      You can easily see how any number of threads can be consumed like this.

      So now, go back to client A:
      The getxattr request arrives at the MDT.
      No threads are available to service it, so it waits.

      And now client A is holding lock A, but it cannot complete the operation.  So, eventually client A is evicted because it can't give the lock back.
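
The sequence above can be sketched as a toy model. This is only an illustration, not Lustre code: the `ToyMDT` class, its methods, and the `simulate` helper are hypothetical names, and the model assumes that every conflicting lock request pins exactly one worker thread until lock A is released.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMDT:
    """Toy model of an MDT request service with a fixed number of worker threads."""
    num_threads: int
    waiting_on_lock: list = field(default_factory=list)  # each entry pins one thread
    unserved: list = field(default_factory=list)         # arrived, but no thread was free

    def free_threads(self) -> int:
        return self.num_threads - len(self.waiting_on_lock)

    def enqueue_conflicting_lock(self, client: str) -> None:
        # The thread that picks up this request blocks until lock A is released.
        if self.free_threads() > 0:
            self.waiting_on_lock.append(client)
        else:
            self.unserved.append(client)

    def getxattr(self, client: str) -> bool:
        # Needs a free thread; returns True if the request could be served.
        if self.free_threads() > 0:
            return True
        self.unserved.append(client)
        return False


def simulate(num_threads: int, lock_waiters: str) -> bool:
    """Return True if client A's getxattr is served after the lock waiters arrive."""
    mdt = ToyMDT(num_threads)
    for client in lock_waiters:
        mdt.enqueue_conflicting_lock(client)
    return mdt.getxattr("A")


if __name__ == "__main__":
    print(simulate(num_threads=4, lock_waiters="BCDEFGH"))  # more waiters than threads
    print(simulate(num_threads=4, lock_waiters="BC"))       # some threads still free
```

In this model, once the number of lock waiters reaches the thread count, the getxattr cannot be served. Lock A is then never released, so none of the blocked threads ever wake up.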


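      The solution is to move getxattr (and getattr, for consistency, as a similar bug may exist there) to the MDS_READPAGE portal. Requests on that portal are handled by a separate pool of service threads, so the getxattr can still be serviced while the regular request threads are blocked waiting on locks.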
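
As a rough sketch of why a separate portal helps, the toy function below models two independent thread pools. It is not Lustre code: the function name, the dictionary layout, and the pool sizes are assumptions made for illustration, and the portal names are used only as labels for the two pools.

```python
def can_serve_getxattr(threads_per_portal: dict,
                       blocked_lock_waiters: int,
                       getxattr_portal: str) -> bool:
    """Toy model: is a thread free for getxattr on the portal it is sent to?

    Assumes all conflicting lock requests arrive on the general request
    portal and that each one pins a thread there until lock A is released.
    """
    busy = {portal: 0 for portal in threads_per_portal}
    busy["MDS_REQUEST"] = min(blocked_lock_waiters,
                              threads_per_portal["MDS_REQUEST"])
    return busy[getxattr_portal] < threads_per_portal[getxattr_portal]


if __name__ == "__main__":
    pools = {"MDS_REQUEST": 4, "MDS_READPAGE": 2}  # hypothetical pool sizes
    # getxattr sent to the general portal: starved by the lock waiters
    print(can_serve_getxattr(pools, 7, "MDS_REQUEST"))
    # getxattr sent to the readpage portal: its pool is unaffected
    print(can_serve_getxattr(pools, 7, "MDS_READPAGE"))
```

Because the lock waiters never occupy the second pool, the getxattr completes, client A can release lock A, and the blocked threads on the general portal can proceed.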

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.15

          gerrit Gerrit Updater added a comment -

          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45593/
          Subject: LU-15245 mdc: GET(X)ATTR to READPAGE portal
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5552eba1451d47ce1ba6c7ca112aa4b9b2f87292

          gerrit Gerrit Updater added a comment -

          "Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45593
          Subject: LU-15245 mdc: GET(X)ATTR to READPAGE portal
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: ebb035756eb059b255d4c8245d42bc5d5b96bab9

          People

            paf0186 Patrick Farrell
            paf0186 Patrick Farrell