  1. Lustre
  2. LU-15245

getxattr can lead to MDS thread exhaustion and deadlock


Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.15.0

    Description

      When SELinux is enabled, a getxattr becomes part of lookup.

      So, this sequence occurs:

      --------
      Client A performs lookup on resource X, gets a lock - call it lock A - on resource X.
      Client A starts getxattr request on resource X, getxattr takes a reference on lock A.  Lock A now can't be cancelled until the getxattr is complete.
      [Pause client A here]

      Client B attempts a modifying operation on resource X, so it requests a conflicting lock on resource X.
      Call it lock B.
      Lock B is now waiting behind lock A.  The waiting is done by an MDT worker thread, so this thread is now 'consumed' - it's busy and can't handle any other requests.

      Now clients C, D, E, F, G, H, and so on arrive.
      They also want locks on resource X.
      All of these locks (lock C, lock D, etc.) wait behind lock B.  (Some may be modifying locks and some may not; it doesn't matter.)
      Each of these waiting locks 'consumes' an MDT worker thread, because the thread has to do the waiting.

      With enough such clients, every available MDT worker thread can be consumed this way.

      So now, go back to client A:
      The getxattr request arrives at the MDT.
      No threads are available to service the getxattr request, so it waits.

      And now client A is holding lock A, but it cannot complete the operation.  So, eventually client A is evicted because it can't give the lock back.


      The solution is to move getxattr (and getattr for consistency, as there may be a similar bug there as well) to the MDS_READPAGE portal, so that these requests are not queued behind the threads that are blocked waiting for locks.
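
      The sequence above, and the effect of the fix, can be sketched with a toy model. This is not Lustre code: Python thread pools stand in for the MDT service threads, a second one-thread pool stands in for a separate portal such as MDS_READPAGE, and all names (run_scenario, wait_for_conflicting_lock, serve_getxattr) are hypothetical. A timeout followed by a forced release plays the role of client A being evicted.

```python
import concurrent.futures as cf
import threading


def run_scenario(num_threads, separate_pool, timeout=0.5):
    """Toy model: return True if client A's getxattr is serviced in time.

    num_threads   -- size of the general worker pool (stand-in for MDT threads)
    separate_pool -- if True, getxattr is served by its own pool (stand-in
                     for moving the request to a different portal)
    """
    lock_a_released = threading.Event()  # set once lock A is given back
    general_pool = cf.ThreadPoolExecutor(max_workers=num_threads)
    xattr_pool = (cf.ThreadPoolExecutor(max_workers=1)
                  if separate_pool else general_pool)

    def wait_for_conflicting_lock():
        # Clients B, C, D, ...: the worker thread itself blocks behind lock A.
        lock_a_released.wait()

    def serve_getxattr():
        # Client A's getxattr completes, so lock A can now be cancelled.
        lock_a_released.set()
        return "xattr"

    try:
        # Enough conflicting lock requests to occupy every general worker.
        for _ in range(num_threads):
            general_pool.submit(wait_for_conflicting_lock)
        # Client A's getxattr arrives after the waiters.
        getxattr = xattr_pool.submit(serve_getxattr)
        try:
            getxattr.result(timeout=timeout)
            return True
        except cf.TimeoutError:
            # No free thread picked up the request: the deadlock described above.
            return False
    finally:
        # Model eviction of client A: lock A is dropped so the waiters can exit.
        lock_a_released.set()
        general_pool.shutdown(wait=True)
        if separate_pool:
            xattr_pool.shutdown(wait=True)


if __name__ == "__main__":
    print("shared pool, getxattr serviced:", run_scenario(4, separate_pool=False))
    print("separate pool, getxattr serviced:", run_scenario(4, separate_pool=True))
```

      In the shared-pool case the getxattr request sits in the queue behind workers that are all waiting for the lock it holds; with its own pool it runs immediately and the waiters are released.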

          People

            paf0186 Patrick Farrell (Inactive)
