
reduce hold time for BFL rename lock


    Description

      During a non-parallel rename, the MDS thread locks the BFL resource first, then locks up to 4 child FIDs. As a result, the BFL can be held for a long time if any client holding a conflicting lock on one of those FIDs is unresponsive. Hundreds of clients may hold PR locks on the source or target directory and/or the child being renamed.

      To reduce the hold time and contention on the BFL resource lock, the MDS could get the 4 child locks first (to cancel the majority of lock holders), drop those locks, then get the BFL resource lock and re-lock the children.

      In many cases, this would allow most or all of the contended DLM locks on the children to be cancelled before the BFL lock is taken. That avoids holding the BFL while talking to slow clients and reduces the overall time that the BFL lock is held, allowing more renames to be done.
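The effect of the reordering on BFL hold time can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not Lustre code: `Resource`, `bfl_first`, `children_first`, `sample_children`, and all delay values are hypothetical. The model assumes that the child locks are taken one after another, that callbacks for a single resource go out in parallel (so the wait is the slowest holder), and that no client re-acquires a lock between the pre-lock phase and the re-lock.

```python
from dataclasses import dataclass, field

RENAME_WORK_MS = 2  # hypothetical cost of the rename itself, in milliseconds


@dataclass
class Resource:
    """A child FID plus the callback delays (ms) of clients holding locks on it."""
    name: str
    holder_delays_ms: list = field(default_factory=list)

    def lock_exclusive(self):
        """Cancel every conflicting holder and return the simulated wait in ms.

        Assumption: callbacks are sent in parallel, so the wait equals the
        slowest holder. After this call the resource has no holders left.
        """
        waited = max(self.holder_delays_ms, default=0)
        self.holder_delays_ms = []
        return waited


def bfl_first(children):
    """Original ordering: take the BFL, then lock each child while holding it."""
    bfl_held_ms = 0
    for res in children:
        bfl_held_ms += res.lock_exclusive()
    return bfl_held_ms + RENAME_WORK_MS


def children_first(children):
    """Proposed ordering: pre-lock children, drop them, take the BFL, re-lock."""
    # Phase 1: no BFL held; slow clients are flushed here.
    prelock_wait_ms = sum(res.lock_exclusive() for res in children)
    # (child locks dropped here)
    # Phase 2: BFL held; re-locking is cheap because the holders are gone.
    bfl_held_ms = sum(res.lock_exclusive() for res in children) + RENAME_WORK_MS
    return prelock_wait_ms, bfl_held_ms


def sample_children():
    """Four hypothetical children, one of them with a very slow client."""
    return [
        Resource("src_dir", [5000, 20, 30]),
        Resource("tgt_dir", [10]),
        Resource("src_child", []),
        Resource("tgt_child", [200]),
    ]
```

With these made-up numbers, `bfl_first(sample_children())` reports 5212 ms of BFL hold time, while `children_first(sample_children())` spends 5210 ms waiting without the BFL and holds the BFL for only 2 ms. The total latency of the single rename is unchanged; what shrinks is the window during which other renames are blocked.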

      A further optimization might be to acquire the child locks first, then "trylock" the BFL lock afterward. If the BFL trylock succeeds (i.e. the lock is uncontended), then verify that the parent and child objects have not been modified since they were locked, and possibly the path connectivity as well. That would help avoid lock ping-pong in situations where the parent/child locks continue to be contended, and the MDS would acquire them only once when the trylock succeeds.
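The trylock variant can be sketched with ordinary in-process locks. This is a toy illustration, not the MDT implementation: `Obj`, `rename_with_trylock`, the per-object `version` counter, and the fallback to the drop-and-relock sequence when the trylock fails are all assumptions made for the example. Because the child locks are never dropped on the trylock path, the version check can only fail on the fallback path in this model; the path-connectivity check mentioned above is not modelled.

```python
import threading


class Obj:
    """Hypothetical stand-in for a parent or child object with its own lock."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()
        self.version = 0  # bumped on every modification


def rename_with_trylock(bfl, objs, do_rename):
    """Lock children first, then try the BFL without blocking.

    Returns (path, ok): path is "trylock" or "fallback"; ok is False when
    an object changed after it was first locked and the caller should retry.
    """
    ordered = sorted(objs, key=lambda o: o.name)  # fixed order avoids deadlock
    for o in ordered:
        o.lock.acquire()
    seen = {o.name: o.version for o in ordered}  # state as of first locking

    if bfl.acquire(blocking=False):
        path = "trylock"  # uncontended BFL: child locks were taken only once
    else:
        # Assumed fallback: drop the children, wait for the BFL, re-lock.
        for o in reversed(ordered):
            o.lock.release()
        bfl.acquire()
        for o in ordered:
            o.lock.acquire()
        path = "fallback"

    try:
        # Verify nothing was modified since the objects were first locked.
        if any(o.version != seen[o.name] for o in ordered):
            return path, False
        do_rename()
        for o in ordered:
            o.version += 1
        return path, True
    finally:
        for o in reversed(ordered):
            o.lock.release()
        bfl.release()
```

A caller would loop on a `False` result, repeating the lookup before trying again. The non-blocking `acquire(blocking=False)` is what keeps a thread from sleeping on the BFL while it still holds child locks.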

          Activity

            [LU-17427] reduce hold time for BFL rename lock
            pjones Peter Jones added a comment -

            Merged for 2.17

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/57741/
            Subject: LU-17427 mdt: reduce hold time for BFL rename lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 8dbf0494798862b54e21fd02fb0439ce9633b4cb

            gerrit Gerrit Updater added a comment - - edited

            "kg.xu <squalfof@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/58027
            Subject: LU-17427 test: reduce hold time for BFL rename lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 1a2fac00a56335ed88b6365b8093184a884c0696

            gerrit Gerrit Updater added a comment -

            "kg.xu <squalfof@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/57741
            Subject: LU-17427 mdt: reduce hold time for BFL rename lock
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2d4781e035d1f0f3cc09b9c3b4703aceb6383b43

            People

              squalfof Keguang Xu
              adilger Andreas Dilger