Details
Improvement
Resolution: Unresolved
Minor
Description
During a non-parallel rename, the MDS thread locks the BFL resource first, then locks (up to) 4 child FIDs. This means the BFL can be held for a long time if any client holding one of those child locks is unresponsive. There may be hundreds of clients holding PR locks on the source or target directory and/or on the child being renamed.
To reduce the hold time and contention on the BFL resource lock, the MDS could get the (up to) 4 child locks first, which cancels the majority of lock holders, then drop those locks, get the BFL resource lock, and re-lock the children.
In many cases, this would allow most or all of the contended DLM locks on the children to be cancelled without holding the BFL lock. That avoids holding the BFL while talking to slow clients, and also reduces the overall time that the BFL lock is held (allowing more renames to be done).
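To make the expected gain concrete, here is a rough cost model of the two lock orders. It is only a sketch: `struct child_cost`, both function names, and the assumption that the children are locked one after another are hypothetical and do not come from the Lustre code. Any numbers fed into it are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model for one child lock; not a Lustre structure. */
struct child_cost {
	unsigned cancel_ms; /* time to revoke conflicting client locks on first acquire */
	unsigned relock_ms; /* time to re-acquire after those locks were revoked */
};

/* Current order: BFL first, so every child revoke happens while the BFL is held. */
unsigned bfl_hold_bfl_first(const struct child_cost *c, size_t n, unsigned work_ms)
{
	unsigned held = work_ms;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		held += c[i].cancel_ms;
	return held;
}

/* Proposed order: revoke before taking the BFL; only the re-lock is done under it. */
unsigned bfl_hold_children_first(const struct child_cost *c, size_t n, unsigned work_ms)
{
	unsigned held = work_ms;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		held += c[i].relock_ms;
	return held;
}
```

In this model a single slow client (a large `cancel_ms`) dominates the BFL hold time in the first function but does not appear in the second. `relock_ms` is non-zero only when a client re-takes a conflicting lock between the drop and the re-lock.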
A further optimization might be to acquire the child locks first, then "trylock" the BFL lock afterward. If the BFL trylock succeeds (i.e. it is uncontended), then verify that the parent and child objects have not been modified since they were locked, and maybe also the path connectivity. That would help avoid lock ping-pong in situations where the parent/child locks continue to be contended, and the MDS would only need to get them once if the trylock works.
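The trylock variant could look roughly like the sketch below. Everything in it is illustrative: `toy_lock` is a C11 `atomic_flag` spinlock standing in for a DLM lock, and `struct obj`, `rename_lock_fast()` and `rename_unlock()` are hypothetical names, not Lustre APIs. The ticket does not say how "not modified" would be checked, so the per-object version counter compared against a caller-supplied snapshot is an assumption. The sketch also assumes the children are distinct objects passed in a consistent order, and it omits the path connectivity check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy spinlock standing in for a DLM lock; not a Lustre API. */
struct toy_lock { atomic_flag held; };

static bool toy_trylock(struct toy_lock *l)
{
	/* test_and_set returns the previous state: false means we got the lock */
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_lock_wait(struct toy_lock *l)
{
	while (!toy_trylock(l))
		; /* spin; a real MDS would wait for client lock cancellation here */
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/* Hypothetical object: a lock plus a version bumped on every modification. */
struct obj {
	struct toy_lock lock;
	uint64_t version;
};

static struct toy_lock bfl = { ATOMIC_FLAG_INIT };

void obj_init(struct obj *o, uint64_t version)
{
	atomic_flag_clear(&o->lock.held);
	o->version = version;
}

/*
 * Fast path: lock the children (blocking), then trylock the BFL.
 * Returns true with the BFL and all child locks held, or false with
 * nothing held, in which case the caller falls back to the slow path.
 */
bool rename_lock_fast(struct obj **child, const uint64_t *seen, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		toy_lock_wait(&child[i]->lock);

	if (!toy_trylock(&bfl))
		goto out_children; /* BFL contended: do not wait with children held */

	for (i = 0; i < n; i++)
		if (child[i]->version != seen[i])
			goto out_bfl; /* object changed since the snapshot was taken */
	return true;

out_bfl:
	toy_unlock(&bfl);
out_children:
	for (i = 0; i < n; i++)
		toy_unlock(&child[i]->lock);
	return false;
}

void rename_unlock(struct obj **child, size_t n)
{
	toy_unlock(&bfl);
	for (size_t i = 0; i < n; i++)
		toy_unlock(&child[i]->lock);
}
```

When `rename_lock_fast()` returns false, the caller would drop back to taking the BFL with a blocking lock and then re-locking the children. The function never waits on the BFL while holding child locks, so the fast path cannot lengthen BFL contention; on success the child locks have been taken only once.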