Details
Improvement
Resolution: Fixed
Minor
Lustre 2.16.0, Lustre 2.12.9, Lustre 2.15.2
Description
It would be better to change the upcall cache uc_lock to a read-write lock so that threads can get the read lock to do concurrent lookups in the upcall cache, and only grab the write lock in the rare case when a new entry is added or old entries are expired. That reduces serialization between MDS threads during normal operation, and avoids all of the threads spinning for some time if the requested key (UID) is not in the cache at all, before they sleep on uc_wait.
find_again:
        if (new)
                write_lock(&cache->uc_lock);
        else
                read_lock(&cache->uc_lock);
Because check_unlink_entry() modifies the list, it cannot be called while holding only the read lock. Expired entries might instead be handled in a separate list walk after the upcall is launched, before waiting for the cache to be updated, since that is dead time anyway. However, some care must be taken that the expired list entries are processed properly. It may be clearer, code-wise, for check_unlink_entry() to return early if there are expired entries and only the read lock is held, then drop the read lock, take the write lock, and retry the lookup. That would add some contention when there are expired entries, but for the majority of operations a read lock would be enough.
The CERROR() call in upcall_cache_get_entry() should be changed to the standard format, with the device name (uc_name) at the start of the line and ": rc = %d\n" at the end, and it should also report the uc_acquire_expire interval that was waited.
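As a rough illustration of that message layout, the sketch below builds a string of the same shape with snprintf() standing in for CERROR(). The helper name, its parameters, and the wording between the device name and the return code are assumptions made for the example, not the text used in Lustre.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: formats a message with the device name first,
 * the interval that was waited in the middle, and ": rc = %d\n" last.
 * Returns the snprintf() result (length the full message needs).
 */
static int demo_format_acquire_error(char *buf, size_t len,
				     const char *uc_name,
				     unsigned long long key,
				     long expire_sec, int rc)
{
	assert(buf != NULL && uc_name != NULL);
	return snprintf(buf, len,
			"%s: acquire for key %llu timed out after %lds: rc = %d\n",
			uc_name, key, expire_sec, rc);
}
```

Keeping the device name at the front and the return code at the end means the line can be matched by the same log filters as other messages in this format.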