[LU-16498] change upcall uc_lock to read-write lock Created: 20/Jan/23  Updated: 20/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0, Lustre 2.12.9, Lustre 2.15.2
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: Sebastien Buisson
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-16165 Retry mechanism for identity cache Resolved
is related to LU-17447 kernel BUG at lib/list_debug.c:31! Resolved

 Description   

It would be better to change the upcall cache uc_lock to a read-write lock so that threads can get the read lock to do concurrent lookups in the upcall cache, and only grab the write lock in the rare case when a new entry is added or old entries are expired. That reduces serialization between MDS threads during normal operation, and avoids all of the threads spinning for some time if the requested key (UID) is not in the cache at all, before they sleep on uc_wait.

find_again:
        if (new)
                write_lock(&cache->uc_lock);
        else
                read_lock(&cache->uc_lock);

Because check_unlink_entry() modifies the list, it cannot be called while holding the read lock. The expiry check could instead be done in a separate list walk after the upcall is launched, before waiting for the cache to be updated, since that is dead time anyway. However, some care must be taken that the expired list entries are processed properly. It may be clearer code-wise to return from check_unlink_entry() if there are expired entries and the read lock is held, drop the read lock and get the write lock, and then retry the lookup. That would add some contention when there are expired entries, but for the majority of operations a read lock would be enough.

The CERROR() call in upcall_cache_get_entry() should be changed to the standard format, with device name (uc_name) at the start of the line and ": rc = %d\n" at the end, along with the uc_acquire_expire interval that was waited.
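As a rough illustration of the requested message layout, the following user-space sketch builds a string with the device name first and ": rc = %d\n" last. The helper name demo_format_acquire_error(), its parameters, and the message wording in the middle are assumptions for illustration; the real change would use CERROR() with uc_name and uc_acquire_expire from the cache.

```c
#include <stdio.h>

/* Hypothetical helper: format an error line in the layout described above,
 * "<device name>: <what failed, including the waited interval>: rc = <rc>\n".
 * Returns the snprintf() result (length that was or would be written). */
static int demo_format_acquire_error(char *buf, size_t len,
				     const char *name,
				     unsigned long long key,
				     long long waited_secs, int rc)
{
	return snprintf(buf, len,
			"%s: acquire for key %llu timed out after %llds: rc = %d\n",
			name, key, waited_secs, rc);
}
```

Keeping the device name at the start and the return code at the end makes such lines easy to grep and to attribute to a specific target in the console logs.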



 Comments   
Comment by Gerrit Updater [ 15/Sep/23 ]

"Sebastien Buisson <sbuisson@ddn.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52395
Subject: LU-16498 obdclass: change uc_lock to rwlock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fb9d9b40f979140718799427e6f13234a4cbce80
