Details
- Improvement
- Resolution: Fixed
- Minor
Description
A significant amount of time is sometimes spent during
LRU clearing (i.e., echo 'clear' > lru_size) checking
pages to see if they are covered by another read lock.
Since all unused read locks will be destroyed by this
operation, the pages will be freed momentarily anyway,
and this check is mostly a waste of time.
(Time is spent specifically in ldlm_lock_match, trying to
match these locks.)
So, in the case of echo 'clear' > lru_size, we should not
check for other covering read locks before attempting to
discard pages.
We do this by using the LDLM_FL_DISCARD_DATA flag, which is
currently used for special cases where you want to destroy
the dirty pages under a write lock rather than write them
out.
We set this flag on all the PR locks which are slated for
cancellation by ldlm_prepare_lru_list (when it is called
from ldlm_ns_drop_cache).
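
The flagging step described above can be sketched in isolation. This is
a minimal model, not the Lustre implementation: the struct, the enum,
the SKETCH_FL_DISCARD_DATA value and both function names are
hypothetical stand-ins, and the real ldlm_prepare_lru_list has a
different signature and does considerably more work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real Lustre definitions. */
enum sketch_mode { SKETCH_PR, SKETCH_PW };

#define SKETCH_FL_DISCARD_DATA 0x1u /* models LDLM_FL_DISCARD_DATA */

struct sketch_lock {
    enum sketch_mode mode;
    bool in_use;        /* actively referenced, so it is not cancelled */
    unsigned int flags;
};

/*
 * Models building the cancel list. Every unused lock is slated for
 * cancellation. When the caller is dropping the whole cache, unused PR
 * locks are also flagged so that their pages can be discarded without
 * first looking for another covering read lock.
 * Returns the number of locks slated for cancellation.
 */
size_t sketch_prepare_lru_list(struct sketch_lock *locks, size_t n,
                               bool drop_cache)
{
    size_t slated = 0;

    assert(locks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (locks[i].in_use)
            continue;
        if (drop_cache && locks[i].mode == SKETCH_PR)
            locks[i].flags |= SKETCH_FL_DISCARD_DATA;
        slated++;
    }
    return slated;
}

/*
 * Models the decision on the cancel path: the expensive search for a
 * covering lock is only needed for a PR lock that was not flagged.
 */
bool sketch_must_check_covering(const struct sketch_lock *lock)
{
    return lock->mode == SKETCH_PR &&
           !(lock->flags & SKETCH_FL_DISCARD_DATA);
}
```

In this model a PR lock that is still in use is never flagged, which
matches the description: only locks slated for cancellation skip the
covering-lock check.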
The case where another lock does cover those pages (and is
in use and so does not get cancelled) is safe for a few
reasons:
1. When discarding pages, we wait (discard_cb->cl_page_own)
until they are in the cached state before invalidating.
So if they are actively in use, we'll wait until that use
is done.
2. Removal of pages under a read lock is something that can
happen due to memory pressure, since these are VFS cache
pages. If a client reads something which is then removed
from the cache and goes to read it again, this will simply
generate a new read request.
This has a performance cost for that reader, but anyone
clearing the ldlm lru while actively doing I/O in that
namespace cannot expect good performance.
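
The second point can be illustrated with a toy read-through cache. This
is a sketch under stated assumptions, not Lustre or VFS code: the
struct, the fixed page count and all function names are hypothetical,
and the "server" is a stub that returns a value derived from the page
index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 4

/* Hypothetical toy client cache; not Lustre or VFS code. */
struct sketch_cache {
    bool cached[SKETCH_NPAGES]; /* is the page currently cached? */
    int  data[SKETCH_NPAGES];   /* contents, valid only when cached */
    int  read_requests;         /* reads that had to go to the server */
};

/* Stand-in for the server's copy of the file. */
static int sketch_server_read(size_t page)
{
    return (int)(page * 10);
}

/* Read one page: serve it from the cache if present, else fetch it. */
int sketch_read_page(struct sketch_cache *c, size_t page)
{
    assert(page < SKETCH_NPAGES);
    if (!c->cached[page]) {
        c->data[page] = sketch_server_read(page);
        c->cached[page] = true;
        c->read_requests++;
    }
    return c->data[page];
}

/* Drop a clean page, as memory pressure or a lock cancel might. */
void sketch_discard_page(struct sketch_cache *c, size_t page)
{
    assert(page < SKETCH_NPAGES);
    c->cached[page] = false;
}
```

Reading a page, discarding it, and reading it again returns the same
data in this model; the only visible effect is that read_requests goes
up by one, which is the performance cost mentioned above.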
In the case of many read locks on a single resource, this
improves cleanup time dramatically. In internal testing at
Cray using unusual read/write I/O patterns to create
~80,000 read locks on a single file, this improves cleanup
time from ~60 seconds to ~0.5 seconds. This also slightly
improves cleanup speed in the more normal case of one or
very few read locks on a file.
Issue Links
- is related to LU-7802 set_param lru_size fails with 'error: set_param: setting /proc/fs/lustre/ldlm/namespaces/lustre-OST0000-osc-*/lru_size=clear: Invalid argument' (Resolved)