Details
- Improvement
- Resolution: Unresolved
- Major
Description
When writing to many files, one bottleneck on a client
currently seems to be the grant code, specifically
spinning on the lock around osc_enter_cache_try.
The contention is only on osc_enter_cache_try, so there is
no obvious way to refactor the lock. Instead, we can
look at where the time goes inside the function.
Two things stand out:
- obd_dirty_pages is an atomic, but it is always accessed
  under the cl_loi_list_lock, so it can be a regular long.
  In my perf tracing, the add_return on it accounts for 50%
  of the time in this function.
- The assert_spin_lock in osc_consume_write_grant generates
  an atomic read of the cl_loi_list_lock lock value. This
  isn't too painful, but it would be nice to cut it out of
  the hot path. There is already a comment saying the
  cl_loi_list_lock must be held, and this is considered
  enough in most places in Lustre.
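The first point can be illustrated with a minimal userspace sketch. This is not the Lustre code: `struct demo_client`, its fields, and the `enter_cache_try_*` helpers are hypothetical stand-ins, a pthread mutex plays the role of cl_loi_list_lock, and the real osc_enter_cache_try also checks grant, which is omitted here. The sketch only shows why an atomic read-modify-write is redundant when every access already happens under the same lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the client object; not the real Lustre layout. */
struct demo_client {
    pthread_mutex_t list_lock;    /* plays the role of cl_loi_list_lock */
    atomic_long     dirty_atomic; /* "before": atomic dirty-page counter */
    long            dirty_plain;  /* "after": plain long, protected by list_lock */
    long            dirty_max;    /* assumed cap on dirty pages */
};

static void demo_client_init(struct demo_client *cli, long dirty_max)
{
    pthread_mutex_init(&cli->list_lock, NULL);
    atomic_init(&cli->dirty_atomic, 0);
    cli->dirty_plain = 0;
    cli->dirty_max = dirty_max;
}

/* Before: caller holds list_lock, yet every update still pays for an
 * atomic read-modify-write on the counter. Returns 1 if a page was reserved. */
static int enter_cache_try_atomic(struct demo_client *cli)
{
    if (atomic_fetch_add(&cli->dirty_atomic, 1) + 1 > cli->dirty_max) {
        atomic_fetch_sub(&cli->dirty_atomic, 1); /* over the cap: undo */
        return 0;
    }
    return 1;
}

/* After: caller holds list_lock, so a plain increment is sufficient. */
static int enter_cache_try_plain(struct demo_client *cli)
{
    if (cli->dirty_plain + 1 > cli->dirty_max)
        return 0;
    cli->dirty_plain++;
    return 1;
}

/* Try to reserve n pages under the lock; returns how many were reserved. */
static int demo_reserve_pages(struct demo_client *cli, int n)
{
    int granted = 0;

    pthread_mutex_lock(&cli->list_lock);
    for (int i = 0; i < n; i++)
        granted += enter_cache_try_plain(cli);
    pthread_mutex_unlock(&cli->list_lock);
    return granted;
}
```

Both variants give the same result as long as the lock is held; the plain version simply avoids the extra atomic operation inside the critical section.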
Benchmark:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F
That's 36 processes on one client, each writing to a
separate file.
Before patch:
5942.34 MiB/s
After patch:
11541 MiB/s
That is roughly a 1.9x improvement in write throughput.
Looking in perf, the change is just as large: time spent
in osc_enter_cache_try drops from 60% to less than 1%.
Issue Links
- is related to LU-13309 performance optimizations for brw (Resolved)