[LU-13419] Simplify osc_enter_cache_try - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
- patch
- performance

Severity: 3

Description

When writing to many files from a single client, one
bottleneck currently appears to be the grant code,
specifically spinning on the lock taken around:
osc_enter_cache_try

The contention is confined to osc_enter_cache_try, so there
is no obvious way to split or restructure the lock. Instead,
we can look at where the time goes inside the function.

Two things stand out:

- obd_dirty_pages is an atomic, but it is always accessed
  under the cl_loi_list_lock, so it can be a regular long.
  In my perf tracing, the add_return on it accounts for 50%
  of the time in this function.
- The assert_spin_locked in osc_consume_write_grant
  generates an atomic read of the cl_loi_list_lock lock
  value. This isn't too painful, but it would be nice to cut
  it out of the hot path. There is already a comment saying
  the cl_loi_list_lock must be held, and this is considered
  enough in most places in Lustre.
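
The first point can be illustrated with a small userspace sketch. This is not the Lustre code or the patch itself: the struct and the demo_* functions are hypothetical, a pthread mutex stands in for cl_loi_list_lock, and C11 atomics stand in for the kernel's atomic_long operations. It assumes, as the description above does, that every access to the counter happens with the lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the state guarded by cl_loi_list_lock. */
struct demo_client {
    pthread_mutex_t lock;     /* plays the role of cl_loi_list_lock */
    atomic_long dirty_atomic; /* "before": atomic dirty-page counter */
    long dirty_plain;         /* "after": plain long, lock-protected */
    long dirty_max;           /* limit on dirty pages */
};

static void demo_client_init(struct demo_client *c, long dirty_max)
{
    pthread_mutex_init(&c->lock, NULL);
    atomic_init(&c->dirty_atomic, 0);
    c->dirty_plain = 0;
    c->dirty_max = dirty_max;
}

/* Before: an atomic read-modify-write is issued even though the lock
 * already serializes every caller. Returns 1 if the pages fit. */
static int demo_enter_cache_atomic(struct demo_client *c, long pages)
{
    int ok = 0;

    pthread_mutex_lock(&c->lock);
    /* atomic_fetch_add returns the old value, so add pages to get
     * the equivalent of an add_return. */
    if (atomic_fetch_add(&c->dirty_atomic, pages) + pages <= c->dirty_max)
        ok = 1;
    else
        atomic_fetch_sub(&c->dirty_atomic, pages); /* undo on failure */
    pthread_mutex_unlock(&c->lock);
    return ok;
}

/* After: the same check with ordinary arithmetic. This is only safe
 * under the stated assumption that the lock is always held here. */
static int demo_enter_cache_plain(struct demo_client *c, long pages)
{
    int ok = 0;

    pthread_mutex_lock(&c->lock);
    if (c->dirty_plain + pages <= c->dirty_max) {
        c->dirty_plain += pages;
        ok = 1;
    }
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```

Both variants accept and reject the same requests. The difference is that the second one avoids the locked read-modify-write instruction inside the critical section, which is where the perf trace above shows the time going.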

Test case:
mpirun -np 36 $IOR -o $LUSTRE -w -t 1M -b 2G -i 1 -F

That's 36 processes on one client, each writing to its own
file.

Before patch:
5942.34 MiB/s
After patch:
11541 MiB/s

Looking in perf, the change is huge: time spent in
osc_enter_cache_try drops from 60% to less than 1%.

Issue Links

is related to: LU-13309 performance optimizations for brw (Resolved)

People

Assignee: Patrick Farrell
Reporter: Patrick Farrell
Votes: 0
Watchers: 6

Dates

Created: 07/Apr/20 3:54 PM
Updated: 02/Jan/25 8:40 PM