[LU-13814] DIO performance: cl_page struct removal for DIO path

Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major

    Description

      When doing DIO at ~10 GiB/s (see LU-13798, LU-13799), about 70-75% of the time is still spent working with the cl_page struct.

      This means allocating it, setting it up, and then moving it around and managing it.  We use the cl_page to track the vm pages, and in doing so we put it on lists, move it from list to list, and update its state as it goes (literally, cl_page_state).
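
      For reference, a simplified sketch of the per-page work described above.  The wrapper function is hypothetical and the calls are shown with simplified usage, loosely based on the cl_page/cl_2queue interfaces, not the exact Lustre code:

      /* Simplified illustration of the current per-page bookkeeping:
       * every vm page gets its own cl_page, which is then owned, queued,
       * and state-tracked individually. */
      static int dio_submit_with_cl_pages(const struct lu_env *env,
                                          struct cl_io *io,
                                          struct cl_object *obj,
                                          struct cl_2queue *queue,
                                          struct page **vmpages, int npages,
                                          loff_t offset)
      {
              struct cl_page *clp;
              int i;

              for (i = 0; i < npages; i++) {
                      /* allocate and initialize a cl_page for this vm page */
                      clp = cl_page_find(env, obj, offset >> PAGE_SHIFT,
                                         vmpages[i], CPT_TRANSIENT);
                      if (IS_ERR(clp))
                              return PTR_ERR(clp);

                      /* take ownership and queue for submission - each step
                       * updates cl_page_state and per-page list membership */
                      cl_page_assume(env, io, clp);
                      cl_2queue_add(queue, clp);

                      offset += PAGE_SIZE;
              }
              return 0;
      }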

      It's possible to improve this by doing cl_page allocations in batches; this results in roughly a 30% drop in time spent on cl_page work and makes it possible to get close to 15 GiB/s.

      Fundamentally, none of this is necessary for DIO.  The cl_page struct is for tracking per-page information, but all of the pages in a DIO submission (at the ll_direct_rw_page level) are the same: they have the same owner, the same page flags, they are part of the same stripe, and so on.  If we do unaligned DIO, the first and last pages can have a starting and ending offset, but that's it, and we can associate that with the DIO itself rather than with the individual pages.
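
      To illustrate the point, essentially all of the state the DIO actually needs could live in a single per-submission descriptor rather than in per-page structs.  The struct and field names below are hypothetical, purely to show how little genuinely per-page state there is:

      /* Hypothetical per-DIO descriptor: state shared by every page in
       * the submission, plus the only two per-page exceptions (the
       * partial first and last pages of an unaligned DIO). */
      struct dio_submit_state {
              struct page   **dss_pages;        /* from ll_get_user_pages() */
              unsigned int    dss_npages;       /* pages in this submission */
              loff_t          dss_file_offset;  /* file offset of the I/O */
              /* unaligned DIO: only the ends can be partial pages */
              unsigned int    dss_first_offset; /* start offset in first page */
              unsigned int    dss_last_count;   /* bytes used in last page */
              /* owner, page flags, stripe mapping, etc. are identical for
               * every page and are derived once from the io/layout rather
               * than stored per page */
      };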

      So the proposal is to stop using the cl_page struct to track pages in a DIO, and instead use the array of pages which describes the user buffer (i.e., the kiocb and the results of ll_get_user_pages).

      The brw_page member of the cl_page struct seems like it will still be necessary, but this isn't such a big deal; we can allocate those separately, at a fraction of the cost of setting up and managing the full cl_page abstraction.
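
      A rough sketch of what the submit path could then look like, working directly from the user page array and allocating only the lightweight brw_page entries.  This is illustrative only: ll_dio_submit and ll_dio_issue_rpc are hypothetical names, and the brw_page fields are shown with their pg/off/count/flag names:

      /* Illustrative sketch: build the brw_page array straight from the
       * pages describing the user buffer, with no per-page cl_page
       * allocation, ownership, queueing, or state transitions. */
      static int ll_dio_submit(struct page **pages, int npages,
                               loff_t offset, size_t count, int rw)
      {
              struct brw_page *pga;
              int i;

              OBD_ALLOC_LARGE(pga, npages * sizeof(*pga));
              if (pga == NULL)
                      return -ENOMEM;

              for (i = 0; i < npages; i++) {
                      size_t bytes = min_t(size_t, count,
                                           PAGE_SIZE - (offset & ~PAGE_MASK));

                      pga[i].pg    = pages[i];
                      pga[i].off   = offset;
                      pga[i].count = bytes;  /* partial only at the ends */
                      pga[i].flag  = 0;

                      offset += bytes;
                      count  -= bytes;
              }

              /* hand the array to the existing RPC machinery (hypothetical
               * helper standing in for the osc/ptlrpc submission) */
              return ll_dio_issue_rpc(pga, npages, rw);
      }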

      Back-of-the-envelope calculations suggest that this would save about 60-75% of the time spent submitting DIO in the current optimized path, which performs at 10 GiB/s.
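
      Spelling out the arithmetic behind that estimate:

          remaining submit time ~ 100% - (60% to 75%) = 25% to 40% of today's
          projected throughput  ~ 10 GiB/s / 0.40 ... 10 GiB/s / 0.25
                                = 25 GiB/s ... 40 GiB/s (single threaded)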

       

      That calculation suggests we could reach single-threaded DIO performance in the 25-40 GiB/s range.  Presumably some other issues will prevent hitting such high rates, but I think it is reasonable to expect we could reach 20+ GiB/s with sufficient network hardware.  (We will likely have to accept "idle CPU time in the submitting thread while waiting for the network" as a proxy indicator, since networks in the 30 GiB/s per node range are not readily available for testing.)

      This improvement would of course also apply to buffered I/O via this path (see LU-13805), with the fast buffering version seeing a smaller (but still large) benefit.

      This change would also likely make it easier (from a coding perspective) to move the buffer allocation & memcpy() into the ptlrpcd threads, which is a key part of improving the performance of the fast buffering.

            People

              Assignee: paf0186 Patrick Farrell (Inactive)
              Reporter: paf0186 Patrick Farrell (Inactive)
