[LU-247] Lustre client slow performance on BG/P IONs: unaligned DIRECT_IO Created: 28/Apr/11  Updated: 17/Mar/22  Resolved: 17/Mar/22

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.0.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Niu Yawei (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-12429 Single client buffered SSF write is s... Open
is related to LU-4664 sync write should consume grant on cl... Resolved
is related to LU-4198 Improve IO performance when using DIR... Resolved
is related to LU-3192 LBUG:(osc_request.c:1308:osc_brw_prep... Resolved
is related to LU-13805 i/o path: Unaligned direct i/o Open
Severity: 3
Bugzilla ID: 18,801
Rank (Obsolete): 9105

 Description   

Port the fix of bug 18801 to 2.1



 Comments   
Comment by Niu Yawei (Inactive) [ 04/May/11 ]

I think there are two major components in this porting:

1. Make bufferred lockless io into direct lockless io.

Current 2.0 code supports bufferred lockless io, we should make them into direct io, since:

  • Partial write could cause data corrption: a partial write should make page uptodate by read page in prepare_write, since
    it's lockless io, the cached page isn't protected by dlm lock, then another client could change other part of this page
    just after prior client's prepare_write, when the prior client flush this page, the change made by the 2nd client will
    be overwritten by stale data.
  • In fact in the bufferred lockless io case, the page never been cached (it can't be cached), so it doesn't make any sense
    to copy data between user page and kernel cache, as shown in b18801, the overhead of such copy sometimes is quite heavy.

2. Support unaligned direct io.

To support unaligned direct io in 2.0, I think we'd better build io request based on user pages and file offset directly,
that should be similar to obd_brw api does. In my opinion, direct io doesn't fit those cl_xxx apis very well (they are
designed for bufferred io, the page offset is tightly binding with file offset, I think), and I don't see what's the
benefit of maitaining a raidx tree for those never cached user pages.

Jay, Oleg
Does my propsal sound ok to you? Thanks.

Comment by Jinshan Xiong (Inactive) [ 04/May/11 ]

Hi Niu,

Thanks for the points. Yes, this is a good chance for us to fix the directIO problem in one shot.

I still tend to do this upon the infrastructure of clio. Also the reason to have a radix tree in clio is for porting purpose - we've done a lot of work to decouple linux vfs/vm in both MDT and client stack.

Here is my idea: let's fix CPT_TRANSIENT page implementation to have arbitrary buffers for this kind of pages. However, this will not fit into cl_page cache any more which means stale data may be read if applications are doing regular IO and directIO on the same node. This may violate posix semantics a little bit, but I tend to think this is fine since linux kernel is doing the same thing.

I'm going to compose a draft patch for this. It should be easy.

Comment by Jinshan Xiong (Inactive) [ 05/May/11 ]

Hi Niu,

I composed a patch to bypass cl_page cache for TRANSIENT pages at: http://review.whamcloud.com/#change,495, just for a reference.

Comment by Jinshan Xiong (Inactive) [ 05/May/11 ]

I realize we should give it a try because this may be not a problem in clio. Clio has a finer grained lockless IO control, it can do lockless IO per ost object, instead of per file as b18.

Comment by Oleg Drokin [ 05/May/11 ]

Frankly I am not even sure why do we need radix trees or anything like that for directio or otherwise uncached pages.
It's a purely one-shot job anyway.

Lai, your description of patchless io partial page write in clio where unlocked page is marked as clean sounds totally broken.
Good thing there is no easy way to trigger this mode or we'd have another blocker on our hands.

Comment by Peter Jones [ 17/Jun/11 ]

I think that we will defer this task for now

Comment by Niu Yawei (Inactive) [ 20/Jun/11 ]

Patch is at http://review.whamcloud.com/980

Comment by Niu Yawei (Inactive) [ 28/Nov/11 ]

The patch in http://review.whamcloud.com/980 uses obd_brw() API in osc_io_dio(), since it requires less code changes and easyer for landing, however, the best way to implement osc_io_dio() should be: introduce a new API which can naturally accept (file offset, bytes, offset in first page, incontiguous pages) and build io requests based on these information directly (brw_page should not be used for direct io, 'pshift' hacking should be eliminated), and osc_io_dio() calls the new API to build io reqeusts and sends them asynchronously.

The proper implementaion requires more code changes, let's fix it in a follow-up patch.

Comment by Andreas Dilger [ 16/Sep/13 ]

I don't like the idea of adding new users of the obd_brw() APIs. These are pretty much obsolete on both the client and server, and we should be removing them entirely instead of adding new users.

Comment by Cory Spitz [ 26/Sep/13 ]

If we follow Niu's suggestion to land change #980 and then follow it up, can it land now, or must we wait until after we branch for b2_5?

Comment by Patrick Farrell [ 17/Mar/22 ]

See LU-13805

Generated at Sat Feb 10 01:05:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.