[LU-10619] Tiny writes improvement: Size + glimpse changes Created: 06/Feb/18  Updated: 21/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9409 Lustre small IO write performance imp... Resolved
is related to LU-9627 Bad small-file behaviour even when lo... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The current tiny writes code (LU-9409) updates the size on each OSC object. Skipping this update nets about a 15% performance improvement, and probably more (maybe a lot more) for shared-file writing.

As suggested by Jinshan in https://review.whamcloud.com/#/c/27903/, we could likely skip this by changing how size works on the client.

Specifically, we would update the size only at the upper layer, at the end of a tiny write (ll_tiny_write_end), storing the size in llite and calling i_size_write (because the kernel uses the inode size outside of Lustre). We would have to rewrite osc_glimpse, and possibly the getattr/setattr code, to handle the modified size correctly.
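
As a rough illustration of the proposal above, the size update at the end of a tiny write reduces to a single "extend if larger" step on the upper-layer size. This is a minimal sketch, not the Lustre implementation: struct sketch_inode and sketch_tiny_write_end are hypothetical stand-ins for the llite inode state and ll_tiny_write_end, and the plain assignment stands in for i_size_write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the llite-level inode state; not a Lustre type. */
struct sketch_inode {
	uint64_t size;	/* plays the role of i_size */
};

/*
 * Hypothetical end-of-tiny-write hook: only the upper-layer size is
 * touched, and no per-object (OSC) attribute is updated here.
 */
static void sketch_tiny_write_end(struct sketch_inode *inode,
				  uint64_t pos, uint32_t copied)
{
	assert(inode);

	uint64_t end = pos + copied;

	/* Extend the file size only if this write went past the current end. */
	if (end > inode->size)
		inode->size = end;	/* stands in for i_size_write() */
}
```

The cost that remains per write is one comparison and at most one store, which is the appeal of the approach; everything that currently reads the per-object size would then need another source for it.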

Notes from Jinshan:
"this still has significant overhead for tiny write. Have you thought about extending osc_refresh_count() -> cl_object_attr_get() to get size info from LLITE layer, therefore you don't need to call this function every single time.

about osc_refresh_count(): when writing the last page to the OST, it needs to figure out the file size so that it knows how much data is valid in the last page. This is why OSC needs to call the function to 'refresh' count. LLITE always has uptodate file size for write therefore it doesn't need to keep the object size uptodate at OSC layer.

Of course, if you would like to do it as I said, you need to fix osc_object_glimpse() too because the size in oinfo is not uptodate."
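
The "how much data is valid in the last page" calculation that Jinshan describes for osc_refresh_count() can be sketched as follows. This is an illustration only: sketch_refresh_count and SKETCH_PAGE_SIZE are hypothetical names, the page size is assumed to be 4096 bytes, and the size argument stands for whichever size value the caller consults (the object's own, or one obtained from the LLITE layer).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size, for illustration only */

/*
 * Hypothetical helper: number of bytes in page `index` that hold valid
 * data when the object ends at `size` bytes.
 */
static uint32_t sketch_refresh_count(uint64_t index, uint64_t size)
{
	/* Guard against overflow when converting the page index to bytes. */
	assert(index < UINT64_MAX / SKETCH_PAGE_SIZE);

	uint64_t start = index * SKETCH_PAGE_SIZE;
	uint64_t end = start + SKETCH_PAGE_SIZE;

	if (start >= size)
		return 0;			/* page lies entirely past the end */
	if (end > size)
		return (uint32_t)(size - start);	/* partial last page */
	return SKETCH_PAGE_SIZE;		/* page is fully inside the object */
}
```

Only the page that straddles the end of the object needs the size at all, which is why the size lookup does not have to happen on every write.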



 Comments   
Comment by Patrick Farrell (Inactive) [ 08/Feb/18 ]

Reflecting on this, slightly cleaner thoughts:

I believe this means the only size we keep would be i_size, and rather than keeping a size for each sub-object, we would calculate it from i_size when needed. I'm a little concerned about the possible effect on KMS and such, but I get the appeal: many writes to one file definitely spend a lot of time updating all the attributes.
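
To make "calculate it from i_size when needed" concrete, here is a minimal sketch of the arithmetic for a plain RAID0-striped file. It is an assumption-laden illustration, not Lustre code: sketch_object_size is a hypothetical helper, and it returns the size a stripe object would have if the file had been written densely from offset 0 up to the file size. For a sparse file the real object could be smaller, which is one way the KMS concern above might show up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a Lustre function: size that stripe object
 * `stripe_idx` would have if a RAID0-striped file were written densely
 * from offset 0 up to `file_size`.
 */
static uint64_t sketch_object_size(uint64_t file_size, uint64_t stripe_size,
				   uint32_t stripe_count, uint32_t stripe_idx)
{
	assert(stripe_size > 0 && stripe_count > 0 && stripe_idx < stripe_count);

	uint64_t full_units = file_size / stripe_size;	/* complete stripe units */
	uint64_t tail = file_size % stripe_size;	/* bytes in the partial unit */
	uint64_t rounds = full_units / stripe_count;	/* full passes over all objects */
	uint64_t next = full_units % stripe_count;	/* object holding the partial unit */
	uint64_t size = rounds * stripe_size;

	if (stripe_idx < next)
		size += stripe_size;	/* got one more complete unit in the last pass */
	else if (stripe_idx == next)
		size += tail;		/* holds the final partial unit, if any */
	return size;
}
```

For example, with a 1 MiB stripe size and two stripes, a 2.5 MiB file maps to 1.5 MiB on object 0 and 1 MiB on object 1.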

Comment by Patrick Farrell (Inactive) [ 09/Feb/18 ]

In osc_refresh_count, we use attr->cat_kms. This seems odd: why don't we use attr->cat_size?

I am, to say the least, a little lost.
