[LU-10465] increase default stripe size to 4MB Created: 08/Jan/18  Updated: 05/Jan/24  Resolved: 18/Nov/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.16.0

Type: Bug
Priority: Critical
Reporter: Jian Yu
Assignee: Mikhail Pershin
Resolution: Fixed
Votes: 0
Labels: LTS12

Issue Links:
Duplicate
Related
is related to LU-10463 Poor write performance periodically o... Resolved
is related to LU-9090 increase default RPC size to 4MB Resolved
is related to LU-10786 sanity-flr test_45: Create /mnt/lustr... Resolved
is related to LU-17076 ptlrpc_nrs_req_stop_nolock() use afte... Resolved
is related to LU-10808 DoM: component end should align with ... Resolved
is related to LU-11918 Allow setting default file layout on ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Patch https://review.whamcloud.com/25336 for LU-9090 has increased the default OST BRW size to 4MB. Patch for this ticket will increase the default stripe size from 1MB to 4MB so that widely-striped files can generate full RPCs without pinning so much memory on the client.
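As a rough illustration of the memory argument above, the following sketch models plain RAID-0 striping and computes how much sequentially written data a client must accumulate before a single OST holds a full RPC's worth of data. The function name, the 8-stripe layout, and the assumption of sequential writes from offset 0 are illustrative choices and are not taken from the Lustre code.

```python
MiB = 1 << 20

def bytes_written_until_full_rpc(stripe_size, stripe_count, rpc_size):
    """Sequential bytes written from offset 0 before OST 0 holds rpc_size bytes.

    Simplified RAID-0 model (an assumption for illustration): stripe k of
    the file lands on OST (k % stripe_count).
    """
    if stripe_size <= 0 or stripe_count <= 0 or rpc_size <= 0:
        raise ValueError("sizes and counts must be positive")
    full_stripes, tail = divmod(rpc_size, stripe_size)
    if tail == 0:
        # All but the last needed stripe require a full round over every OST.
        return (full_stripes - 1) * stripe_size * stripe_count + stripe_size
    # A partial stripe on OST 0 completes the RPC after full_stripes rounds.
    return full_stripes * stripe_size * stripe_count + tail

for stripe_mib in (1, 4):
    need = bytes_written_until_full_rpc(stripe_mib * MiB, 8, 4 * MiB)
    print(f"{stripe_mib} MiB stripes: {need // MiB} MiB written "
          f"before the first full 4 MiB RPC")
```

In this model an 8-stripe file with 1 MiB stripes needs 25 MiB written before the first OST has 4 MiB to send, while with 4 MiB stripes the first 4 MiB already fills one RPC.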



 Comments   
Comment by Jian Yu [ 08/Jan/18 ]

Here is the patch to increase default stripe size to 4MB: https://review.whamcloud.com/27151

Comment by Jian Yu [ 08/Jan/18 ]

Hi Cliff,

Could you please test the above patch according to the following suggestion from Andreas?

Before this patch can be landed, we need to understand what kind of performance impact will be seen. Please have Cliff run a test on our system to verify e.g. IOR is running as fast or faster than before.

Thank you.

Comment by Cliff White (Inactive) [ 08/Jan/18 ]

Saurabh is doing performance testing now, we'll get this into the schedule.

Comment by Saurabh Tandan (Inactive) [ 08/Jan/18 ]

I will add this into my schedule and get back with results.

Comment by Andreas Dilger [ 08/Jan/18 ]

In light of LU-10463, I wonder if the default stripe size should be a function of the default RPC size?  The MDS could pick this up from the OSTs at connect time.  It might get more complex if there is a mix of ldiskfs and ZFS OSTs in a single system (though that is unlikely).

That said, I think it is less harmful if the default stripe size is larger than the RPC size, than if the stripe size is smaller than the RPC size.

Comment by Gerrit Updater [ 13/Feb/18 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/31292
Subject: LU-10465 tests: interoperate with 4MB stripe size server
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 2ec6a165cd23cdaeaf47ff17f09903450e866412

Comment by Andreas Dilger [ 22/Feb/18 ]

Ihara or Vitaly, do you have any performance test results from testing with the default stripe_size of 4MB (not the RPC size)? Do you already run with this in production? We're just looking to see if the default should be increased. It looks like a good improvement for ZFS, but our testing for ldiskfs is mixed, so it would be good to get some more feedback from more systems if possible.

Comment by Gerrit Updater [ 06/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27151/
Subject: LU-10465 lov: increase default stripe size to 4MB
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3f5abc6fa30e7c0256077ccf6a149d1809450465

Comment by Peter Jones [ 06/Mar/18 ]

Landed for 2.11

Comment by Andreas Dilger [ 08/Mar/18 ]

Due to problems related to LU-10786, we need to undo the 4MB default stripe size change until we have a better method of handling DoM components. Otherwise, it means that DoM files will not be created easily with default settings.

Rather than reverting the whole patch, I would recommend submitting a new patch that changes only the default stripe size and leaves the test fixes in place. That allows developers to specify a different default stripe size without hitting unrelated failures, and simplifies testing in the future.

Once the patch to change the default stripe size back to 1MB has landed, this ticket should be moved to 2.12.

Comment by Gerrit Updater [ 08/Mar/18 ]

Jian Yu (jian.yu@intel.com) uploaded a new patch: https://review.whamcloud.com/31589
Subject: LU-10465 lov: decrease default stripe size to 1MB
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fec94eb84185e5c790d41766f4cd9bd348e49a00

Comment by Gerrit Updater [ 12/Mar/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/31589/
Subject: LU-10465 lov: decrease default stripe size to 1MB
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b231e3ae5cf7df6abf1fbd683589f0fcff2de03f

Comment by Peter Jones [ 12/Mar/18 ]

We've backed off changing the default for 2.11

Comment by Andreas Dilger [ 08/Jul/19 ]

I think that, with LU-10808 fixed, we could try reintroducing this change (essentially revert https://review.whamcloud.com/31589 again).

Ihara, if you get a chance, could you please run IOR FPP and SSF with a 4MB default stripe size to see the performance impact?

Vitaly, it would also be good to know if this change improves or hurts performance on your systems before it becomes the default.

Comment by Gerrit Updater [ 23/Jan/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37318
Subject: Revert "LU-10465 lov: decrease default stripe size to 1MB"
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 973a7e7042eda757080884ead71468d575516402

Comment by Andreas Dilger [ 24/Jan/20 ]

Mike, it looks like there are still failures in the DoM tests when the default stripe size is changed to 4MB. Could you please take a look?
https://testing.whamcloud.com/test_sets/c361e79a-3e3d-11ea-bc97-52540065bddc

== sanity test 272c: DoM migration: DOM file to the OST-striped file (composite) ===================== 23:32:52 (1579822372)
lfs migrate: cannot create composite file '/mnt/lustre/d272c.sanity/.:VOLATILE:0000:5EB80E61': Invalid argument
error: lfs migrate: /mnt/lustre/d272c.sanity/f272c.sanity: cannot create volatile file: Operation not permitted
 sanity test_272c: @@@@@@ FAIL: failed to migrate to the new composite layout

Comment by Mikhail Pershin [ 24/Jan/20 ]

Andreas, this happens because the new component's stripes are not aligned with the old ones. The original file has a 1MB DOM component and a second component with no stripe size defined, so it gets 4MB stripes by default. The test tries to migrate to a PFL layout with 2MB as the first component stripe and the second component left at the default. With a 1MB default this works, but with 4MB the boundaries are not aligned. So it seems the test should be changed to migrate to the same first stripe size.

Meanwhile, I wonder whether it is OK that components are not aligned by stripe size within a file. E.g. for the original file we have [0, 1MB) for DOM and then 4MB stripes, so the whole file has stripes of 1MB, 4MB, 4MB, ... and each 4MB stripe is not aligned at 4MB from the beginning of the file. I am not sure whether that is a problem, but wouldn't it be better to adjust the new component's stripe sizes to be aligned with that component's start?

Comment by Andreas Dilger [ 24/Jan/20 ]

The second chunk (start of the second component, after the DoM component) should just be a bit shorter, starting at 1MB and ending at 4MB, with a "hole" at the start where the DoM component is. Then the rest of the chunks in the second component would be properly sized/aligned at 4MB.

It is done this way so that e.g. the DoM data could be written into the second component without having to move all the later data.
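The chunk layout described above can be sketched as follows. This is only a model of the description in this comment: the function name is hypothetical, and the 16 MiB component end is an arbitrary value chosen for the example (the real second component would normally extend to EOF).

```python
MiB = 1 << 20

def component_chunks(comp_start, comp_end, stripe_size):
    """Return [start, end) chunk extents for one component.

    Chunk boundaries fall on multiples of stripe_size measured from file
    offset 0, so the first chunk is short when comp_start is unaligned.
    """
    chunks = []
    pos = comp_start
    while pos < comp_end:
        # Next multiple of stripe_size strictly above the current position.
        next_boundary = (pos // stripe_size + 1) * stripe_size
        end = min(next_boundary, comp_end)
        chunks.append((pos, end))
        pos = end
    return chunks

# Second component after a 1 MiB DoM component, with 4 MiB stripes.
chunks = component_chunks(1 * MiB, 16 * MiB, 4 * MiB)
print([(s // MiB, e // MiB) for s, e in chunks])
# -> [(1, 4), (4, 8), (8, 12), (12, 16)]
```

Only the first chunk, [1 MiB, 4 MiB), is short; every later chunk starts on a 4 MiB boundary of the file.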

Comment by Mikhail Pershin [ 24/Jan/20 ]

The test failed due to this error:

(lod_lov.c:1934:lod_verify_striping()) stripe size isn't aligned, stripe_sz: 4194304, [0, 2097152)

So this check in lod_verify_striping() is not quite correct, is it? If the original file has 1MB for the first component and 3MB, 4MB, ... for the second, then it can be migrated to 2MB for the first and then 2MB, 4MB, ... for the second component.

Comment by Mikhail Pershin [ 24/Jan/20 ]

I think I see where the problem is. The test does the following:

$LFS migrate -E 2M -c1 -E -1 -c2 $dom

Without an explicitly set stripe size, LOD uses the default size of 4MB and fails because that is larger than the whole 2MB component. Considering that the user may not know the default MDT stripe size, I'd say it is not their fault for using a 2MB component, and either lfs or LOD should take care of this, perhaps by reducing the stripe size to the component size.
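The failure and the adjustment suggested above can be modelled with a short sketch. It assumes, based only on the error message quoted earlier, that the check rejects a component whose extent end is not a multiple of its stripe size; the actual logic in lod_verify_striping() and in any later fix may differ. Both function names are hypothetical.

```python
MiB = 1 << 20

def extent_end_is_aligned(stripe_size, ext_end):
    """Assumed rule: the component end must be a multiple of the stripe size."""
    return ext_end % stripe_size == 0

def adjust_stripe_size(stripe_size, ext_start, ext_end):
    """Shrink a default stripe size that exceeds the component length.

    This covers only the case discussed in the comment (default larger
    than the whole component); it does not fix every unaligned extent.
    """
    return min(stripe_size, ext_end - ext_start)

ext_start, ext_end = 0, 2 * MiB               # first component from "-E 2M"
print(extent_end_is_aligned(1 * MiB, ext_end))   # old 1 MiB default
print(extent_end_is_aligned(4 * MiB, ext_end))   # new 4 MiB default
adjusted = adjust_stripe_size(4 * MiB, ext_start, ext_end)
print(adjusted // MiB, extent_end_is_aligned(adjusted, ext_end))
```

Under these assumptions the 1 MiB default passes, the 4 MiB default fails on the [0, 2 MiB) component, and clamping the stripe size to the 2 MiB component length makes the check pass again.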

Comment by Gerrit Updater [ 21/Feb/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/37661
Subject: LU-10465 lod: adjust component stripe size
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6b589349595efb4eba73b40c7e7e8f07c41bd8b3

Comment by Gerrit Updater [ 23/Apr/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/37661/
Subject: LU-10465 lod: adjust component stripe size
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e67a7f7b960a042ae7369e81e4365c6f7e095d25

Comment by Gerrit Updater [ 17/Sep/20 ]

Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39957
Subject: LU-10465 lod: adjust component stripe size
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: bf76de5430fa104b2992f40768cdf8c97699f1de

Comment by Gerrit Updater [ 23/Sep/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/37318/
Subject: LU-10465 lov: increase default stripe size to 4MB
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ea18d7da59d369f093e340e150544f51b2f229a1

Comment by Gerrit Updater [ 04/Nov/23 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52989
Subject: LU-10465 osd-ldiskfs: 8MiB IOs should bypass cache
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 22f24efa69e46eaad909403f3bb473b5d40cab32

Comment by Gerrit Updater [ 18/Nov/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52989/
Subject: LU-10465 osd-ldiskfs: 8MiB IOs should bypass cache
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f8e49e321ed81d77b204af40165f9ae2d07c5986

Comment by Peter Jones [ 18/Nov/23 ]

Landed for 2.16

Generated at Sat Feb 10 02:35:20 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.