[LU-9846] Overstriping - more than one stripe per OST per component Created: 08/Aug/17  Updated: 09/Oct/21  Resolved: 01/Jun/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0

Type: New Feature Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Patrick Farrell (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Attachments: PDF File CUG2019-Lustre_Overstriping-Farrell.pdf     PDF File LUG2019-Lustre_Overstriping_Shared_Write_Performance-Farrell.pdf    
Issue Links:
Related
is related to LU-11690 LBUG with very wide striping: lod_ea_... Resolved
is related to LU-10070 PFL self-extending file layout Resolved
is related to LU-1658 Review consistantly fails when runnin... Resolved
is related to LU-11784 PFL layouts can exceed EA size limits Resolved
is related to LU-11868 ZFS ea size limited to 32K Resolved
is related to LU-12273 DNE3: Metadata overstriping Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

'Overstriping' is my term for allowing more than one stripe of a particular file component to be placed on a particular OST.

Justification:
Lock ahead is designed to address the case where we are limited to a single shared file, but each OST is significantly faster than one client. In a shared-file situation, LDLM locking behavior limits us to writing with one client to each OST, so we are unable to fully drive each OST for the shared file. This case was interesting enough for Cray to drive all of the lock ahead work and to use a library to achieve it.

If we can put multiple stripes of the file on a single OST, we can essentially achieve the same thing with far less effort. For a variety of reasons, this doesn't remove the need for lock ahead (primarily because we cannot necessarily redefine a file's striping when we want to write to it), but it is much simpler, and highly desirable for that reason. In addition to the MPIIO aggregation case, where we have well-controlled I/O and are trying to maximize OST utilization, adding more stripes to a shared file also helps in cases where I/O is poorly controlled, because there are effectively more locks for the badly behaved writers to contend for.

So, in short, I think it would be very, very desirable if, in a controlled manner, we could ask for more than one stripe to be on a given OST. A simple example is something like "8 stripes but only on these 2 OSTs", giving 4 stripes per OST (and allowing 4 client writers per OST with no fancy locking work).
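
The "8 stripes but only on these 2 OSTs" example can be sketched as a small model. This is only an illustration of the arithmetic, not the patch's allocator: the function name overstripe_layout and the round-robin placement order are assumptions made for the sketch.

```python
from collections import Counter


def overstripe_layout(stripe_count, ost_indices):
    """Return the OST index used by each stripe.

    Hypothetical helper: stripes are placed by cycling through the
    allowed OSTs in order. The real allocator may order them differently.
    """
    if stripe_count < 1 or not ost_indices:
        raise ValueError("need at least one stripe and one OST")
    return [ost_indices[i % len(ost_indices)] for i in range(stripe_count)]


# 8 stripes restricted to OSTs 0 and 1.
layout = overstripe_layout(8, [0, 1])
stripes_per_ost = Counter(layout)

print(layout)                 # [0, 1, 0, 1, 0, 1, 0, 1]
print(dict(stripes_per_ost))  # {0: 4, 1: 4}
```

With one writer per stripe, each OST in this model is driven by 4 clients instead of 1.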

A note about PFL:
While PFL lets you create separate components using the same OST, that's not a viable solution for the case I'm looking at, which is simply more than one stripe on a given OST. The idea is to write to all stripes in parallel while having more than one writer pointed at each OST. PFL would require a never-ending series of components to do that.
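
To illustrate why a single overstriped component is enough, here is a minimal sketch of how file offsets map to stripes and then to OSTs under ordinary round-robin (RAID-0 style) striping. The stripe size, the stripe-to-OST table, and the one-writer-per-stripe access pattern are all assumptions chosen for the example.

```python
def stripe_for_offset(offset, stripe_size, stripe_count):
    """Round-robin striping: which stripe holds this file offset."""
    return (offset // stripe_size) % stripe_count


STRIPE_SIZE = 1 << 20   # 1 MiB, assumed for the example
STRIPE_COUNT = 8
# Hypothetical overstriped layout: 8 stripes spread over OSTs 0 and 1.
OST_FOR_STRIPE = [0, 1, 0, 1, 0, 1, 0, 1]

# Eight writers each start at a different stripe-sized offset.
writers_per_ost = {}
for writer in range(STRIPE_COUNT):
    offset = writer * STRIPE_SIZE
    stripe = stripe_for_offset(offset, STRIPE_SIZE, STRIPE_COUNT)
    writers_per_ost.setdefault(OST_FOR_STRIPE[stripe], []).append(writer)

print(writers_per_ost)  # {0: [0, 2, 4, 6], 1: [1, 3, 5, 7]}
```

Each writer lands on its own stripe, so four writers target each OST at once within one component. The pattern repeats every STRIPE_COUNT * STRIPE_SIZE bytes, so it holds for the whole file without adding components.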

Patch to implement this follows.



 Comments   
Comment by Gerrit Updater [ 08/Aug/17 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/28425
Subject: LU-9846 lod: Add overstriping support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5000bef6b28c3ed8de6058a935f6d22153a71510

Comment by Gerrit Updater [ 14/Dec/18 ]

Patrick Farrell (paf@cray.com) uploaded a new patch: https://review.whamcloud.com/33871
Subject: LU-9846 lod: Raise stripe count limit to 10K
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ccf5aa7d2dae44ad4657aec8181a379f574e7de3

Comment by Patrick Farrell (Inactive) [ 14/Dec/18 ]

Decided to split out raising the stripe count limit from the feature patch, as they're different and may cause different issues.

Both still depend on landing https://jira.whamcloud.com/browse/LU-11690 first.

Comment by Gerrit Updater [ 05/Feb/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34189
Subject: LU-9846 tests: Overstriping test
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 637dce43039fc94c5b1695a395f02f61f28162c9

Comment by Gerrit Updater [ 23/Apr/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34743
Subject: LU-9846 obd: Add overstriping CONNECT flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1b17db2eee420e32944e8668c85c92d766fd6d82

Comment by Gerrit Updater [ 04/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34743/
Subject: LU-9846 obd: Add overstriping CONNECT flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 5d085745af43bd15e6b7ea728491600411833b2a

Comment by Gerrit Updater [ 01/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28425/
Subject: LU-9846 lod: Add overstriping support
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 591a9b4cebc510ff51f0fdb944e5a81f08fdaf62

Comment by Peter Jones [ 01/Jun/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 06/Jun/19 ]

Vitaly Fertman (c17818@cray.com) uploaded a new patch: https://review.whamcloud.com/35089
Subject: LU-9846 test: a test number fix
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 50e4b01c8c0b0bfa4a20b5c421f04dd974274527

Comment by Gerrit Updater [ 07/Jun/19 ]

Lai Siyao (lai.siyao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35095
Subject: LU-9846 utils: hash may be overridden in 'lfs setdirstripe'
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 44a3f74059235aac9428311fb8cb1c4e44a6babf

Comment by Gerrit Updater [ 13/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35095/
Subject: LU-9846 utils: hash may be overridden in 'lfs setdirstripe'
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 44405d4d6f7c0a77d540d6a114050ccabd0a4e9f

Comment by Gerrit Updater [ 16/Jun/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35089/
Subject: LU-9846 test: a test number fix
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 360cb33fccd2fc7a0dc392afbb780ac1284b403a
