[LU-12273] DNE3: Metadata overstriping Created: 08/May/19 Updated: 22/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Patrick Farrell (Inactive) | Assignee: | Patrick Farrell |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | dne3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
"it allows more concurrency on the MDT, exceeding single-directory size limitations, directory migration/compaction, etc." (per Andreas)
This exists in limited form today, accessible with a fail_loc, which is used in sanity test 300k to put a bunch of stripes on MDT0:
#define OBD_FAIL_LARGE_STRIPE 0x1703
$LCTL set_param fail_loc=0x1703
$LFS setdirstripe -i 0 -c192 $DIR/$tdir/striped_dir ||
error "set striped dir err
Actually doing this as a feature requires various other enabling changes, but this test shows it should be possible. It's also possible to use the method in this test to create temporary setups for benchmarking this idea to confirm it's worth pursuing. |
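For a temporary benchmarking setup, the sequence from the test could be wrapped roughly as below. This is a minimal sketch, not part of any patch on this ticket: `run`, `make_overstriped_dir` and `DRY_RUN` are hypothetical names, the target path is a placeholder, and clearing `fail_loc` afterwards is an assumption about good hygiene rather than something the excerpt above shows. It defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper built around the commands quoted from sanity test 300k.
# With DRY_RUN=1 (the default) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
	if [ "$DRY_RUN" = 1 ]; then
		echo "$*"
	else
		"$@"
	fi
}

# usage: make_overstriped_dir <dir> [stripe_count]
make_overstriped_dir() {
	dir=$1
	count=${2:-192}
	# OBD_FAIL_LARGE_STRIPE (0x1703), as in the test
	run lctl set_param fail_loc=0x1703 || return 1
	run lfs setdirstripe -i 0 -c "$count" "$dir"
	rc=$?
	# Assumption: reset the fail_loc so later operations are unaffected.
	run lctl set_param fail_loc=0
	return $rc
}
```

Calling `make_overstriped_dir /mnt/lustre/striped_dir 8` in dry-run mode prints the three commands in order; setting `DRY_RUN=0` on a disposable test filesystem would actually execute them.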
| Comments |
| Comment by Gerrit Updater [ 02/Jun/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35034 |
| Comment by Gerrit Updater [ 27/Aug/19 ] |
|
Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35939 |
| Comment by Andreas Dilger [ 21/Oct/22 ] |
Correct - with overstriping there is still only a single journal/device and some filesystem locks, while two separate MDTs have totally separate infrastructure (but each one is 1/2 the size and needs more space balancing, double journal memory usage). If we can get "close" performance with 2x or 4x overstriping vs. 2x or 4x MDTs then using directory overstriping would be better overall. |
| Comment by Gerrit Updater [ 19/Jan/23 ] |
|
"Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49707 |
| Comment by Patrick Farrell [ 23/Jan/23 ] |
|
So this is fascinating. Each MDT does not know about itself in the pool code, because it's the local device, so it's handled differently. The practical result of this is that the first MDT is not selected by the allocation code on the MDT, so it only
E.g., with 2 MDTs:
lmv_stripe_count: 8 lmv_stripe_offset: 1 lmv_hash_type: crush,overstriped
Or, with 4 MDTs, it can look like this:
Notice 3 is only used once.
Allocation of the first stripe is handled like this, without reference to the pool:
stripes[0] = dt_locate_at(env, lod->lod_child, &fid,
Then the qos/rr alloc code is called to allocate the rest of the stripes.
I'm not sure what to do about this. The device init process doesn't really seem something to mess with. Basically, add one more to the range of indices that can be selected during RR, and if it's found, then do
I'll do that if there's not an objection. |
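The effect described above can be illustrated with a toy model. This is not Lustre code: `simulate_stripes` is a hypothetical function, the round-robin pass is assumed to start at index 0, and the `include_local` switch is only one reading of the "add one more to the range of indices" suggestion. Stripe 0 always goes to the local MDT; the remaining stripes are handed out round-robin, optionally skipping the local MDT.

```shell
#!/bin/sh
# Toy model of the stripe placement described in the comment above.
# usage: simulate_stripes <mdt_count> <stripe_count> <local_mdt> <include_local: 0|1>
simulate_stripes() {
	mdts=$1 count=$2 local_mdt=$3 include_local=$4
	# With a single MDT and the local device excluded, nothing else can be picked.
	if [ "$mdts" -le 1 ] && [ "$include_local" -eq 0 ] && [ "$count" -gt 1 ]; then
		return 1
	fi
	out=$local_mdt   # stripe 0 is placed on the local MDT, outside the pool
	placed=1
	next=0           # assumed round-robin starting index
	while [ "$placed" -lt "$count" ]; do
		idx=$((next % mdts))
		next=$((next + 1))
		if [ "$idx" -eq "$local_mdt" ] && [ "$include_local" -eq 0 ]; then
			continue   # the local MDT is not visible to the RR pass
		fi
		out="$out $idx"
		placed=$((placed + 1))
	done
	echo "$out"
}
```

Under these assumptions, `simulate_stripes 2 8 1 0` prints `1 0 0 0 0 0 0 0` and `simulate_stripes 4 8 3 0` prints `3 0 1 2 0 1 2 0`, so the local MDT appears only once. With the local index included, `simulate_stripes 4 8 3 1` prints `3 0 1 2 3 0 1 2`.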
| Comment by Gerrit Updater [ 19/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49707/ |