[LU-8417] setstripe -o does not work on directories Created: 20/Jul/16  Updated: 30/Nov/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Gary Hagensen (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: easy

Issue Links:
Related
is related to LU-4665 utils: lfs setstripe to specify OSTs Resolved
is related to LU-9 Optimize weighted QOS Round-Robin all... Open
is related to LU-6135 improved support for selecting specif... Resolved
is related to LU-5170 lfs usability Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The -o option of lfs setstripe, which specifies the OSTs to use, works on files but returns an error when the same command is run on a directory.

[root@Lustre-TG1 lustrefs]# mkdir testdir
[root@Lustre-TG1 lustrefs]# lfs setstripe -o 0-3 -c 4 testfile
[root@Lustre-TG1 lustrefs]# lfs getstripe testfile
testfile
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_pattern:        1
lmm_layout_gen:     0
lmm_stripe_offset:  0
        obdidx           objid           objid           group
             0            9925         0x26c5                0
             1            1606          0x646                0
             2            1608          0x648                0
             3            1609          0x649                0

[root@Lustre-TG1 lustrefs]# lfs setstripe -o 0-3 -c 4 testdir
error on ioctl 0x4008669a for 'testdir' (3): Invalid argument
error: setstripe: create stripe file 'testdir' failed


 Comments   
Comment by Andreas Dilger [ 20/Jul/16 ]

This would need to store the full uninitialized LOV EA on the MDS directory as the template for the file layout. Based on discussion with Gary, this is needed for testing OSS performance. For "real world" usage, it would probably be a "poor man's OST pools": it would be possible to specify a list of OSTs together with a stripe count less than the total OST count, and so select a subset of OSTs to create files on.
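
As a rough illustration of the "list of OSTs plus a smaller stripe count" idea, here is a minimal Python sketch that expands an index spec such as "0-3" or "2,3,2,3" and checks a stripe count against it. The function names are hypothetical, the comparison of the stripe count against the length of the given list is an assumption, and this is not how lfs or the MDS actually parses or validates the option.

```python
def expand_ost_list(spec):
    """Expand an OST index spec like "0-3" or "2,3,2,3" into a list of ints.

    Hypothetical helper for illustration; duplicates are kept on purpose,
    since a repeated index is how the 2,3,2,3 example in this ticket is written.
    """
    osts = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            if lo > hi:
                raise ValueError(f"bad range: {part!r}")
            osts.extend(range(lo, hi + 1))
        else:
            osts.append(int(part))  # int("") raises ValueError for empty parts
    return osts


def check_template(spec, stripe_count):
    """Return (osts, stripe_count) if the count fits within the given list.

    Assumption: the stripe count may be at most the length of the list.
    """
    osts = expand_ost_list(spec)
    if not 0 < stripe_count <= len(osts):
        raise ValueError("stripe count must be between 1 and the list length")
    return osts, stripe_count
```

For example, `check_template("0-7", 2)` would describe a directory template whose files each use two of the eight listed OSTs.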

Comment by Andreas Dilger [ 20/Jul/16 ]

One issue with using "-o ... -c" to implement "temporary pools" is that there is (currently) no place to store the object allocation state across file creates, as there is with a proper pool, so it can't be smart about round-robin allocation. For example, it can't track the last OST index used in order to rotate through 0+1, 2+3, 4+5, 6+7, 1+2, 3+4, ... and avoid having stripe 0 on the same OST repeatedly.

The MDS would somehow need to dynamically allocate an internal allocation state based on the specified OST list and then keep that in memory for some time to handle allocations with the same group of OSTs. It is OK if the same OST list is shared by different directories, since the file creation and IO load on the OSTs is also shared. There is no significant issue if the allocation state is dropped after some idle time (e.g. a couple of minutes), since the load on the OSTs is also transient.

Ideally this would be implemented as part of LU-9 so that the imbalance of file creates on a subset of OSTs does not cause global imbalance across other OSTs.
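
The transient allocation state described in this comment could look roughly like the Python sketch below: state is keyed by the OST list (so directories with the same list share it), the starting OST rotates across creates, and idle entries are dropped. The class name, the 120-second timeout, and the rotation formula are all assumptions made for illustration; this is not the MDS allocator and does not account for the LU-9 balancing concern.

```python
import time


class TransientOstAllocator:
    """Hypothetical in-memory round-robin state keyed by an explicit OST list."""

    def __init__(self, idle_timeout=120.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # assumed "couple of minutes"
        self.clock = clock
        self._state = {}  # tuple(osts) -> [creates_so_far, last_used_time]

    def allocate(self, osts, stripe_count):
        """Pick stripe_count OSTs from osts, rotating the start each create."""
        key = tuple(osts)
        n = len(key)
        if not 0 < stripe_count <= n:
            raise ValueError("stripe count must be between 1 and len(osts)")
        now = self.clock()
        self._expire(now)
        entry = self._state.setdefault(key, [0, now])
        used = entry[0] * stripe_count
        # Shift by one extra slot per full pass over the list so that
        # stripe 0 does not keep landing on the same OSTs.
        start = (used + used // n) % n
        entry[0] += 1
        entry[1] = now
        return [key[(start + i) % n] for i in range(stripe_count)]

    def _expire(self, now):
        """Drop state for OST lists that have been idle past the timeout."""
        stale = [k for k, (_, last) in self._state.items()
                 if now - last > self.idle_timeout]
        for k in stale:
            del self._state[k]
```

With OSTs 0-7 and a stripe count of 2, successive calls to `allocate(range(8), 2)` yield 0+1, 2+3, 4+5, 6+7, 1+2, 3+4, matching the sequence given above. After the idle timeout the state is forgotten and allocation simply starts again from the first OST in the list.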

Comment by Andreas Dilger [ 20/Jul/16 ]

Link to original LU-4665 ticket that introduced this feature.

Comment by Andreas Dilger [ 07/Nov/17 ]

It looks like https://review.whamcloud.com/12275 implements this to some extent.

Comment by Andreas Dilger [ 30/Nov/23 ]

It looks like specifying explicit OSTs is working since at least 2.14:

# lfs setstripe -o 2,3,2,3 /mnt/testfs/specific
# touch /mnt/testfs/specific/fff
# lfs getstripe --yaml /mnt/testfs/specific
stripe_count:  4
stripe_size:   1048576
pattern:       raid0,overstriped
stripe_offset: 2

lmm_stripe_count:  4
lmm_stripe_size:   1048576
lmm_pattern:       raid0,overstriped
lmm_layout_gen:    0
lmm_stripe_offset: 2
lmm_objects:
      - l_ost_idx: 2
        l_fid:     0x380000401:0x79:0x0
      - l_ost_idx: 3
        l_fid:     0x3c0000401:0x79:0x0
      - l_ost_idx: 2
        l_fid:     0x380000401:0x7a:0x0
      - l_ost_idx: 3
        l_fid:     0x3c0000401:0x7a:0x0

The one remaining issue is that "lfs getstripe" on the directory does not print the specific layout properly. It should print the specific OST indices and not just the stripe count.
