
"lfs setstripe -C -1" stripes too widely, should be limited to OST_COUNT

Details

    • Improvement
    • Resolution: Fixed
    • Major
    • Lustre 2.16.0
    • Lustre 2.15.0
    • None
    • 3

    Description

      I am reaching out to seek clarification regarding the expected behavior of the "lfs setstripe" command when using the -C -1 option.

      Currently, it appears that this command is creating a higher stripe count than anticipated. For instance, on my test system, it generated a stripe count of 2727 for a single file. This count exceeds the allowed limit of LOV_MAX_STRIPE_COUNT. 

      I am uncertain about the appropriate solution to address this issue related to the "-1" argument. I have contemplated the following options:

      1.    Consider making the option -1 illegal, preventing its usage altogether.

      2.    Implement a mechanism to automatically set the stripe count to the maximum allowed value (LOV_MAX_STRIPE_COUNT) if the count exceeds this limit.

      I would greatly appreciate your input and guidance in this matter. It is worth noting that setting the stripe count higher than LOV_MAX_STRIPE_COUNT leads to other problems, such as the failure of the "llapi_layout_get_by_fd" API to open the file.
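
The two options above can be sketched as a small Python model. This is only an illustration of the proposed policies, not code from `lfs`: the function names are hypothetical, and the value 2000 for LOV_MAX_STRIPE_COUNT is an assumption taken from the capped `lmm_stripe_count` shown in a comment further down this ticket.

```python
# Assumed cap; a later comment on this ticket shows lmm_stripe_count capped at 2000.
LOV_MAX_STRIPE_COUNT = 2000


def reject_wildcard(requested: int) -> int:
    """Option 1 (hypothetical): refuse -1 outright."""
    if requested == -1:
        raise ValueError("stripe count -1 is not allowed with -C")
    return requested


def clamp_stripe_count(computed: int) -> int:
    """Option 2 (hypothetical): cap the computed count at the maximum."""
    return min(computed, LOV_MAX_STRIPE_COUNT)


if __name__ == "__main__":
    # The count reported on the test system would be reduced to the cap.
    print(clamp_stripe_count(2727))
```

Under option 2 the oversized count from the test system would be reduced to the cap rather than producing a file that `llapi_layout_get_by_fd` cannot open.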

      Please let me know your input.

          Activity

            [LU-16938] "lfs setstripe -C -1" stripes too widely, should be limited to OST_COUNT
            jschwartz Josh Schwartz added a comment (edited)

            I don't think that is coming into play here because I'm just showing the default striping on a directory. If I actually create a file within the directory I believe it is behaving as you suggest:

            jupiter-p2:/lus/kjcf08 # mkdir test
            jupiter-p2:/lus/kjcf08 # lfs setstripe --overstripe-count 1024 --stripe-count 10 test
            jupiter-p2:/lus/kjcf08 # touch test/foo
            jupiter-p2:/lus/kjcf08 # lfs getstripe test | head
            test
            stripe_count:  10 stripe_size:   1048576 pattern:       raid0,overstriped stripe_offset: -1
            
            test/foo
            lmm_stripe_count:  10
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0,overstriped
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            	obdidx		 objid		 objid		 group
            

            here the file is overstriped because I only have 2 OSTs.

            This is a bit of a degenerate example, but if I just set --overstripe-count 2, the directory gets an overstriped default with a stripe count of 2, while files created in it are not overstriped (and have a stripe count of 2):

            jupiter-p2:/lus/kjcf08 # lfs setstripe --overstripe-count 2 test
            jupiter-p2:/lus/kjcf08 # lfs getstripe -d test
            stripe_count:  2 stripe_size:   1048576 pattern:       raid0,overstriped stripe_offset: -1
            jupiter-p2:/lus/kjcf08 # touch test/foo
            jupiter-p2:/lus/kjcf08 # lfs getstripe test/foo
            test/foo
            lmm_stripe_count:  2
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0
            lmm_layout_gen:    0
            lmm_stripe_offset: 1
            	obdidx		 objid		 objid		 group
            	     1	     116959791	    0x6f8aa2f	             0
            	     0	     117253333	    0x6fd24d5	             0
            

            so I think that part of it is working OK.


            adilger Andreas Dilger added a comment

            There is code in lod_ost_alloc_rr() in the MDS object allocation that should be removing the LOV_PATTERN_OVERSTRIPING flag if it is set unnecessarily:

                    /* If there are enough OSTs, a component with overstriping requested
                     * will not actually end up overstriped.  The comp should reflect this.
                     */
                    if (!overstriped)
                            lod_comp->llc_pattern &= ~LOV_PATTERN_OVERSTRIPING;

            If this isn't being applied consistently, then that would be a bug.
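
As a rough model of the intent described here, the following Python sketch clears the overstriping bit whenever no OST ends up holding more than one stripe of the file. It is not the MDS code: `final_pattern` is a hypothetical helper, and the two bit values are illustrative stand-ins for the real pattern constants.

```python
from collections import Counter

# Illustrative bit values standing in for the real pattern flags.
LOV_PATTERN_RAID0 = 0x001
LOV_PATTERN_OVERSTRIPING = 0x200


def final_pattern(requested_pattern: int, stripe_ost_indices: list[int]) -> int:
    """Drop the overstriping bit unless some OST holds more than one stripe."""
    per_ost = Counter(stripe_ost_indices)
    overstriped = any(n > 1 for n in per_ost.values())
    if not overstriped:
        requested_pattern &= ~LOV_PATTERN_OVERSTRIPING
    return requested_pattern


if __name__ == "__main__":
    asked = LOV_PATTERN_RAID0 | LOV_PATTERN_OVERSTRIPING
    # Two stripes on two distinct OSTs: the flag is cleared.
    print(hex(final_pattern(asked, [1, 0])))
    # Four stripes on two OSTs: the flag is kept.
    print(hex(final_pattern(asked, [0, 1, 0, 1])))
```

The first call corresponds to the `--overstripe-count 2` case on a two-OST system, where the created file reports plain `raid0`.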

            paf0186 Patrick Farrell added a comment

            Josh,

            Makes sense to me.  There's also another possible bug here - how many OSTs do you have on that system?  If it's >= 10, then overstriped shouldn't be set by the server code either, which is also a concern.  Overstriping should only be set on the file when the actual file striping exceeds the number of available OSTs.  (Or at least that was the intent...)

            So there may be two things to fix there - proper overriding by later parameters in userspace, so the overstriping flag isn't passed along, and then - if you have >= 10 OSTs, then the server shouldn't set the overstriping pattern regardless of what userspace asked for.  If you have 20 OSTs and give -C 10, overstriping shouldn't be set, because the file is not actually overstriped.  Overstriping set on a not-overstriped file isn't fatal, but it's definitely wrong.

            jschwartz Josh Schwartz added a comment (edited)

            > Like many utilities, the last option specified will take precedence.

            I would be fine with either (mutually exclusive or last takes precedence in its entirety) but this bothers me:

            jupiter-p2:/lus/kjcf08 # mkdir test
            jupiter-p2:/lus/kjcf08 # lfs setstripe --overstripe-count 1024 --stripe-count 10 test
            jupiter-p2:/lus/kjcf08 # lfs getstripe -d test
            test
            stripe_count:  10 stripe_size:   1048576 pattern:       raid0,overstriped stripe_offset: -1
            

            Note that we got (and kept) overstriped from the first param, but then picked up the count from the second. If the last option truly took precedence I would expect a stripe count of 10 without overstriped (just like if the first one took precedence I would expect a stripe count of 1024 with overstriped).

            It is inconsistent that the behavior is different if you issue them individually, but in the same order:

            jupiter-p2:/lus/kjcf08 # lfs setstripe --overstripe-count 1024 test
            jupiter-p2:/lus/kjcf08 # lfs getstripe -d test
            stripe_count:  1024 stripe_size:   1048576 pattern:       raid0,overstriped stripe_offset: -1
            jupiter-p2:/lus/kjcf08 # lfs setstripe --stripe-count 10 test
            jupiter-p2:/lus/kjcf08 # lfs getstripe -d test
            stripe_count:  10 stripe_size:   1048576 pattern:       raid0 stripe_offset: -1
            

            Here each command does as I would expect; --overstripe-count 1024 by itself yields overstriped and stripe count 1024, and --stripe-count 10 by itself on the same directory removes overstriped (which is what I would expect) yielding stripe count 10 without overstriped.

            The fact that combining them causes it to take the overstriped from the first param and the stripe count from the second is surprising. --stripe-count explicitly means not-overstriped and if the rule is that the last one takes precedence, then it should be like the --overstripe-count wasn't there at all instead of the --stripe-count acting as a modifier.
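
The "last one takes precedence in its entirety" rule argued for here can be sketched as follows. This models the proposed semantics only, not what `lfs setstripe` currently does, and `resolve_last_wins` is a hypothetical helper.

```python
def resolve_last_wins(options: list[tuple[str, int]]) -> tuple[int, bool]:
    """Return (stripe_count, overstriped) with the last count option winning outright.

    `options` holds (flag, value) pairs in command-line order.
    """
    count, overstriped = 0, False
    for flag, value in options:
        if flag in ("-c", "--stripe-count"):
            # A plain stripe count resets the overstriping request as well.
            count, overstriped = value, False
        elif flag in ("-C", "--overstripe-count"):
            count, overstriped = value, True
        else:
            raise ValueError(f"unexpected option: {flag}")
    return count, overstriped


if __name__ == "__main__":
    print(resolve_last_wins([("--overstripe-count", 1024), ("--stripe-count", 10)]))
```

With this rule, the combined command from the first transcript would give a stripe count of 10 without overstriping, matching what the two separate commands produce.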


            paf0186 Patrick Farrell added a comment

            Interesting, OK!  Happy to defer.  I wasn't familiar with "last option takes precedence".

            adilger Andreas Dilger added a comment

            Actually, I will disagree with Patrick here. While "-C M" enables overstriping, it also works when the stripe count M is less than the number of OSTs, and is equivalent to "-c M" in that case. I don't think it "conflicts" with a later "-c N" option. Like many utilities, the last option specified will take precedence.

            paf0186 Patrick Farrell added a comment

            It would be OK to fix that as a separate patch on this ticket, it's small enough it doesn't need a new ticket.  But it should be a separate patch, that's all.

            rajeevm Rajeev Mishra added a comment

            OK, I will just take care of n*ostcount as part of this ticket. Thanks, Patrick.

            paf0186 Patrick Farrell added a comment

            Interesting!  I think we should make them mutually exclusive, yes.  I would do that in a separate patch from the one which adds '-2, -3' functionality.

            rajeevm Rajeev Mishra added a comment

            I'm currently reviewing the options -c, -C, --overstripe-count, and --stripe-count to gain a better understanding of their behavior.

            I've noticed a bug where the command accepts all of these options simultaneously, even though it appears that they should be mutually exclusive. Presently, all flags can be used together, as demonstrated in the following example:

            [root@test2-rocky8 jbs]# lfs setstripe --overstripe-count 1024 --stripe-count -1 -c 10 -C -1 test
            [root@test2-rocky8 jbs]# lfs getstripe test | more
            test
            lmm_stripe_count:  2000
            lmm_stripe_size:   1048576
            lmm_pattern:       raid0,overstriped
            lmm_layout_gen:    0

            The documentation for these options is as follows:

                -c, --stripe-count <stripe_count>: Specifies the number of OSTs to stripe a file over. A value of 0 means to use the filesystem-wide default stripe count (default is 1), and -1 means to stripe over all available OSTs.

                -C, --overstripe-count <stripe_count>: Specifies the number of stripes to create, creating more than one stripe per OST if the count exceeds the number of OSTs in the file system. Similar to -c, 0 uses the filesystem-wide default stripe count (default is 1), and -1 means to stripe over all available OSTs.

            Now, the question arises: should we consider making these options mutually exclusive? The reason for this consideration is that the behavior of -c is essentially the same as that of -C. If we allow them to be combined, we would need to define the precedence of these options when used together.

            Your feedback and input on whether or not we should make these options mutually exclusive would be greatly appreciated.

            The fix will stick with the pattern mentioned in the comment above, which means it will use -1, -2, -3, and so on. In simpler terms, the stripe count will be calculated as a multiple of the OST count, up to the maximum stripe count allowed.
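
One way to picture the mutually exclusive alternative raised in this comment is a small Python `argparse` sketch. It is a stand-in for the idea only: the real `lfs` option parsing is written in C, and the program name `setstripe-sketch` and function `build_parser` are made up for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser in which -c and -C cannot be given together."""
    parser = argparse.ArgumentParser(prog="setstripe-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-c", "--stripe-count", type=int)
    group.add_argument("-C", "--overstripe-count", type=int)
    parser.add_argument("path")
    return parser


if __name__ == "__main__":
    # A single count option is accepted, including negative values.
    args = build_parser().parse_args(["-C", "-1", "test"])
    print(args.overstripe_count, args.path)
```

Passing both `--overstripe-count` and `--stripe-count` to this parser makes it exit with a usage error, so no precedence rule would be needed.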

            adilger Andreas Dilger added a comment

            Since the core "-C -1" issues were already fixed by LU-13748 and LU-16623, I changed this issue to track the improvement for mapping "-C -1" to use OST_COUNT like "-c -1", and "-C -2" to use "2 * OST_COUNT", etc.
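
The mapping tracked by this ticket can be written out as a short Python sketch. It assumes a cap of 2000 for the maximum stripe count (inferred from the `getstripe` output earlier in the thread) and uses a hypothetical helper name; non-negative requests are passed through unchanged because the thread does not describe any change to them.

```python
# Assumed cap, matching the lmm_stripe_count of 2000 seen in this thread.
LOV_MAX_STRIPE_COUNT = 2000


def overstripe_count(requested: int, ost_count: int) -> int:
    """Map "-C -N" to N * ost_count, limited to the maximum stripe count."""
    if ost_count <= 0:
        raise ValueError("ost_count must be positive")
    if requested < 0:
        return min(-requested * ost_count, LOV_MAX_STRIPE_COUNT)
    # 0 (filesystem default) and explicit positive counts are left as given.
    return requested


if __name__ == "__main__":
    for n in (-1, -2, -3):
        print(n, overstripe_count(n, ost_count=2))
```

On a two-OST system like the one in the transcripts, "-C -1" would give 2 stripes and "-C -2" would give 4.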

            People

              rajeevm Rajeev Mishra
              rajeevm Rajeev Mishra