
[LU-5580] Switch between 'JOBID' and 'NID' directly in NRS TBF

Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.6.0
    • 15570

    Description

      When we want to change from TBF based on NID to TBF based on UID (or vice versa), we have to first change the policy to FIFO (or any other policy) and then to TBF based on UID. That's not natural. Also, setting 'tbf nid' directly gives no error message, as if it had succeeded. That's confusing. I will see whether it is possible to improve this.


          Activity

            pjones Peter Jones added a comment -

            Landed for 2.8

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11749/
            Subject: LU-5580 ptlrpc: policy switch directly in tbf
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e7ab554c1ca887e1a3fa9da5250b2debb4eee2d6

            lixi Li Xi (Inactive) added a comment - - edited

            Also, I realized that the TBF policy not only throttles the RPC rate but also schedules RPCs between buckets. If we use the FID or OST index as the bucket key and then sort RPCs inside each bucket, we get a policy that looks like a combination of ORR/TRR and TBF. However, the scheduler inside TBF is not exactly round robin; it is deadline-based. It chooses the bucket containing the RPC with the earliest deadline. I need more time to analyze whether this scheduler is better than round robin in most cases, but it doesn't look obviously worse than a round-robin scheduler.
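            To make the deadline-based choice concrete, here is a minimal C sketch. It is not the code in Lustre's TBF policy: struct tbf_bucket, its fields, and tbf_pick_bucket()/tbf_dispatch() are hypothetical. It assumes each bucket may release one RPC every 1/rate seconds with no burst allowance, so the deadline of a bucket's head RPC is its previous dispatch time plus that interval. The scheduler then serves the non-empty bucket with the earliest deadline, rather than visiting buckets in turn as a round-robin scheduler would.

```c
#include <stddef.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical bucket: a count of queued RPCs plus a token-bucket rate. */
struct tbf_bucket {
	unsigned int queued;    /* RPCs waiting in this bucket (FIFO) */
	uint64_t rate;          /* allowed RPCs per second, must be > 0 */
	uint64_t last_dispatch; /* time in ns the previous RPC was released */
};

/* Deadline of the head RPC: one token every 1/rate seconds, no burst. */
static uint64_t bucket_deadline(const struct tbf_bucket *b)
{
	return b->last_dispatch + NSEC_PER_SEC / b->rate;
}

/* Index of the non-empty bucket whose head RPC has the earliest deadline,
 * or -1 if every bucket is empty. */
static int tbf_pick_bucket(const struct tbf_bucket *buckets, size_t n)
{
	int best = -1;
	uint64_t best_deadline = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		if (buckets[i].queued == 0)
			continue;
		uint64_t d = bucket_deadline(&buckets[i]);
		if (d < best_deadline) {
			best_deadline = d;
			best = (int)i;
		}
	}
	return best;
}

/* Release one RPC at time `now` if the earliest deadline has passed.
 * Returns the bucket index served, or -1 if nothing is eligible yet. */
static int tbf_dispatch(struct tbf_bucket *buckets, size_t n, uint64_t now)
{
	int i = tbf_pick_bucket(buckets, n);

	if (i < 0 || bucket_deadline(&buckets[i]) > now)
		return -1; /* caller would sleep until that deadline */
	buckets[i].queued--;
	buckets[i].last_dispatch = now;
	return i;
}
```

A linear scan keeps the sketch short; with many buckets, a binary heap keyed by deadline would make each pick O(log n).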

            lixi Li Xi (Inactive) added a comment - - edited

            Quoting Andreas: "There would need to be a mechanism for ordering rules instead of just the order in which they are specified, so that the order in which the rules are matched can be specified."

            I think the problem here is how to express complex yet extensible rules. RPCs need to be classified according to a logical expression to meet different requirements. I happened to implement a similar mechanism in another project, so I am thinking of defining rules that look like:

            JOBID={dd.500}@NID={192.168.1.1@tcp},JOBID={dd.0}@NID={192.168.1.*@tcp} 10000

            The expression of this rule is made up of two sub-expressions, JOBID={dd.500}@NID={192.168.1.1@tcp} and JOBID={dd.0}@NID={192.168.1.*@tcp}. The expression's value is the disjunction of all the sub-expressions, which means the expression is true if at least one of the sub-expressions is true.

            The sub-expression JOBID={dd.500}@NID={192.168.1.1@tcp} is made up of JOBID={dd.500} and NID={192.168.1.1@tcp}. The value of a sub-expression is the conjunction of its minimum-expressions (here JOBID={dd.500} and NID={192.168.1.1@tcp}), which means the sub-expression is true only if both minimum-expressions are true.
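            A minimal C sketch of how such an expression could be evaluated against one RPC, assuming only the JOBID and NID keys and glob-style values matched with POSIX fnmatch(). struct rpc_info and all functions below are hypothetical, and the trailing rate (10000 above) is left out because it is not part of the expression. Note that a NID value itself contains '@' (192.168.1.1@tcp), so the ',' and '@' separators are only honoured outside braces.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical RPC attributes that a rule can test. */
struct rpc_info {
	const char *jobid;
	const char *nid;
};

/* Minimum-expression "KEY={pattern}": true if the RPC's KEY field matches
 * the glob pattern. Malformed terms and unknown keys never match. */
static bool match_min_expr(const char *s, size_t len,
			   const struct rpc_info *rpc)
{
	const char *eq = memchr(s, '=', len);
	char pattern[128];
	const char *field;

	if (eq == NULL || eq + 1 >= s + len || eq[1] != '{' ||
	    s[len - 1] != '}')
		return false;

	size_t key_len = (size_t)(eq - s);
	const char *val = eq + 2;
	size_t val_len = (size_t)(s + len - 1 - val);
	if (val_len >= sizeof(pattern))
		return false;
	memcpy(pattern, val, val_len);
	pattern[val_len] = '\0';

	if (key_len == 5 && strncmp(s, "JOBID", 5) == 0)
		field = rpc->jobid;
	else if (key_len == 3 && strncmp(s, "NID", 3) == 0)
		field = rpc->nid;
	else
		return false;

	return fnmatch(pattern, field, 0) == 0;
}

/* Split s[0..len) on `sep` at brace depth 0 and combine the pieces with
 * AND (want_all) or OR (!want_all), short-circuiting either way. */
static bool eval_split(const char *s, size_t len, char sep, bool want_all,
		       bool (*eval)(const char *, size_t,
				    const struct rpc_info *),
		       const struct rpc_info *rpc)
{
	size_t start = 0;
	int depth = 0;

	for (size_t i = 0; i <= len; i++) {
		bool at_end = (i == len);

		if (!at_end && s[i] == '{') {
			depth++;
		} else if (!at_end && s[i] == '}') {
			depth--;
		} else if (at_end || (s[i] == sep && depth == 0)) {
			bool r = eval(s + start, i - start, rpc);

			if (want_all && !r)
				return false; /* AND: one false term decides */
			if (!want_all && r)
				return true;  /* OR: one true term decides */
			start = i + 1;
		}
	}
	return want_all;
}

/* Sub-expression: minimum-expressions joined by '@' (conjunction). */
static bool match_sub_expr(const char *s, size_t len,
			   const struct rpc_info *rpc)
{
	return eval_split(s, len, '@', true, match_min_expr, rpc);
}

/* Whole expression: sub-expressions joined by ',' (disjunction). */
static bool tbf_expr_matches(const char *expr, const struct rpc_info *rpc)
{
	return eval_split(expr, strlen(expr), ',', false, match_sub_expr, rpc);
}
```

For example, an RPC with JobID dd.0 from 192.168.1.7@tcp fails the first sub-expression on JOBID but satisfies the second, so the whole expression is true.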

            By using this kind of expression, we can define rules as complex as we wish, and it is not necessary to change the order in which the rules are matched. The rules still have decreasing priority in the list, and an incoming RPC follows the first matching rule in the list.
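            A sketch of first-match selection over a priority-ordered rule list, in the spirit of the NID defaults overridden by JobID rules that Andreas describes. To stay self-contained, each rule's expression is reduced to a predicate function; struct tbf_rule, the rule names, and the rates are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct rpc_info {
	const char *jobid;
	const char *nid;
};

/* Hypothetical rule: a match predicate and a rate in RPCs per second. */
struct tbf_rule {
	const char *name;
	bool (*match)(const struct rpc_info *rpc);
	unsigned int rate;
};

static bool jobid_is_dd500(const struct rpc_info *rpc)
{
	return strcmp(rpc->jobid, "dd.500") == 0;
}

static bool nid_in_192_168_1(const struct rpc_info *rpc)
{
	return strncmp(rpc->nid, "192.168.1.", strlen("192.168.1.")) == 0;
}

static bool match_any(const struct rpc_info *rpc)
{
	(void)rpc;
	return true;
}

/* Decreasing priority: a JobID override first, a NID default next, and a
 * catch-all last. All rates here are illustrative. */
static const struct tbf_rule example_rules[] = {
	{ "dd500",   jobid_is_dd500,   10000 },
	{ "subnet",  nid_in_192_168_1, 500 },
	{ "default", match_any,        100 },
};

/* Return the first rule whose expression matches, or NULL if none does. */
static const struct tbf_rule *tbf_first_match(const struct tbf_rule *rules,
					      size_t n,
					      const struct rpc_info *rpc)
{
	for (size_t i = 0; i < n; i++)
		if (rules[i].match(rpc))
			return &rules[i];
	return NULL;
}
```

An RPC from job dd.500 on 192.168.1.1@tcp satisfies both of the first two rules, but because the JobID rule is listed first it governs the RPC; placing more specific rules earlier is all that is needed to express overrides.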

            Of course, we simplify the way rules are defined: an expression can only be written in the form (a&&b&&...)||(c&&d&&...)||(e&&f&&...)||... There is no obvious benefit in allowing complex expressions like ((a&&b)||c)&&d, since they can be rewritten as (a&&b&&d)||(c&&d). Allowing them would make the rules too complex, both for people to edit and read and for the computer to parse and match.
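            The rewrite of ((a&&b)||c)&&d into (a&&b&&d)||(c&&d) is just distribution of && over ||. A brute-force check over all 16 assignments of a, b, c and d confirms it, and also shows that dropping d from the first term would not be equivalent. The helper names are illustrative.

```c
#include <stdbool.h>

typedef bool (*bool_fn4)(bool, bool, bool, bool);

/* The nested form that the rule syntax does not allow. */
static bool nested(bool a, bool b, bool c, bool d)
{
	return ((a && b) || c) && d;
}

/* The equivalent disjunction of conjunctions. */
static bool dnf(bool a, bool b, bool c, bool d)
{
	return (a && b && d) || (c && d);
}

/* A wrong rewrite: d is missing from the first term. */
static bool wrong_dnf(bool a, bool b, bool c, bool d)
{
	return (a && b) || (c && d);
}

/* True if f and g agree on every assignment of four booleans. */
static bool equivalent(bool_fn4 f, bool_fn4 g)
{
	for (unsigned int bits = 0; bits < 16; bits++) {
		bool a = bits & 1, b = bits & 2, c = bits & 4, d = bits & 8;

		if (f(a, b, c, d) != g(a, b, c, d))
			return false;
	}
	return true;
}
```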

            The time complexity of matching the rules is O(N), where N is the number of minimum-expressions in the rules. As long as we avoid defining too many rules or overly complex expressions, the time cost should be negligible.

            Another problem is that currently, with TBF-jobid, one bucket is allocated for the RPCs belonging to each JobID, and with TBF-NID, one bucket is allocated for the RPCs from each NID. In the future, we might need to allocate one bucket per rule instance; otherwise there will be too many buckets if we allocate one bucket for each combination of JOBID, NID, OPCODE, etc.

            Quoting Andreas: "Also, the other question I have is how RPCs within a bucket are currently ordered? Is there any sorting within the bucket (e.g. by FID) or is it FIFO?"

            Within a single bucket, the RPCs are queued in FIFO order.

            Quoting Andreas: "Not sure if it is currently possible to stack TBF on top of ORR, but that would be the logical choice to simplify implementation, but it may also be acceptable to make TBF a "super policy" that does everything instead of having separate policies."

            That is a good idea. I was thinking about layered policies, meaning an RPC could be sorted/throttled by multiple policies. However, once the complex TBF rules above are implemented, I think running ORR (or another policy) inside TBF might be enough for most (if not all) use cases. I can't think of a use case where RPCs need to be sorted more than once. It makes sense for RPCs to be throttled first for QoS purposes, and then sorted for performance.
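            A sketch of "throttle first, then sort" inside a single bucket, assuming the token bucket has already decided how many RPCs may be released right now (tokens). The admitted batch is taken in FIFO order and then sorted by object and offset, roughly the kind of ordering ORR applies. Here object_id stands in for the FID, and struct io_rpc and admit_and_sort() are hypothetical, not the ORR implementation.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical queued I/O RPC: target object and byte offset. */
struct io_rpc {
	uint64_t object_id; /* stand-in for the FID */
	uint64_t offset;
};

/* Order by object first, then by offset within the object. */
static int cmp_object_offset(const void *pa, const void *pb)
{
	const struct io_rpc *a = pa, *b = pb;

	if (a->object_id != b->object_id)
		return a->object_id < b->object_id ? -1 : 1;
	if (a->offset != b->offset)
		return a->offset < b->offset ? -1 : 1;
	return 0;
}

/* QoS step: admit up to `tokens` RPCs from the FIFO head.
 * Performance step: sort only the admitted batch by object and offset.
 * RPCs beyond the batch keep their FIFO order. Returns the batch size. */
static size_t admit_and_sort(struct io_rpc *queue, size_t n, size_t tokens)
{
	size_t batch = n < tokens ? n : tokens;

	qsort(queue, batch, sizeof(queue[0]), cmp_object_offset);
	return batch;
}
```

Sorting only the admitted batch keeps the throttling decision independent of the sort, so reordering for locality can never let a bucket exceed its rate.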

            adilger Andreas Dilger added a comment -

            While this patch is fixing a real problem and I think it should be landed, I think at the high level it is still going in the wrong direction. Only allowing the TBF policy on either NID or JobID or later on the opcode is very limiting to users. It should really be possible to have TBF policy rules on different types of identifiers at the same time so that the admin has the flexibility to set e.g. default rules on NIDs (login nodes vs. compute vs. HSM agents) but then override those rules for specific JobIDs or opcodes or users or other identifiers as they are added.

            There would need to be a mechanism for ordering rules instead of just the order in which they are specified, so that the order in which the rules are matched can be specified.

            Also, the other question I have is how RPCs within a bucket are currently ordered? Is there any sorting within the bucket (e.g. by FID) or is it FIFO? Having ORR sorting within the TBF buckets would help improve performance after the RPC priority had been determined, especially if there is a large bucket such as "compute nodes" that may have many thousands of RPCs in it at one time. Not sure if it is currently possible to stack TBF on top of ORR, but that would be the logical choice to simplify implementation, but it may also be acceptable to make TBF a "super policy" that does everything instead of having separate policies.

            pjones Peter Jones added a comment -

            ok Li Xi. We can do that.

            lixi Li Xi (Inactive) added a comment -

            While adding regression tests for TBF, I realized this patch is still under review. Can we accelerate the review process? Otherwise, users might be confused by the 'weird' behavior when changing TBF policies.

            pjones Peter Jones added a comment -

            Niu

            Could you please review this patch?

            Thanks

            Peter

            gnlwlb wu libin (Inactive) added a comment -

            I added a patch for this problem:
            http://review.whamcloud.com/11749

            People

              niu Niu Yawei (Inactive)
              gnlwlb wu libin (Inactive)
              Votes: 0
              Watchers: 5
