
LU-8433: Maximizing Bandwidth Utilization by TBF Rule with Dependency

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor

    Description

      The TBF policy is not aimed at improving performance; it enforces rate limits by throttling I/O. While I/O is being throttled, the system cannot make full use of its resources, even when there are idle I/O service threads and spare disk bandwidth. For some use cases, however, it is desirable to allocate that spare capacity to other workloads or background jobs. To ensure efficient utilization of I/O resources, we propose a dependency rule strategy. The command for a dependency rule is as follows:

      start ruleB <matchCondition> deprule=ruleA lowerrate=$r1 upperrate=$r2
      

      Here deprule gives the name of the rule that this rule depends on; in the example above, 'ruleB' depends on 'ruleA'. The key 'lowerrate' sets the lower bound of the RPC rate limit, while 'upperrate' sets its upper bound. The principle is that the effective RPC rate limit of a rule is adjusted dynamically between lowerrate and upperrate, so that the rule can claim whatever spare I/O capacity the rule it depends on leaves unused.
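      As a concrete illustration using the same syntax (the rule names, JobID patterns, and rates below are invented for this example, not taken from the patch), a low-priority job matched by 'ruleB' would be allowed between 100 and 1000 RPCs per second, depending on how much of its 2000 RPC/s allowance the job matched by 'ruleA' leaves unused:

      start ruleA jobid={job0.*} rate=2000
      start ruleB jobid={job1.*} deprule=ruleA lowerrate=100 upperrate=1000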


          Activity

            qian_wc Qian Yingjin added a comment -

            They are two different features.

            The dependent rule (A) can use the spare bandwidth when the bandwidth of the rule (B) it depends on is underutilized.

             

            The Priority class in LU-9228 is a feature that tries to meet the bandwidth requirement of a rule marked realtime as much as possible.


            nrutman Nathan Rutman added a comment -

            Isn't this addressed by the priority class in LU-9228?

            adilger Andreas Dilger added a comment -

            To my reading, the specification of dependent rules will be complex and not easily handled by users. Instead, it seems like TBF could do "soft" scheduling of RPCs using the existing rules, and simply not enforce RPC limits on a class when there are no RPCs of some higher priority in the queue. In essence (I think) a class with outstanding RPCs would continue to gain tokens (in proportion to the rates of the lower-priority rules) while it has queued RPCs and the higher priorities do not. Such rules could add a new keyword such as rate_soft= (since the current rate= is treated as a hard limit today).
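            As a minimal sketch of that "soft rate" idea, assuming hypothetical structures and names (this is not the actual Lustre NRS code): a class with an empty bucket is allowed to dispatch anyway whenever no higher-priority class has work queued.

                #include <stdbool.h>
                #include <stdint.h>

                /* Hypothetical, simplified view of a TBF class. */
                struct tbf_class {
                    uint64_t rate;       /* configured RPCs per second */
                    bool     rate_soft;  /* limit is soft (rate_soft=) */
                    uint64_t tokens;     /* tokens currently in the bucket */
                    uint64_t depth;      /* bucket depth (max burst) */
                    uint64_t prio;       /* larger value == higher priority */
                    uint64_t nqueued;    /* RPCs waiting in this class */
                };

                /* Refill the bucket for the time elapsed since the last check. */
                static void tbf_refill(struct tbf_class *cl, uint64_t elapsed_ns)
                {
                    cl->tokens += cl->rate * elapsed_ns / 1000000000ULL;
                    if (cl->tokens > cl->depth)
                        cl->tokens = cl->depth;
                }

                /* True if any strictly higher-priority class has queued RPCs. */
                static bool higher_prio_busy(const struct tbf_class *cls, int n,
                                             const struct tbf_class *cl)
                {
                    for (int i = 0; i < n; i++)
                        if (cls[i].prio > cl->prio && cls[i].nqueued > 0)
                            return true;
                    return false;
                }

                /*
                 * May one RPC from @cl be dispatched now?  With a soft rate,
                 * an empty bucket only blocks dispatch while a higher-priority
                 * class still has work queued; otherwise the spare capacity
                 * is used immediately.
                 */
                static bool tbf_may_dispatch(struct tbf_class *cls, int n,
                                             struct tbf_class *cl)
                {
                    if (cl->tokens > 0) {
                        cl->tokens--;
                        return true;
                    }
                    return cl->rate_soft && !higher_prio_busy(cls, n, cl);
                }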

            gerrit Gerrit Updater added a comment -

            Yingjin Qian (qian@ddn.com) uploaded a new patch: https://review.whamcloud.com/24515
            Subject: LU-8433 nrs: Maximizing throughput via rules with dependency
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 2de06fce6b484239a9dc9491557bd76177386bab

            qian Qian Yingjin (Inactive) added a comment -

            Our current TBF policy is an actual limit on the number of RPCs.

            There is a paper named "mClock: Handling Throughput Variability for Hypervisor IO Scheduling" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.182.4720); I think it can address the concern you raised above. According to the paper, it provides proportional allocation, latency support, reservation support, limit support, handling of capacity fluctuation, etc.

            mClock assigns tags spaced by increments of 1/rate to successive requests of a VM. If all requests are scheduled in order of their tag values, the VM receives service in proportion to its rate. mClock extends this notion with multiple tags to support proportional-share fairness subject to minimum reservations and maximum limits on the I/O allocations of the VMs. Our algorithm also uses a notion similar to tag-based scheduling to achieve rate control, but we set the deadline (tag) per class rather than per request, so the sorted set ordered by deadline is much smaller; classes are selected in order of their deadlines, and the requests within the chosen class's queue are scheduled in FCFS order. As the general ideas are similar, we think it is possible to integrate the mClock algorithm into ours, but we still need to investigate.
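            To make the per-class tagging concrete, here is a toy sketch with invented names (not the actual NRS implementation): each class carries a single deadline that advances by 1/rate per dispatched request, and the scheduler always serves the backlogged class with the earliest deadline, taking requests inside that class in FCFS order.

                #include <stddef.h>
                #include <stdint.h>

                /* Hypothetical per-class state; one tag per class, not per request. */
                struct tag_class {
                    double   deadline;  /* virtual time of next dispatch */
                    double   interval;  /* 1.0 / rate (seconds per RPC) */
                    uint64_t nqueued;   /* requests waiting, served FCFS */
                };

                /*
                 * Pick the backlogged class with the earliest deadline and
                 * advance its tag by 1/rate, so each class receives service
                 * in proportion to its configured rate.  Returns the index
                 * of the chosen class, or -1 if nothing is queued.
                 */
                static int tag_sched_pick(struct tag_class *cls, size_t n)
                {
                    int best = -1;

                    for (size_t i = 0; i < n; i++) {
                        if (cls[i].nqueued == 0)
                            continue;
                        if (best < 0 || cls[i].deadline < cls[best].deadline)
                            best = (int)i;
                    }
                    if (best >= 0) {
                        cls[best].deadline += cls[best].interval;
                        cls[best].nqueued--;
                    }
                    return best;
                }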

            lixi Li Xi (Inactive) added a comment -

            Hmm, I think the title of "Maximizing Bandwidth" is a little inaccurate. It is not necessarily true that dependency rules will maximize bandwidth compared with the original TBF policy; at least the purpose of implementing dependency rules is not to maximize bandwidth. Instead, dependency rules enable different priority levels for different NIDs/JobIDs. Assume we have a job 0 with high priority and another job 1 with low priority. We could set a rule A that matches job 0, and a rule B that matches job 1 and depends on rule A. Ideally, the modified TBF policy would always provide as much RPC rate as possible to job 0. If job 0 is not using up the whole bandwidth on the OSS, job 1 could raise its rate limit; and if job 0 is not getting the expected RPC rate of rule A, the RPC rate of job 1 should be decreased. So, as you can see, this is advanced QoS, not bandwidth maximization.

            Yeah, I would expect that combining ORR with TBF might be able to improve the total throughput.
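            A rough sketch of the adjustment loop implied here, with invented names and a naive fixed step (the actual patch may do this quite differently): the dependent rule's effective limit climbs toward upperrate while the rule it depends on leaves capacity unused, and backs off toward lowerrate when that rule stops reaching its expected rate.

                #include <stdbool.h>
                #include <stdint.h>

                /* Hypothetical dependency-rule state; not the patch's real types. */
                struct dep_rule {
                    uint64_t lowerrate;  /* lower bound of RPC rate limit */
                    uint64_t upperrate;  /* upper bound of RPC rate limit */
                    uint64_t rate;       /* effective limit, adjusted below */
                };

                /*
                 * Re-evaluate the dependent rule's effective rate each interval.
                 * @parent_starved: the depended-on rule had queued RPCs but its
                 * measured rate fell short of its configured rate.
                 */
                static void dep_rule_adjust(struct dep_rule *dr,
                                            bool parent_starved, uint64_t step)
                {
                    if (parent_starved) {
                        /* Give bandwidth back: step down toward lowerrate. */
                        if (dr->rate > dr->lowerrate + step)
                            dr->rate -= step;
                        else
                            dr->rate = dr->lowerrate;
                    } else {
                        /* Capacity is spare: step up toward upperrate. */
                        if (dr->rate + step < dr->upperrate)
                            dr->rate += step;
                        else
                            dr->rate = dr->upperrate;
                    }
                }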

            adilger Andreas Dilger added a comment -

            It isn't clear to me why the current TBF implementation would restrict performance when there are idle resources. Is the current TBF priority a relative weighting (i.e. process RPCs matching ruleA N times more often than RPCs matching ruleB), or is it an actual limit on the number of RPCs (i.e. process only N RPCs matching ruleA per second)? If it is a relative weighting, then the threads should always be kept busy, either because RPCs matching ruleA are available, or because none are available and RPCs matching ruleB are.

            What would also be useful is if TBF included the functionality from ORR to do request ordering within a given class, to optimize IO submission to disk. That could allow TBF to improve performance instead of just reducing it. The alternative is to allow stacking NRS policies so that ORR provides the secondary ordering within TBF, but I don't know whether that would perform as well as a single combined policy.
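            As an illustration of the per-class ordering idea, here is a toy sketch with invented types (ORR itself orders by file offset or physical disk block, and its real implementation is more involved): within one class, a pending batch of I/O RPCs is sorted by object and offset before dispatch so the backend sees a more sequential stream.

                #include <stdint.h>
                #include <stdlib.h>

                /* Hypothetical queued I/O RPC within a single TBF class. */
                struct io_rpc {
                    uint64_t object;  /* object the I/O targets */
                    uint64_t offset;  /* starting byte offset */
                };

                /* Order by object, then offset, to approximate sequential access. */
                static int io_rpc_cmp(const void *a, const void *b)
                {
                    const struct io_rpc *ra = a, *rb = b;

                    if (ra->object != rb->object)
                        return ra->object < rb->object ? -1 : 1;
                    if (ra->offset != rb->offset)
                        return ra->offset < rb->offset ? -1 : 1;
                    return 0;
                }

                /* Sort a class's pending batch before handing it to OST threads. */
                static void class_order_batch(struct io_rpc *batch, size_t n)
                {
                    qsort(batch, n, sizeof(*batch), io_rpc_cmp);
                }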

            cheneva1 Evan D. Chen (Inactive) added a comment -

            We think this is more of a new feature than an improvement. Changing the issue type to New Feature.

            lixi Li Xi (Inactive) added a comment -

            Hi Peter,

            Yeah, Yingjin and I have been working on this for a long time. As far as we have tested, the patches work well except for a few corner cases. As soon as we finish fixing the defects, we will push the patches to the community, and code review would be much appreciated.

            Regards,
            Li Xi
            pjones Peter Jones added a comment -

            Is this something that you are working on yourself?


            People

              qian_wc Qian Yingjin
              qian Qian Yingjin (Inactive)
              Votes: 0
              Watchers: 12
