Details

    • New Feature
    • Resolution: Fixed
    • Minor
    • Lustre 2.6.0
    • None
    • 8963

    Description

      NRS (Network Request Scheduler) lets services schedule RPCs in different ways, and several policies have already been implemented on top of the main framework. Most of them aim to improve throughput or serve similar purposes. We are implementing policies for a different kind of purpose: QoS.

      TBF (Token Bucket Filter) is one of the policies we implemented for traffic control. It enforces an RPC rate limit on every client, identified by its NID. The handling of an RPC is delayed until the client has enough tokens. Different clients are scheduled according to their deadlines, so that none of them starves even when the service cannot satisfy the RPC rate requirements of all clients. RPCs from the same client are queued in FIFO order.
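      The scheduling idea described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from the patch: the class name `TbfScheduler`, its methods, the single shared `rate`, the bucket `depth`, and the explicit `now` argument (simulated time in seconds) are all hypothetical and chosen for illustration. Each NID gets its own token bucket and FIFO queue, and a min-heap orders clients by the time at which their next token becomes available.

```python
import heapq
from collections import deque


class TbfScheduler:
    """Toy per-NID token bucket filter with deadline ordering (illustrative only)."""

    def __init__(self, rate, depth=1):
        self.rate = rate      # tokens (RPCs) per second granted to each client
        self.depth = depth    # maximum number of tokens a bucket can hold
        self.queues = {}      # nid -> deque of pending RPCs (FIFO per client)
        self.tokens = {}      # nid -> tokens currently in the bucket
        self.stamp = {}       # nid -> time of the last refill
        self.heap = []        # (deadline, nid) for clients with queued RPCs

    def _refill(self, nid, now):
        # Add the tokens earned since the last refill, capped at the depth.
        elapsed = now - self.stamp[nid]
        self.tokens[nid] = min(self.depth, self.tokens[nid] + elapsed * self.rate)
        self.stamp[nid] = now

    def _deadline(self, nid, now):
        # Earliest time at which this client holds one full token.
        missing = max(0.0, 1.0 - self.tokens[nid])
        return now + missing / self.rate

    def enqueue(self, nid, rpc, now):
        if nid not in self.queues:
            self.queues[nid] = deque()
            self.tokens[nid] = float(self.depth)
            self.stamp[nid] = now
        was_idle = not self.queues[nid]
        self.queues[nid].append(rpc)
        if was_idle:
            # Only clients with pending work sit in the deadline heap.
            self._refill(nid, now)
            heapq.heappush(self.heap, (self._deadline(nid, now), nid))

    def dequeue(self, now):
        """Return (nid, rpc) if some client's deadline has passed, else None."""
        if not self.heap or self.heap[0][0] > now:
            return None
        _, nid = heapq.heappop(self.heap)
        self._refill(nid, now)
        self.tokens[nid] -= 1.0              # handling one RPC costs one token
        rpc = self.queues[nid].popleft()     # oldest RPC of this client first
        if self.queues[nid]:
            heapq.heappush(self.heap, (self._deadline(nid, now), nid))
        return nid, rpc
```

      In this model a client that floods the server is held to its own rate, because after each handled RPC its deadline moves `1 / rate` seconds into the future, while a client with a single queued RPC is served as soon as its own deadline passes. A real implementation would need per-rule rates, timers, and locking, none of which are modeled here.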

      Early tests show that the policy does enforce the RPC rate limit, but more tests, benchmarks, and analysis are needed to confirm its correctness and efficiency.

          Activity

            [LU-3558] NRS TBF policy for QoS purposes

            adilger Andreas Dilger added a comment - Patch has landed for 2.6.0, using LUDOC-221 for tracking the remaining work on the manual.

            jlevi Jodi Levi (Inactive) added a comment - Can this ticket now be closed since we have LUDOC-221 to track the manual updates?

            adilger Andreas Dilger added a comment - I filed LUDOC-221 to track the documentation update for the TBF feature.

            lixi Li Xi (Inactive) added a comment - No problem! We will submit a manual update soon. Thanks!

            adilger Andreas Dilger added a comment (edited) - The patch http://review.whamcloud.com/6901 has landed on master for 2.6. This functionality also needs a manual update explaining what the feature does and how to use it. Please see https://wiki.hpdd.intel.com/display/PUB/Making+changes+to+the+Lustre+Manual. Please submit a LUDOC Jira ticket to track the manual update, and link it here.

            ihara Shuichi Ihara (Inactive) added a comment - OK, thanks! We hope people can get QoS functionality in Lustre soon, and we want it as well!

            laytonjb Jeff Layton (Inactive) added a comment - No disagreement from me. But I'm not a technical person - I just like the capability that TBF provides. So we'll have to get technical people to review this. Thanks!

            ihara Shuichi Ihara (Inactive) added a comment - Hmm, our question is: why has this not been included in 2.6 or even 2.5.1 yet? In the original discussion with Peter, since this is not a core component of Lustre, it could have landed in 2.5 or even 2.4.x. But the review did not finish before the 2.5 release. After that, the patch passed inspection by multiple people, but it had to be rebased again and again, and each rebase required another review. I would like to request another quick review; we want to land this in 2.6 and 2.5.1.

            laytonjb Jeff Layton (Inactive) added a comment - It's been a few months since the last entry. I wanted to ask if this idea/patch is worthy of further work for inclusion in 2.7? Thanks!

            adilger Andreas Dilger added a comment - I believe your analysis of case #6 is correct - the client only has a limited number of RPCs in flight for each RPC (see also the LNET "peer credits" tunable). If the u500 IOs are blocked behind the u0 IOs, they will be limited by the slower process. This may not be strictly related to the RPCs, but rather to the higher-level RPC engine that is trying to balance IO submission between objects, and doesn't know about the NRS ordering on the server. The first question is whether this is a use case that is important for real users? I'm not sure if there is an easy solution for how to handle this from the server.

            People

              laisiyao Lai Siyao
              lixi Li Xi (Inactive)
