LU-398: NRS (Network Request Scheduler)

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Labels:
      None
    • Bugzilla ID:
      13,634
    • Rank (Obsolete):
      7604

      Description

      Architecture - Network Request Scheduler

      (NB: this is a copy of http://wiki.lustre.org/index.php/Architecture_-_Network_Request_Scheduler)

      Definitions

      • NRS
        Network Request Scheduler.
      • RPC Concurrency
        The number of RPC requests in flight between a given client and a server.
      • Active fan-out
        The number of clients with in-flight requests to a given server at a given time.
      • Offset stream
        The sequence of file or disk offsets in a stream of I/O requests.
      • "Before" relation (≤)
        File system operations that require ordering for correctness are related by "≤". For 2 operations a and b, if a ≤ b, then operation a must complete reading/writing file system state before operation b can start.
      • POP
        Partial Order Preservation. A filesystem's POP capability describes how its servers handle any "before" relations required on RPCs sent to them. Servers with no POP capability have no concept of any "before" relation on incoming RPCs, so clients are completely responsible for preserving it. Servers with local POP capability preserve the "before" relation within a single server, but clients are responsible for preserving any required order on RPCs sent to different servers. A set of servers with global POP capability preserves the "before" relation on all RPCs.
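
The "before" relation defined above can be illustrated with a minimal sketch. It assumes that operations are plain string labels and that "before" relations are given as (a, b) pairs meaning a ≤ b; the function names `is_consistent` and `respects_order` are hypothetical and not taken from Lustre. The first function checks that a set of relations contains no cycle, and the second checks that a serial execution order honours every relation.

```python
from graphlib import TopologicalSorter, CycleError


def is_consistent(before):
    """Return True if the "before" pairs (a, b), meaning a ≤ b, contain no cycle."""
    graph = {}
    for a, b in before:
        graph.setdefault(a, set())
        graph.setdefault(b, set()).add(a)  # b may only start once a is done
    try:
        TopologicalSorter(graph).prepare()  # raises CycleError on a cycle
    except CycleError:
        return False
    return True


def respects_order(execution, before):
    """Return True if a serial execution list runs a ahead of b for every pair."""
    position = {op: i for i, op in enumerate(execution)}
    return all(position[a] < position[b] for a, b in before)
```

For example, `respects_order(["a", "b"], [("a", "b")])` is true, while the reversed execution `["b", "a"]` is not; a server with no POP capability never performs such a check, which is why clients must serialize related RPCs themselves.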

        Summary

      • The Network Request Scheduler manages incoming RPC requests on a server to provide improved and consistent performance. It does this primarily by ordering request execution to avoid client starvation and to present a workload to the backend filesystem that can be optimized more easily. It may also change RPC concurrency as active fan-out varies to reduce latency seen by the client and limit request buffering on the server.
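
One way to picture "ordering request execution to avoid client starvation" is a round-robin queue over clients. The sketch below is an illustration only, not the NRS implementation: the class name `RoundRobinScheduler` and its methods are hypothetical, clients are identified by arbitrary hashable keys, and all requests are assumed to be freely reorderable (no "before" relations). Each request is tagged with a per-client round number and kept in a binary heap, so enqueue and dequeue cost O(log n) in the number of buffered requests.

```python
import heapq
from itertools import count


class RoundRobinScheduler:
    """Serve one request per client per round so no client is starved."""

    def __init__(self):
        self._heap = []          # entries: (round, seq, client, request)
        self._next_round = {}    # client -> round assigned to its next request
        self._current_round = 0  # round of the most recently dispatched request
        self._seq = count()      # tie-breaker that preserves arrival order

    def enqueue(self, client, request):
        # A client that was idle joins at the current round rather than
        # collecting credit for rounds it did not take part in.
        rnd = max(self._next_round.get(client, 0), self._current_round)
        self._next_round[client] = rnd + 1
        heapq.heappush(self._heap, (rnd, next(self._seq), client, request))

    def dequeue(self):
        rnd, _, client, request = heapq.heappop(self._heap)
        self._current_round = rnd
        return client, request

    def __len__(self):
        return len(self._heap)
```

If client "A" queues three requests and client "B" then queues one, the dispatch order is A, B, A, A: the single request from "B" is not held behind the backlog from "A".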

        Requirements

      • POP Capability
        The NRS must implement any POP capability its clients require.

        Current Lustre servers have no POP capability, so clients may never concurrently issue RPCs that have a "before" relation: metadata RPCs are synchronous, and dirty data must have been written back before locks can start to be released. This leaves the NRS free to reorder all incoming RPCs.

        Any POP capability should permit better RPC pipelining for improved throughput to single clients and better latency hiding when resolving lock conflicts.

        The implementation may choose to implement a very simple POP capability that only works for the most important use cases, since it can revert to synchronous client behaviour in complex cases.

        An implementation may create additional "before" relations between RPCs provided they do not conflict with any "real" ordering (i.e. no cycles in the global "before" graph). This may allow a more compact "wire" representation of the "before" relation and/or just a simpler overall implementation, at the expense of reducing the scope to optimize request order.

        Consider RPC requests a ≤ b. Implementations that could allow request b to reach a server before request a will have to log completed requests for the duration of a server epoch.

        A global POP capability seems to require too much, and too fine-grained, inter-server communication to be implemented efficiently. It should probably not be considered unless a significant use case arises.
      • Scalability
        The number of RPC requests the server may buffer at any time is the product of RPC concurrency and active fan-out - i.e. potentially many thousands of requests. Request scheduling operations should have complexity of at most O(log n), where n is the number of buffered requests.
      • Offset Stream Consistency
        The backend filesystem allocator determines the disk offset stream when a given file is first written. It may even turn a random file offset stream into a substantially sequential disk offset stream. The disk offset stream is repeated when the file is read, provided the file offset stream hasn't changed. Request ordering should therefore be as reproducible as possible in the face of ordering "noise" caused by network unfairness or client races.

        Clients should pass a "hint" in RPC requests to ensure related offset streams can be identified, reordered and merged consistently on a multi-user cluster. This "hint" should also be passed through to the backend file system and used by its allocator. The "hint" may also become the basis of a resource reservation system that guarantees a share of server resources to concurrent jobs.
      • Request Priority
        Request priorities enable important requests to be serviced with lower latency - e.g. writes required to clean a cache on a locking conflict. Note that high priority requests must not break any POP requirements.
      • RPC Concurrency
        There are conflicting pressures on RPC concurrency. It should be high when maximum individual client performance is required - e.g. when active fan-out is low on the server and there is spare server bandwidth, or when a client must clean its cache on a lock conflict. It should be low at times of high active fan-out to reduce buffering required on the server and to limit the latency of individual client requests.
      • Extendability
        The NRS must inter-operate with non-NRS-aware clients and peers, making "best efforts" scheduling decisions for them. This same policy must apply to successive client and server versions.
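
The Scalability and RPC Concurrency requirements above can be tied together with a small worked example. Since the number of buffered requests is the product of RPC concurrency and active fan-out, a server with a fixed buffering budget can derive a per-client concurrency limit from the current fan-out. This is a sketch under stated assumptions: the function `per_client_concurrency`, its parameters and the numbers used are hypothetical, and the document does not prescribe this particular formula.

```python
def per_client_concurrency(buffer_budget, active_fan_out,
                           max_concurrency, min_concurrency=1):
    """Pick an RPC concurrency so that concurrency * fan-out fits the budget.

    buffer_budget   -- requests the server is willing to buffer (assumed)
    active_fan_out  -- clients that currently have requests in flight
    max_concurrency -- cap used when fan-out is low and bandwidth is spare
    min_concurrency -- floor so every client can always make progress
    """
    if active_fan_out <= 0:
        return max_concurrency
    share = buffer_budget // active_fan_out
    return max(min_concurrency, min(max_concurrency, share))
```

With a budget of 1024 buffered requests and a cap of 8, four active clients each get the full concurrency of 8, 512 active clients get 2 each, and 4096 active clients fall back to the floor of 1, at which point the budget is exceeded and only the floor keeps clients progressing.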

        Attachments

        1. 0001-Fix-for-minor-typo-in-nrs_crr_res_get.patch
          1.0 kB
        2. 0002-Add-assertions-where-svc-srv_rq_lock-is-meant-to-be-.patch
          4 kB
        3. 0003-Rework-movement-of-policies-in-nrs_policy_queued-lis.patch
          2 kB
        4. 0004-Make-some-functions-static.patch
          4 kB
        5. acc_sm_summary
          19 kB
        6. acc_sm_summary_2
          0.8 kB
        7. HLD_of_Lustre_NRS.pdf
          270 kB
        8. LUG_2012_NRS_tests_output.tar.bz2
          32 kB
        9. NRS_Bandwidth_Policies_LAD_2012.pdf
          170 kB
        10. nrs_dld_insp_6.patch
          73 kB
        11. nrs_generic_framework_dld_7.patch
          74 kB
        12. NRS_LAD_12.pdf
          583 kB
        13. nrs_liang_v2.patch
          54 kB
        14. nrs_orr_log_phys_offs_v1.patch
          47 kB
        15. nrs_orr_trr_v2.patch
          57 kB
        16. NRS_Scale_Testing_Results_LUG_2012_Nikitas_Angelinas_Xyratex.pdf
          1.41 MB
        17. nrs.patch
          54 kB
        18. NRS Conceptual Design__v1.0.doc
          101 kB
        19. NRS Conceptual Design__v1.0.pdf
          139 kB
        20. NRS Test Plan for Lustre 2.4__v1.2.doc
          110 kB
        21. NRS Test Plan for Lustre 2.4__v1.2.pdf
          182 kB
        22. orr_log_offs_v1.patch
          46 kB

              People

              • Assignee: Liang Zhen (Inactive)
              • Reporter: Liang Zhen (Inactive)