Details

    • New Feature
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.4.0
    • None
    • 13,634
    • 7604

    Description

      Architecture - Network Request Scheduler

      (NB: this is copy of http://wiki.lustre.org/index.php/Architecture_-_Network_Request_Scheduler)

      Definitions

      • NRS
        Network Request Scheduler.
      • RPC Concurrency
The number of RPC requests in flight between a given client and a server.
      • Active fan-out
        The number of clients with in-flight requests to a given server at a given time.
      • Offset stream
        The sequence of file or disk offsets in a stream of I/O requests.
      • "Before" relation (≤)
        File system operations that require ordering for correctness are related by "≤". For 2 operations a and b, if a ≤ b, then operation a must complete reading/writing file system state before operation b can start.
      • POP
Partial Order Preservation. A filesystem's POP capability describes how its servers handle any "before" relations required on RPCs sent to them. Servers with no POP capability have no concept of any "before" relation on incoming RPCs, so clients are completely responsible for preserving it. Servers with local POP capability preserve the "before" relation within a single server, but clients are responsible for preserving any required order on RPCs sent to different servers. A set of servers with global POP capability preserves the "before" relation on all RPCs.

        Summary

      • The Network Request Scheduler manages incoming RPC requests on a server to provide improved and consistent performance. It does this primarily by ordering request execution to avoid client starvation and to present a workload to the backend filesystem that can be optimized more easily. It may also change RPC concurrency as active fan-out varies to reduce latency seen by the client and limit request buffering on the server.

        Requirements

      • POP Capability
        The NRS must implement any POP capability its clients require.

Current Lustre servers have no POP capability; therefore clients may never issue RPCs concurrently that have a "before" relation - viz. metadata RPCs are synchronous, and dirty data must have been written back before locks can start to be released. This leaves the NRS free to reorder all incoming RPCs.

        Any POP capability should permit better RPC pipelining for improved throughput to single clients and better latency hiding when resolving lock conflicts.

        The implementation may choose to implement a very simple POP capability that only works for the most important use cases, since it can revert to synchronous client behaviour in complex cases.

        An implementation may create additional "before" relations between RPCs provided they do not conflict with any "real" ordering (i.e. no cycles in the global "before" graph). This may allow a more compact "wire" representation of the "before" relation and/or just a simpler overall implementation, at the expense of reducing the scope to optimize request order.

        Consider RPC requests a ≤ b. Implementations that could allow request b to reach a server before request a will have to log completed requests for the duration of a server epoch.

        A global POP capability seems to require too much and too fine-grained inter-server communication which will make it hard to implement efficiently. It should probably not be considered unless a significant use-case arises.
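The acyclicity constraint on added "before" relations can be checked with a standard depth-first search for back edges. The sketch below is illustrative only (not Lustre code); the fixed node count, adjacency-matrix representation, and function names are assumptions:

```c
/* Sketch (not Lustre code): verifying that artificially added "before"
 * edges keep the global "before" graph acyclic, i.e. they do not
 * conflict with any "real" ordering. */
#include <assert.h>
#include <string.h>

#define MAX_RPCS 8

static int edge[MAX_RPCS][MAX_RPCS];   /* edge[a][b] != 0 means a <= b */

/* DFS colouring: 0 = unvisited, 1 = on the DFS stack, 2 = done */
static int dfs(int node, int n, int *colour)
{
    int next;

    colour[node] = 1;
    for (next = 0; next < n; next++) {
        if (!edge[node][next])
            continue;
        if (colour[next] == 1)
            return 1;                  /* back edge: cycle found */
        if (colour[next] == 0 && dfs(next, n, colour))
            return 1;
    }
    colour[node] = 2;
    return 0;
}

/* Non-zero if the "before" graph on n RPCs contains a cycle, i.e. an
 * added ordering conflicts with a real one. */
static int before_graph_has_cycle(int n)
{
    int colour[MAX_RPCS];
    int i;

    memset(colour, 0, sizeof(colour));
    for (i = 0; i < n; i++)
        if (colour[i] == 0 && dfs(i, n, colour))
            return 1;
    return 0;
}
```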
      • Scalability
The number of RPC requests the server may buffer at any time is the product of RPC concurrency and active fan-out - i.e. potentially many thousands of requests. Request scheduling operations should have complexity of O(log n) at most.
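A binary heap is one way to meet the O(log n) bound with thousands of buffered requests; enqueue and dequeue are both logarithmic in the queue depth. The structure and field names below are illustrative assumptions, not the actual NRS data structures:

```c
/* Sketch: a binary min-heap keyed by a scheduling key (e.g. deadline,
 * offset, or round number) gives O(log n) enqueue/dequeue. */
#include <assert.h>

#define HEAP_MAX 1024

struct nrs_req {
    unsigned long key;          /* illustrative scheduling key */
};

static struct nrs_req heap[HEAP_MAX];
static int heap_n;

static void heap_push(struct nrs_req r)         /* O(log n) sift-up */
{
    int i = heap_n++;

    while (i > 0 && heap[(i - 1) / 2].key > r.key) {
        heap[i] = heap[(i - 1) / 2];
        i = (i - 1) / 2;
    }
    heap[i] = r;
}

static struct nrs_req heap_pop(void)            /* O(log n) sift-down */
{
    struct nrs_req top = heap[0];
    struct nrs_req last = heap[--heap_n];
    int i = 0;

    for (;;) {
        int child = 2 * i + 1;

        if (child >= heap_n)
            break;
        if (child + 1 < heap_n && heap[child + 1].key < heap[child].key)
            child++;
        if (last.key <= heap[child].key)
            break;
        heap[i] = heap[child];
        i = child;
    }
    heap[i] = last;
    return top;
}
```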
      • Offset Stream Consistency
        The backend filesystem allocator determines the disk offset stream when a given file is first written. It may even turn a random file offset stream into a substantially sequential disk offset stream. The disk offset stream is repeated when the file is read, provided the file offset stream hasn't changed. Request ordering should therefore be as reproducible as possible in the face of ordering "noise" caused by network unfairness or client races.

        Clients should pass a "hint" in RPC requests to ensure related offset streams can be identified, reordered and merged consistently on a multi-user cluster. This "hint" should also be passed through to the backend file system and used by its allocator. The "hint" may also become the basis of a resource reservation system to guarantee share of server resource to concurrent jobs.
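One minimal way to use such a hint for reproducible ordering is to sort each batch of buffered requests by (hint, offset), so arrival-order "noise" does not change the offset stream presented to the backend. The struct layout and the hint field are assumptions for illustration:

```c
/* Sketch: reordering a batch of I/O requests by (hint, file offset)
 * with qsort(3), so each identified stream is offset-ordered regardless
 * of the noisy network arrival order. */
#include <assert.h>
#include <stdlib.h>

struct io_req {
    unsigned long hint;     /* stream/job identifier from the client */
    unsigned long offset;   /* file offset of this I/O */
};

static int io_req_cmp(const void *pa, const void *pb)
{
    const struct io_req *a = pa, *b = pb;

    if (a->hint != b->hint)
        return a->hint < b->hint ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Reorder a batch so related offset streams are merged consistently. */
static void reorder_batch(struct io_req *reqs, size_t n)
{
    qsort(reqs, n, sizeof(*reqs), io_req_cmp);
}
```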
      • Request Priority
        Request priorities enable important requests to be serviced with lower latency - e.g. writes required to clean a cache on a locking conflict. Note that high priority requests must not break any POP requirements.
      • RPC Concurrency
        There are conflicting pressures on RPC concurrency. It should be high when maximum individual client performance is required - e.g. when active fan-out is low on the server and there is spare server bandwidth, or when a client must clean its cache on a lock conflict. It should be low at times of high active fan-out to reduce buffering required on the server and to limit the latency of individual client requests.
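The trade-off can be sketched as dividing a fixed server buffering budget by the active fan-out, with a floor so a lone client can still keep its pipeline full. The budget and floor constants below are purely illustrative assumptions:

```c
/* Sketch of the concurrency trade-off: each client's RPC credits shrink
 * as active fan-out grows, bounding total buffering on the server. */
#include <assert.h>

#define SERVER_RPC_BUDGET 4096   /* total buffered RPCs the server accepts */
#define MIN_CLIENT_CREDITS 8     /* floor so a single client can stream */

/* RPC credits granted to each client at the current active fan-out. */
static unsigned int rpc_credits(unsigned int active_fanout)
{
    unsigned int share;

    if (active_fanout == 0)
        return SERVER_RPC_BUDGET;
    share = SERVER_RPC_BUDGET / active_fanout;
    return share > MIN_CLIENT_CREDITS ? share : MIN_CLIENT_CREDITS;
}
```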
      • Extendability
The NRS must inter-operate with non-NRS-aware clients and peers, making "best efforts" scheduling decisions for them. The same policy must apply to successive client and server versions.

      Attachments

        1. 0001-Fix-for-minor-typo-in-nrs_crr_res_get.patch
          1.0 kB
        2. 0002-Add-assertions-where-svc-srv_rq_lock-is-meant-to-be-.patch
          4 kB
        3. 0003-Rework-movement-of-policies-in-nrs_policy_queued-lis.patch
          2 kB
        4. 0004-Make-some-functions-static.patch
          4 kB
        5. acc_sm_summary
          19 kB
        6. acc_sm_summary_2
          0.8 kB
        7. HLD_of_Lustre_NRS.pdf
          270 kB
        8. LUG_2012_NRS_tests_output.tar.bz2
          32 kB
        9. NRS_Bandwidth_Policies_LAD_2012.pdf
          170 kB
        10. nrs_dld_insp_6.patch
          73 kB
        11. nrs_generic_framework_dld_7.patch
          74 kB
        12. NRS_LAD_12.pdf
          583 kB
        13. nrs_liang_v2.patch
          54 kB
        14. nrs_orr_log_phys_offs_v1.patch
          47 kB
        15. nrs_orr_trr_v2.patch
          57 kB
        16. NRS_Scale_Testing_Results_LUG_2012_Nikitas_Angelinas_Xyratex.pdf
          1.41 MB
        17. nrs.patch
          54 kB
        18. NRS Conceptual Design__v1.0.doc
          101 kB
        19. NRS Conceptual Design__v1.0.pdf
          139 kB
        20. NRS Test Plan for Lustre 2.4__v1.2.doc
          110 kB
        21. NRS Test Plan for Lustre 2.4__v1.2.pdf
          182 kB
        22. orr_log_offs_v1.patch
          46 kB

        Issue Links

          Activity

            [LU-398] NRS (Network Request Scheduler)

            hmm, sorry I was trying to close it because I didn't realise there are sub tickets for this...

            liang Liang Zhen (Inactive) added a comment

            Great, thanks for finding that.

            nangelinas Nikitas Angelinas added a comment
            alexxy Alexey Shvetsov (Inactive) added a comment - Seems http://review.whamcloud.com/6141 should fix this error from LU-3179

            It seems like the compilers are catching a type check on the enum; I can upload a one- or few-line patch to fix this. It seems like -Werror=switch is catching errors that might be meant to be caught by -Wswitch-enum. Or I might be reading documentation for a previous version or something similar; please let me double-check.

            nangelinas Nikitas Angelinas added a comment

            After merging http://review.whamcloud.com/4938 the server won't build with new compilers (tested gcc-4.7 and gcc-4.8):

            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c: In function ‘nrs_orr_ctl’:
            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c:773:2: error: case value ‘33’ not in enumerated type ‘enum ptlrpc_nrs_ctl’
            [-Werror=switch]
            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c:781:2: error: case value ‘34’ not in enumerated type ‘enum ptlrpc_nrs_ctl’
            [-Werror=switch]
            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c:788:2: error: case value ‘35’ not in enumerated type ‘enum ptlrpc_nrs_ctl’
            [-Werror=switch]
            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c:795:2: error: case value ‘36’ not in enumerated type ‘enum ptlrpc_nrs_ctl’
            [-Werror=switch]
            /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/ptlrpc/nrs_orr.c:802:2: error: case value ‘37’ not in enumerated type ‘enum ptlrpc_nrs_ctl’
            LD /var/tmp/portage/sys-cluster/lustre-9999/work/lustre-9999/lustre/fid/built-in.o
            [-Werror=switch]

            alexxy Alexey Shvetsov (Inactive) added a comment
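The errors above arise because GCC's -Wswitch (promoted by -Werror=switch) rejects case values outside the switched-on enum's enumerated type. Without knowing the exact fix landed in the Lustre tree, one general pattern is to switch on a plain integer so policy-specific opcodes outside the base enum remain legal; the enum, opcode value, and function below are hypothetical:

```c
/* Sketch of the -Werror=switch situation: POLICY_CTL_FOO (33) is not a
 * member of enum base_ctl, so "case POLICY_CTL_FOO:" inside a
 * switch over an enum base_ctl expression trips -Wswitch.  Switching
 * on a plain unsigned int (with a default case) avoids the warning. */
#include <assert.h>

enum base_ctl {
    CTL_START = 1,
    CTL_STOP  = 2,
};

#define POLICY_CTL_FOO 33       /* policy-specific opcode, outside enum */

static int handle_ctl(unsigned int opc)
{
    switch (opc) {              /* plain int: -Wswitch does not apply */
    case CTL_START:
        return 0;
    case CTL_STOP:
        return 1;
    case POLICY_CTL_FOO:
        return 2;
    default:
        return -1;
    }
}
```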
            green Oleg Drokin added a comment -

            Just to recapture discussions that are happening in the http://review.whamcloud.com/#change,5274 patch:

            • A new API needs to be introduced to fetch the next request from a queue, if any is ready to be served according to a policy function. This will also remove the request from the queue right away and should be suitable for use in ptlrpc_request_get, or even as a drop-in replacement for it. ptlrpc_request_get should not require any locking when called.
              (The current poll/remove API still remains for use in other places; currently that's the health_check function, where we just assess how long the next ready request has spent waiting in the queue, but we don't want the request removed. We also need to ensure the request holds some sort of reference, to avoid a race where we fetch a request and then it's processed and freed before we have a chance to look inside, when using such an API.)
            • The HP-request logic needs to be totally removed from the ptlrpc request API (with the exception of still providing a function to determine whether a request is HP or not); all this logic must be implemented inside the policy functions, as they are best positioned to determine request order. The multi-queue fetch/get functions in NRS should be gone as a result, and the only fetching calls remaining would be "get next request so I can serve it" and "get pointer to next request for inspection, don't remove from queue". All HP tracking, including how many requests of what sort are already running, should probably be inside a policy function too, though we might need a way for a policy function to determine the number of idle threads/running normal requests or some such.
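The two fetching calls described above can be sketched as a destructive "get" and a non-destructive "peek" that takes a reference so the request cannot be freed under the inspecting caller. All names, fields, and the refcount scheme here are assumptions for illustration, not the actual ptlrpc/NRS interfaces:

```c
/* Hypothetical sketch of the proposed queue API: nrs_request_get()
 * dequeues the next ready request; nrs_request_peek() returns it
 * without dequeuing, holding a reference so it cannot be processed and
 * freed while e.g. a health check inspects its waiting time. */
#include <assert.h>
#include <stddef.h>

struct nrs_request {
    struct nrs_request *next;
    int refcount;
    long arrival_time;
};

static struct nrs_request *queue_head;

/* Dequeue and return the next ready request. */
static struct nrs_request *nrs_request_get(void)
{
    struct nrs_request *req = queue_head;

    if (req != NULL) {
        queue_head = req->next;
        req->next = NULL;
    }
    return req;
}

/* Return the next request WITHOUT dequeuing; the extra reference
 * guards against the fetch/free race described above. */
static struct nrs_request *nrs_request_peek(void)
{
    struct nrs_request *req = queue_head;

    if (req != NULL)
        req->refcount++;
    return req;
}

static void nrs_request_put(struct nrs_request *req)
{
    req->refcount--;
}
```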

            Also, I now see that there is no test that is enabling NRS policies to verify that they currently work, and continue to work in the future. This is something that the patch inspectors should have caught.

            Please submit a sanityn.sh test that enables each of the available policies in turn, and then runs some kind of test load on multiple mount points (e.g. iozone and racer and fsx for 60s or 600s depending on SLOW=no or SLOW=yes) so that there will be sufficient load to give NRS a workout.

            adilger Andreas Dilger added a comment

            Yes Andreas, I will submit that patch asap.

            nangelinas Nikitas Angelinas added a comment

            Nikitas, just to clarify - can you please submit a follow-up patch to address the issues with Isaac's comments on the http://review.whamcloud.com/4411 patch.

            adilger Andreas Dilger added a comment

            People

              liang Liang Zhen (Inactive)
              liang Liang Zhen (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              20 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: