Details
Improvement
Resolution: Unresolved
Medium
None
None
None
3
Description
CPT Mapping of Client RPCs and NID/Nodemap-Based TBF Configuration
Mapping between client RPCs (src NID) and server‑side CPT
When LNet distributes incoming RPCs to CPTs / `ptlrpc_service_part` instances, it does not simply round‑robin them evenly. Instead, it follows rules that usually map all requests from the same client (NID) to the same CPT, which reduces lock contention and cache thrashing.
The mapping from source NID to CPT is done entirely in the LNet layer.
The PtlRPC layer simply uses this CPT to select the corresponding `ptlrpc_service_part` and its thread pool.
On a server with CPT partitioning enabled, the high‑level flow is:
1. LNet uses the local CPT topology and NI (network interface) configuration to create CPT‑specific portal/service resources for each CPT (buffer pools, wait queues, etc.).
2. Each PtlRPC service (for example the `ost` and `ost_io` services on an OST) is split into multiple `ptlrpc_service_part` instances, each bound to one CPT (its `srv_cptid`).
3. When LNet receives an RPC:
- It selects a CPT id for this packet based on NIC/CPT mappings, a hash of the source NID, and/or local CPU affinity rules.
- It then enqueues the request into the queue of the `ptlrpc_service_part` for that CPT, to be processed by that CPT’s service threads.
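The dispatch step in the flow above can be sketched in C. This is a simplified model, not Lustre code: `struct svc`, `struct svc_part`, `svc_part_enqueue()` and the fixed-size ring queue are hypothetical stand-ins for a PtlRPC service, its `ptlrpc_service_part` instances and their request queues. The CPT id is assumed to have already been chosen by LNet and is simply passed in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPTS  8
#define QUEUE_LEN 64

/* Hypothetical stand-in for one ptlrpc_service_part (one per CPT). */
struct svc_part {
	int      cpt;              /* CPT this partition is bound to */
	uint64_t queue[QUEUE_LEN]; /* pending requests (source NID only) */
	int      head, count;
};

/* Hypothetical stand-in for a PtlRPC service split across CPTs. */
struct svc {
	int             ncpts;
	struct svc_part parts[MAX_CPTS];
};

static void svc_init(struct svc *svc, int ncpts)
{
	assert(ncpts > 0 && ncpts <= MAX_CPTS);
	svc->ncpts = ncpts;
	for (int i = 0; i < ncpts; i++) {
		svc->parts[i].cpt = i;
		svc->parts[i].head = 0;
		svc->parts[i].count = 0;
	}
}

/*
 * The PtlRPC layer does not choose the CPT: it takes the CPT id picked
 * by LNet and queues the request on the matching partition, whose own
 * threads process it later. Returns false for a bad CPT or full queue.
 */
static bool svc_part_enqueue(struct svc *svc, int cpt, uint64_t src_nid)
{
	if (cpt < 0 || cpt >= svc->ncpts)
		return false;
	struct svc_part *part = &svc->parts[cpt];
	if (part->count == QUEUE_LEN)
		return false;
	part->queue[(part->head + part->count) % QUEUE_LEN] = src_nid;
	part->count++;
	return true;
}
```

Keeping one queue per partition means threads on different CPTs never touch each other's queue, which is the locality argument made below.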
RPCs are not assigned to CPTs by plain round‑robin, for two main reasons:
1. Locality / lock contention
- If requests from the same client were randomly scattered across CPTs, this would cause:
- Frequent lock contention for shared state (exports, objects, etc.) across multiple CPTs;
- Cache lines constantly bouncing between NUMA nodes, which is expensive.
- Therefore, the common design is to maintain a stable CPT mapping for each peer (NID/export) so that its requests consistently land on the same CPT.
2. Overall load balancing
- With many clients, their NIDs are hashed across multiple CPTs, so the load is approximately balanced across CPTs at the node level.
- However, a single client is not evenly spread; it is typically bound to one (or a small number of) CPTs.
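Both properties (a stable CPT per peer, approximate balance across many peers) follow from hashing the source NID. The sketch below is hypothetical: `nid_to_cpt()` models a NID as a plain `uint64_t` and uses the generic SplitMix64 finalizer as its mixing function; LNet's actual hash and NID layout differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical NID-to-CPT mapping: mix the 64-bit NID, then reduce. */
static int nid_to_cpt(uint64_t nid, int ncpts)
{
	assert(ncpts > 0);
	/* SplitMix64 finalizer: spreads nearby NIDs across all bits. */
	nid ^= nid >> 30;
	nid *= 0xbf58476d1ce4e5b9ULL;
	nid ^= nid >> 27;
	nid *= 0x94d049bb133111ebULL;
	nid ^= nid >> 31;
	return (int)(nid % (uint64_t)ncpts);
}

/* Count how many of `nclients` consecutive NIDs land on each CPT. */
static void cpt_histogram(uint64_t first_nid, int nclients,
			  int ncpts, int *counts)
{
	for (int i = 0; i < ncpts; i++)
		counts[i] = 0;
	for (int i = 0; i < nclients; i++)
		counts[nid_to_cpt(first_nid + (uint64_t)i, ncpts)]++;
}
```

Any one client always lands in the same bucket, while a large population spreads roughly evenly; a single busy client still loads only its own CPT.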
Setting NRS TBF for NID‑based rules
The current problem with NRS and CPTs is that the rates specified in, e.g., TBF rules apply per CPT. This is not what users typically expect, and it leads to uneven behavior when clients do not use each CPT uniformly. Two changes would bring this behavior closer to user expectations:
- The rate/tokens specified for a rule should represent the global rate across all CPTs on a server node, and should be divided by the number of CPTs when assigned to a per‑CPT class bucket. This ensures that the total rate in one interval matches the configured value and does not depend on the server’s NUMA/CPT configuration.
- There should be balancing among TBF buckets across CPTs within an interval so that all tokens can be consumed if needed. This balancing does not have to be perfect, but it can and should be much better than having no balancing at all. For example, if the TBF bucket on one CPT runs out of tokens, it could look at buckets on other CPTs and, upon finding one with remaining tokens, steal half of the remaining tokens.
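Both proposed changes can be sketched together. This is a hypothetical model, not the NRS TBF implementation: `struct tbf_bucket`, `tbf_split_rate()` and `tbf_take_token()` are illustrative names, tokens are whole integers, and cross-CPT locking is omitted. When the global rate does not divide evenly, the remainder is handed out one token at a time to the first CPTs so the per-CPT rates still sum to the configured value; "half" is rounded up so that a bucket holding a single token can still give it away.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPTS 8

/* Hypothetical per-CPT TBF class bucket (no locking, for clarity). */
struct tbf_bucket {
	long rate;   /* tokens added per interval on this CPT */
	long tokens; /* tokens currently available */
};

/* Split a node-wide rate so the per-CPT rates sum to global_rate. */
static void tbf_split_rate(struct tbf_bucket *b, int ncpts, long global_rate)
{
	assert(ncpts > 0 && ncpts <= MAX_CPTS);
	for (int i = 0; i < ncpts; i++) {
		b[i].rate = global_rate / ncpts +
			    (i < global_rate % ncpts ? 1 : 0);
		b[i].tokens = b[i].rate; /* start of a new interval */
	}
}

/*
 * Consume one token on CPT `cpt`. If its bucket is empty, steal half
 * (rounded up) of the tokens left in the first other bucket that still
 * has some. Returns false only when every bucket is empty.
 */
static bool tbf_take_token(struct tbf_bucket *b, int ncpts, int cpt)
{
	if (b[cpt].tokens == 0) {
		for (int i = 0; i < ncpts; i++) {
			if (i == cpt || b[i].tokens == 0)
				continue;
			long stolen = (b[i].tokens + 1) / 2;
			b[i].tokens -= stolen;
			b[cpt].tokens += stolen;
			break;
		}
	}
	if (b[cpt].tokens == 0)
		return false;
	b[cpt].tokens--;
	return true;
}
```

With 1000 tokens configured on a 3-CPT server, the buckets start at 334, 333 and 333. A client bound to CPT 0 can consume all 1000 tokens in the interval by borrowing from the idle CPTs, and never more than 1000, regardless of the CPT count.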
However, for NID‑based TBF rules (such as rules keyed by NID or nodemap), all requests from a given source NID are consistently mapped to a single CPT, so it is sufficient to configure the rule only on that CPT's TBF scheduler. There is no need to configure the same rule on every CPT.
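Because the mapping is deterministic, a single-NID rule only has to live on the scheduler of the CPT that NID maps to, and lookups on the arrival CPT will find it. A hypothetical sketch: `struct tbf_sched`, `tbf_sched_init()`, `tbf_add_nid_rule()` and `tbf_rule_for()` are illustrative names, and `nid_to_cpt()` is a simple modulo placeholder for whatever mapping LNet actually applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 16

struct tbf_rule {
	uint64_t nid;  /* client NID this rule matches */
	long     rate; /* RPCs per interval */
};

/* Hypothetical per-CPT TBF scheduler holding its own rule list. */
struct tbf_sched {
	struct tbf_rule rules[MAX_RULES];
	int             nrules;
};

/* Placeholder for LNet's (deterministic) NID-to-CPT mapping. */
static int nid_to_cpt(uint64_t nid, int ncpts)
{
	return (int)(nid % (uint64_t)ncpts);
}

static void tbf_sched_init(struct tbf_sched *scheds, int ncpts)
{
	for (int i = 0; i < ncpts; i++)
		scheds[i].nrules = 0;
}

/* Install a NID rule only on the scheduler of the NID's CPT. */
static int tbf_add_nid_rule(struct tbf_sched *scheds, int ncpts,
			    uint64_t nid, long rate)
{
	int cpt = nid_to_cpt(nid, ncpts);
	struct tbf_sched *s = &scheds[cpt];

	assert(s->nrules < MAX_RULES);
	s->rules[s->nrules].nid = nid;
	s->rules[s->nrules].rate = rate;
	s->nrules++;
	return cpt;
}

/* Find the rule for a request on the CPT it was dispatched to. */
static const struct tbf_rule *tbf_rule_for(const struct tbf_sched *scheds,
					   int ncpts, uint64_t src_nid)
{
	const struct tbf_sched *s = &scheds[nid_to_cpt(src_nid, ncpts)];

	for (int i = 0; i < s->nrules; i++)
		if (s->rules[i].nid == src_nid)
			return &s->rules[i];
	return NULL;
}
```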
A nodemap (or a TBF rule covering multiple NIDs) typically contains a set of client NIDs. If the NIDs in that nodemap are configured at the LNet layer so that they all map to the same CPT, then all requests belonging to that nodemap fall into that single CPT. In that case, an administrator only needs to configure the TBF scheduler of the PtlRPC service partition for that CPT to achieve the desired rate limiting for the entire nodemap; there is no need to configure the corresponding class buckets on every CPT.
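Before relying on a single-CPT configuration for a nodemap, it is worth confirming that every member NID really maps to the same CPT. A hypothetical helper: `nodemap_single_cpt()` returns that shared CPT, or -1 if the nodemap is empty or its members are spread over several CPTs. `nid_to_cpt()` is again a modulo placeholder for the LNet mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for LNet's (deterministic) NID-to-CPT mapping. */
static int nid_to_cpt(uint64_t nid, int ncpts)
{
	assert(ncpts > 0);
	return (int)(nid % (uint64_t)ncpts);
}

/*
 * Return the CPT shared by every NID in the nodemap, or -1 if the
 * nodemap is empty or its NIDs map to more than one CPT.
 */
static int nodemap_single_cpt(const uint64_t *nids, size_t n, int ncpts)
{
	if (n == 0)
		return -1;
	int cpt = nid_to_cpt(nids[0], ncpts);
	for (size_t i = 1; i < n; i++)
		if (nid_to_cpt(nids[i], ncpts) != cpt)
			return -1;
	return cpt;
}
```

A result of -1 means the single-CPT shortcut does not apply: either the members have to be re-mapped at the LNet layer, or the rule has to be configured on every CPT, ideally with the global-rate split proposed above.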
Issue Links
- is related to: LU-17158 TBF rate should not be based on CPT (Open)