Details
-
Improvement
-
Resolution: Unresolved
-
Minor
Description
It would be useful to add new "{osc,mdc}.*.max_mb_per_rpc_read/write" tunable parameters to dissociate the RPC size from the client page size as implied by "{osc,mdc}.*.max_pages_per_rpc". While it is already possible to specify "max_pages_per_rpc" with a unit like "4M" or "16M" to avoid issues with page sizes, the value is still printed as a number of pages, and values without units are assumed to be pages. It would be better to have dedicated parameters for specifying the RPC size, and (very) slowly deprecate max_pages_per_rpc in the future.
With NVMe storage, read and write performance behave differently, so having a separate RPC size for each will help optimize performance for different IO workloads.
The cl_max_pages_per_rpc variable should be first split into cl_max_pages_per_rpc_read and cl_max_pages_per_rpc_write and all uses should be distinguished between read and write RPC generation.
The max_mb_per_rpc_read/write parameters should internally convert the specified parameter to pages and still modify the respective cl_max_pages_per_rpc_read/write variable. The existing max_pages_per_rpc parameter would set both parameters identically.
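The conversion described above could look roughly like the following userspace sketch. The struct, the helper names, and the rounding policy (round down, minimum of one page) are assumptions for illustration only; they are not existing Lustre code, and only the two field names come from this ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the client object, holding only the two
 * per-direction limits proposed in this ticket. */
struct client_sketch {
	uint32_t cl_max_pages_per_rpc_read;
	uint32_t cl_max_pages_per_rpc_write;
};

/* Convert an RPC size in bytes to whole pages.
 * Assumption: round down, but never return fewer than one page. */
static uint32_t rpc_bytes_to_pages(uint64_t bytes, uint32_t page_size)
{
	uint64_t pages;

	assert(page_size > 0);
	pages = bytes / page_size;
	return pages ? (uint32_t)pages : 1;
}

/* max_mb_per_rpc_read: store the size as pages in the read limit only. */
static void set_max_rpc_read(struct client_sketch *cli, uint64_t bytes,
			     uint32_t page_size)
{
	cli->cl_max_pages_per_rpc_read = rpc_bytes_to_pages(bytes, page_size);
}

/* max_mb_per_rpc_write: store the size as pages in the write limit only. */
static void set_max_rpc_write(struct client_sketch *cli, uint64_t bytes,
			      uint32_t page_size)
{
	cli->cl_max_pages_per_rpc_write = rpc_bytes_to_pages(bytes, page_size);
}

/* Legacy max_pages_per_rpc: set both directions identically. */
static void set_max_pages_per_rpc(struct client_sketch *cli, uint32_t pages)
{
	cli->cl_max_pages_per_rpc_read = pages;
	cli->cl_max_pages_per_rpc_write = pages;
}
```

With these helpers, a 4MiB read size on a 4KiB-page client yields 1024 pages, while a 16MiB write size on a 64KiB-page client yields 256 pages, which shows why a byte-based parameter is easier to reason about across architectures.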
When max_pages_per_rpc is read and the max_mb_per_rpc_read/write values differ, the smaller of the two values should be printed. I suspect this will be uncommon (except for debug tools like sosreport), and printing an error or warning each time this happens would be needless noise. The separate max_mb_per_rpc_read/write parameters will make it clear what the actual values are.
The parameters should accept units and fractional values (e.g. 0.5M and 64k), which is already handled by sysfs_memparse(), so that it is possible to specify RPC sizes smaller than 1MiB in the rare case this is needed.
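A minimal sketch of that read-side behaviour, with hypothetical function names and a plain buffer standing in for the real sysfs show handler:

```c
#include <stdint.h>
#include <stdio.h>

/* Value reported by the legacy max_pages_per_rpc parameter: the smaller
 * of the read and write limits, with no warning when they differ. */
static uint32_t legacy_max_pages_per_rpc(uint32_t read_pages,
					 uint32_t write_pages)
{
	return read_pages < write_pages ? read_pages : write_pages;
}

/* Hypothetical show routine: format the legacy value into buf and
 * return the number of characters written, as snprintf() does. */
static int legacy_max_pages_show(char *buf, size_t len,
				 uint32_t read_pages, uint32_t write_pages)
{
	return snprintf(buf, len, "%u\n",
			(unsigned int)legacy_max_pages_per_rpc(read_pages,
							       write_pages));
}
```

Reporting the minimum keeps older tools conservative: a script that sizes its IO from max_pages_per_rpc will never assume a larger RPC than either direction actually allows.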
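To illustrate the kind of input this implies, here is a userspace sketch of a size parser. It is not sysfs_memparse() and does not claim to match its exact behaviour; the function name, the strtod()-based parsing, the supported suffixes, and the caller-supplied default multiplier for bare numbers are all assumptions made for the example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse strings such as "0.5M", "64k" or "16M" into a byte count.
 * default_mult is applied when no unit suffix is given, so the caller
 * decides whether a bare number means bytes, MiB, or something else.
 * Returns 0 on success or a negative errno value on failure. */
static int parse_rpc_size(const char *str, uint64_t default_mult,
			  uint64_t *bytes)
{
	char *end;
	double val;
	uint64_t mult;

	errno = 0;
	val = strtod(str, &end);
	/* Reject empty input, range errors, zero, negatives and NaN. */
	if (end == str || errno != 0 || !(val > 0))
		return -EINVAL;

	switch (*end) {
	case 'k': case 'K':
		mult = 1ULL << 10; end++; break;
	case 'm': case 'M':
		mult = 1ULL << 20; end++; break;
	case 'g': case 'G':
		mult = 1ULL << 30; end++; break;
	default:
		mult = default_mult; break;
	}

	/* Tolerate the single trailing newline that "echo" appends. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* Guard the conversion back to an integer (2^63 as a double). */
	if (val * (double)mult >= 9223372036854775808.0)
		return -ERANGE;

	*bytes = (uint64_t)(val * (double)mult);
	return 0;
}
```

Under these assumptions "0.5M" parses to 524288 bytes and "64k" to 65536 bytes, either of which can then be converted to pages for the client page size.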