> While it may be necessary to account for the actual read/write bytes after the RPC transfer is complete, the code should account for the IO latency after the IO is complete, as it did before, rather than after the RPC is complete. The RPC stats at the OST level and on the client will include the full RPC latency, and the ofd stats should only account for the storage latency.
I don't know how much work this would be, or whether it is the best way to do things, but I think it would be useful to have both sets of counters (including for metadata operations):
- latency measured at I/O completion
- latency measured at I/O completion plus RPC completion (<countername>_rtt, or maybe an extra field that needs to be enabled via a tunable?)
You can certainly collect the per-client counters (via llite), but it is much harder to collect and munge all of the client data than to have an overall server-side average for general use, such as identifying network congestion outside of the Lustre servers' control.
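To make the proposal concrete, here is a rough sketch of the two-counter idea in Python. It is not Lustre code: `LatencyCounter`, `DualStats`, and `record()` are hypothetical names made up for illustration, and only the `_rtt` suffix comes from the suggestion above. It assumes each request has three timestamps in microseconds (start, I/O complete, RPC complete).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class LatencyCounter:
    """Running count/min/max/sum for one stat, in microseconds (hypothetical)."""
    count: int = 0
    total_us: int = 0
    min_us: Optional[int] = None
    max_us: int = 0

    def add(self, usec: int) -> None:
        self.count += 1
        self.total_us += usec
        self.min_us = usec if self.min_us is None else min(self.min_us, usec)
        self.max_us = max(self.max_us, usec)

    @property
    def avg_us(self) -> float:
        return self.total_us / self.count if self.count else 0.0


class DualStats:
    """Keeps <name> (storage latency) and <name>_rtt (latency through RPC completion)."""

    def __init__(self) -> None:
        self.counters: Dict[str, LatencyCounter] = {}

    def record(self, name: str, start_us: int, io_done_us: int, rpc_done_us: int) -> None:
        # Storage-only latency: start of the operation until the I/O completes.
        self.counters.setdefault(name, LatencyCounter()).add(io_done_us - start_us)
        # Full latency: start of the operation until the RPC completes.
        self.counters.setdefault(name + "_rtt", LatencyCounter()).add(rpc_done_us - start_us)


stats = DualStats()
stats.record("write", start_us=0, io_done_us=400, rpc_done_us=1000)
stats.record("write", start_us=0, io_done_us=600, rpc_done_us=3000)

# Time not attributable to storage (network transfer, queueing, and so on).
gap_us = stats.counters["write_rtt"].avg_us - stats.counters["write"].avg_us
```

With both counters kept side by side, a growing gap between `write_rtt` and `write` while `write` stays flat would point at something outside the storage path, which is the server-side signal being asked for here.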
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46833/
Subject: LU-15642 obdclass: use consistent stats units
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b515c6ec2ab84598c77c65eb78f1afd5e67b1ede