Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
Description
NVIDIA Nsight Systems is a system-wide profiler. It reads performance counters from CPUs, GPUs, NICs, storage volumes, etc. and places all of the data on a unified timeline. This helps software developers see how their application executes on a server, so that they can optimize its performance.
Additional details are at:
- https://developer.nvidia.com/nsight-systems
- https://docs.nvidia.com/nsight-systems/UserGuide/index.html
When reading NFS client performance counters, Nsight Systems can read throughput counters divided into two categories:
- Application-level Read/Write - Displays the quantity of data that applications read from or write to the storage device (in bytes).
- Driver-level Read/Write - Displays the throughput of data that the driver reads from or writes to the storage device (in bytes/sec).
For example, when an application calls the POSIX “write” function to write 10 MB of data to a file, the entire 10 MB appears in a single sampling point of the Application-level Write counter. The same 10 MB may be spread across multiple sampling points of the Driver-level Write counter, since the NFS driver may need some time to send 10 MB of data to the NFS storage server.
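The difference between the two views can be sketched with a toy model. This is only an illustration, not how Nsight Systems or the NFS driver is implemented: the function names are hypothetical, the sampling interval is assumed to be one second (so bytes per sample equals bytes/sec), and the driver is assumed to send at most 4 MB per interval.

```python
MB = 1_000_000


def application_level_samples(write_size, num_samples):
    """Attribute all bytes to the sample in which write() was called."""
    samples = [0] * num_samples
    samples[0] = write_size
    return samples


def driver_level_samples(write_size, bytes_per_sample, num_samples):
    """Attribute bytes to the samples in which the driver actually sends them."""
    samples = []
    remaining = write_size
    for _ in range(num_samples):
        sent = min(remaining, bytes_per_sample)  # hypothetical per-interval limit
        samples.append(sent)
        remaining -= sent
    return samples


app = application_level_samples(10 * MB, 5)
drv = driver_level_samples(10 * MB, 4 * MB, 5)
print(app)  # [10000000, 0, 0, 0, 0]
print(drv)  # [4000000, 4000000, 2000000, 0, 0]
```

Both series add up to the same 10 MB. Only the Driver-level series shows that the transfer took three sampling intervals to complete.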
The Nsight Systems team would like to have similar Lustre client counters.
The counters exposed at /sys/kernel/debug/lustre/<volume name>/stats seem to be Application-level counters.
Having Driver-level counters as well would help identify latencies incurred when using Lustre volumes.
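As a rough sketch of what a profiler can derive from the existing counters, the snippet below parses two snapshots of stats text and turns the cumulative byte totals into a rate. The line layout (`<name> <count> samples [<unit>] <min> <max> <sum>`) is an assumption and should be checked against the actual stats file. The snapshot contents and the helper names are made up for illustration.

```python
import re

# Assumed line layout: <name> <count> samples [<unit>] <min> <max> <sum>
STAT_LINE = re.compile(
    r"^(\S+)\s+(\d+)\s+samples\s+\[(\w+)\]\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_byte_totals(text):
    """Return {counter name: cumulative byte sum} for counters in bytes."""
    totals = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and match.group(3) == "bytes":
            totals[match.group(1)] = int(match.group(6))
    return totals


def throughput(before, after, interval_sec):
    """Bytes/sec per counter between two snapshots taken interval_sec apart."""
    return {
        name: (after[name] - before[name]) / interval_sec
        for name in after
        if name in before
    }


# Made-up snapshots, taken 2 seconds apart.
SNAPSHOT_T0 = """\
snapshot_time 100.000000 secs.usecs
read_bytes 3 samples [bytes] 0 4096 8192
write_bytes 1 samples [bytes] 1340 1340 1340
"""
SNAPSHOT_T1 = """\
snapshot_time 102.000000 secs.usecs
read_bytes 5 samples [bytes] 0 4096 16384
write_bytes 2 samples [bytes] 1340 4096 5436
"""

rates = throughput(
    parse_byte_totals(SNAPSHOT_T0), parse_byte_totals(SNAPSHOT_T1), 2.0
)
print(rates)  # {'read_bytes': 4096.0, 'write_bytes': 2048.0}
```

A rate computed this way still attributes every byte to the moment the application issued the call. It therefore cannot show how long the client took to move the data to the servers, which is the gap that Driver-level counters would fill.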