Details
Type: Improvement
Resolution: Unresolved
Priority: Major
Description
NVIDIA Nsight Systems is a system-wide profiler. It reads performance counters from CPUs, GPUs, NICs, storage volumes, etc. and brings all of the data onto a unified timeline. This helps software developers see how their application executes across a server so that they can optimize its performance.
Additional details are available at:
- https://developer.nvidia.com/nsight-systems
- https://docs.nvidia.com/nsight-systems/UserGuide/index.html
When reading NFS client volume counters, Nsight Systems collects the following latency-related counters:
- Queue Time - the average duration in milliseconds from the point the NFS client creates an RPC request until the request is transmitted.
- Round-trip Time - the average duration in milliseconds from the point the NFS client's kernel sends an RPC request until it receives the response.
- Execution Time - the average duration in milliseconds from the point the NFS client submits an RPC request to the kernel until the request is completed; this includes the round-trip time.
These counters help software developers and cluster owners identify latency introduced by storage volumes.
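As a rough illustration of how such per-RPC averages can be derived, the sketch below assumes the per-operation line layout of the Linux `/proc/self/mountstats` file (`OPNAME: ops transmissions timeouts bytes_sent bytes_received queue_ms rtt_ms execute_ms [errors]`, where the time fields are cumulative milliseconds and the trailing error count may be absent on older kernels). It takes two snapshots and divides the growth of each cumulative time by the number of operations completed in between. The names `RpcOpStats`, `parse_op_line`, and `interval_averages` are hypothetical helpers, not part of Nsight Systems or any NFS tool, and the sample numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class RpcOpStats:
    """Cumulative counters for one RPC operation (hypothetical helper type)."""
    name: str
    ops: int         # number of completed operations
    queue_ms: int    # cumulative queue time, ms
    rtt_ms: int      # cumulative round-trip time, ms
    execute_ms: int  # cumulative execution time, ms


def parse_op_line(line: str) -> RpcOpStats:
    # Assumed layout: "OPNAME: ops trans timeouts bytes_sent bytes_recv queue rtt execute [errors]"
    name, rest = line.split(":", 1)
    fields = [int(x) for x in rest.split()]
    return RpcOpStats(name.strip(), fields[0], fields[5], fields[6], fields[7])


def interval_averages(before: RpcOpStats, after: RpcOpStats) -> dict:
    """Average queue, RTT, and execution time per RPC (ms) between two snapshots."""
    ops = after.ops - before.ops
    if ops <= 0:
        # No RPCs completed during the interval, so there is nothing to average.
        return {"queue": 0.0, "rtt": 0.0, "execute": 0.0}
    return {
        "queue": (after.queue_ms - before.queue_ms) / ops,
        "rtt": (after.rtt_ms - before.rtt_ms) / ops,
        "execute": (after.execute_ms - before.execute_ms) / ops,
    }


if __name__ == "__main__":
    # Made-up snapshots of a READ line taken at the start and end of a sampling interval.
    t0 = parse_op_line("READ: 100 100 0 12000 409600 50 800 900 0")
    t1 = parse_op_line("READ: 150 150 0 18000 614400 80 1300 1450 0")
    print(interval_averages(t0, t1))
```

With the made-up numbers above, 50 reads complete during the interval, giving averages of 0.6 ms queue time, 10.0 ms round-trip time, and 11.0 ms execution time. Working from deltas rather than the raw cumulative values makes the averages describe only the sampling interval, which suits a timeline view.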
The Nsight Systems team would like to be able to read similar latency counters for Lustre volumes.