  1. Lustre
  2. LU-18709

High frequency performance counters accessible to non-root users

Details

    • Improvement
    • Resolution: Unresolved
    • Major

    Description

      NVIDIA Nsight Systems is a system-wide profiler. It reads performance counters from CPUs, GPUs, NICs, storage volumes, etc. and places all of the data on a unified timeline. This lets software developers see how their application executes on a server so that they can optimize its performance.

      Additional details are at:

      Counters are read at high frequency (typically 10 kHz) so that counter values can be correlated with application actions and source code.

      As of Feb 2025, Nsight Systems collects performance counters from Lustre and NFS volumes, NVMe disks and NVMe-oF. Support for additional storage protocols is being added.

      When reading Lustre performance counters, Nsight Systems users experience two problems:

      1. Lustre counters are exposed under /sys/kernel/debug/lustre. Accessing this location requires root access, which cluster users don't usually have.
      2. Lustre counters can be read reliably only at 1 kHz. Nsight Systems would like to read counters at 10 kHz.
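
      The two problems above can be reproduced with a small script. The following is a minimal sketch, not Nsight Systems code: the helper names (can_read, measure_read_rate, sample_budget_us, report) are hypothetical, and the path to pass in is whichever counter file under /sys/kernel/debug/lustre is of interest. It checks whether the current user can read the file, then measures how many back-to-back reads per second are achieved and compares that to the per-sample time budget (100 us at 10 kHz, 1000 us at 1 kHz).

```python
import os
import time


def can_read(path):
    """Return True if the current user can read `path`.

    Also returns False when the path does not exist.
    """
    return os.access(path, os.R_OK)


def measure_read_rate(path, n_reads=1000):
    """Read `path` n_reads times back to back; return achieved reads per second."""
    start = time.perf_counter()
    for _ in range(n_reads):
        # Reopen for every sample so each read sees freshly generated content.
        with open(path, "rb") as f:
            f.read()
    elapsed = time.perf_counter() - start
    return n_reads / elapsed


def sample_budget_us(rate_hz):
    """Time available per sample, in microseconds, at a given sampling rate."""
    return 1_000_000 / rate_hz


def report(path, target_hz=10_000):
    """Summarize readability and achieved read rate for one counter file."""
    if not can_read(path):
        return f"{path}: not readable by the current user"
    rate = measure_read_rate(path)
    verdict = "meets" if rate >= target_hz else "falls short of"
    return (
        f"{path}: {rate:.0f} reads/s, {verdict} {target_hz} Hz "
        f"(budget {sample_budget_us(target_hz):.0f} us/sample)"
    )
```

      Calling report() with a counter file path as a regular cluster user would be expected to hit the "not readable" branch described in problem 1. The back-to-back loop gives only an upper bound on the sustainable rate; a real collector also has to parse the data and keep a steady sampling period.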

       


            People

              pjones Peter Jones
              ytebeka Yaki Tebeka