  1. Lustre
  2. LU-20038

'lctl find_param' opens and reads every parameter

Details

    • Bug
    • Resolution: Fixed
    • Medium
    • Lustre 2.18.0
    • Lustre 2.17.0, Lustre 2.18.0
    • 3

    Description

      The new lctl find_param sub-command added in patch https://review.whamcloud.com/57675 ("LU-18418 utils: new lctl find_param option") appears to open and read every parameter, including those whose names do not match the requested name. find_param matches only parameter names, not values, as can be seen by searching for read_bytes, which appears in the contents of many stats parameters but returns no results:

      # lctl find_param read_bytes
      #
      

      Searching for memused returns three entries, all of which are served directly from memory, yet the command can take a long time to complete:

      # lctl find_param memused
      memused=508001847
      memused_max=513329167
      lnet_memused=99452430
      

      Running the command under strace shows that it opens and reads the value of every parameter in the filesystem, including some that are very large or that trigger disk reads, even though none of that data is used:

      # strace lctl find_param memused |& less
      :
      lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/brw_stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
      openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/brw_stats", O_RDONLY) = 3
      read(3, "snapshot_time:            177438"..., 4096) = 1050
      read(3, "", 4096)                       = 0
      close(3)                                = 0
      lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/io_latency_stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
      openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/io_latency_stats", O_RDONLY) = 3
      read(3, "io_latency_by_size:\nsnapshot_tim"..., 4096) = 127
      read(3, "", 4096)                       = 0
      close(3)                                = 0
      lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/oi_scrub", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
      openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/oi_scrub", O_RDONLY) = 3
      read(3, "name: OI_scrub\nmagic: 0x4c5fe253"..., 4096) = 516
      read(3, "", 4096)                       = 0
      close(3)                                = 0
      lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
      openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/stats", O_RDONLY) = 3
      read(3, "snapshot_time             177438"..., 4096) = 170
      read(3, "", 4096)                       = 0
      close(3)                                = 0
      :
      
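      To quantify the problem from a saved trace, the unnecessary opens can be counted with a short script. This is only a sketch: the function name `unmatched_opens` and the default path prefix are illustrative assumptions (the prefix is taken from the trace above, and other parameter trees can be passed instead), and the regular expression only handles `openat` lines in the format shown.

```python
import re

# Matches lines such as:
#   openat(AT_FDCWD, "/sys/.../brw_stats", O_RDONLY) = 3
OPENAT_RE = re.compile(r'openat\(AT_FDCWD, "([^"]+)", [^)]*\) = (-?\d+)')


def unmatched_opens(strace_text, pattern, prefix="/sys/kernel/debug/lustre/"):
    """Return paths that were opened although their name lacks `pattern`.

    `prefix` (an assumption) limits the check to one parameter tree so
    that shared libraries and locale files are ignored.
    """
    paths = []
    for match in OPENAT_RE.finditer(strace_text):
        path, ret = match.group(1), int(match.group(2))
        if ret < 0 or not path.startswith(prefix):
            continue  # failed open, or not a parameter file
        name = path.rsplit("/", 1)[-1]
        if pattern not in name:
            paths.append(path)
    return paths
```

      Fed the output of `strace lctl find_param memused`, a correct implementation should yield an empty list; the excerpt above would yield all four osd files.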

      The code should read only parameters whose names match the requested argument and skip all others. That should speed up the command significantly, reduce the overhead and impact on a running system, and avoid other side effects (e.g. reading from job_stats clears old job statistics, and reading from kbytesfree, filesfree, etc. triggers OBD_STATFS RPCs to be sent to all of the targets).

            People

              paf0186 Patrick Farrell
              adilger Andreas Dilger