Lustre / LU-17236

clearing non-existent job from job_stats returns EINVAL


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: Lustre 2.14.0
    • Severity: 3

    Description

      Writing a job name to the job_stats file is intended to clear the stats for that job:

      # lctl get_param mdt.*.job_stats | grep -v " 0, unit:" | less
      mdt.myth-MDT0000.job_stats=
      job_stats:
      - job_id: PT277.500
        snapshot_time:   1698471105
        getattr: { samples: 522743, unit: usecs, min:  2, max: 91662, sum: 2790673, sumsq: 23649961511 }
        statfs:  { samples: 173907, unit: usecs, min:  2, max:    81, sum:  884179, sumsq:     5283201 }
      - job_id: mythfrontend.500
        snapshot_time:   1698470505
        getattr: { samples:    1259, unit: usecs, min:  2, max:   466, sum:    9037, sumsq:     1241873 }
      - job_id: Expire.500
        snapshot_time:   1698470998
        getattr: { samples:      3, unit: usecs, min:  5, max:     6, sum:      16, sumsq:          86 }
        statfs:  { samples:      1, unit: usecs, min: 11, max:    11, sum:      11, sumsq:         121 }
      # lctl set_param *.*.job_stats=Expire.500
      mdt.myth-MDT0000.job_stats=Expire.500
      error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0000/job_stats=Expire.500: Invalid argument
      error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0001/job_stats=Expire.500: Invalid argument
      error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0002/job_stats=Expire.500: Invalid argument
      error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0003/job_stats=Expire.500: Invalid argument
      error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0004/job_stats=Expire.500: Invalid argument
      

      Clearing works on targets whose job_stats contain the specified JobID, but returns -EINVAL on targets that do not have that JobID. The resulting "Invalid argument" error is not very helpful: lprocfs_jobstats_seq_write() already returns -EINVAL in several other cases, so the message does not tell the user what actually went wrong. Returning -ESRCH ("No such process") in this case would be an improvement.
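      A minimal userspace sketch of the proposed error mapping, assuming a simplified in-memory job list. The names struct job_entry, jobstats_clear() and JOBID_MAX, and the validation rules, are hypothetical stand-ins and not the actual code in lprocfs_jobstats_seq_write(). Malformed input still yields -EINVAL, while a well-formed JobID that is simply absent yields -ESRCH, so the two cases become distinguishable to the user.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one job_stats entry; not the Lustre struct. */
struct job_entry {
	const char *job_id;
	int active;		/* nonzero while stats are held for this job */
};

/* Hypothetical upper bound on JobID length, for illustration only. */
#define JOBID_MAX 32

/*
 * Clear the stats for @jobid in @jobs[0..njobs).
 * Returns 0 on success, -EINVAL for malformed input, and -ESRCH if the
 * JobID is well formed but not present on this target.
 */
static int jobstats_clear(struct job_entry *jobs, size_t njobs,
			  const char *jobid)
{
	size_t len, i;

	if (jobid == NULL)
		return -EINVAL;
	len = strlen(jobid);
	if (len == 0 || len >= JOBID_MAX)
		return -EINVAL;		/* genuinely invalid argument */

	for (i = 0; i < njobs; i++) {
		if (jobs[i].active && strcmp(jobs[i].job_id, jobid) == 0) {
			jobs[i].active = 0;	/* drop the stats for this job */
			return 0;
		}
	}
	return -ESRCH;			/* "No such process": JobID not found */
}
```

With this mapping, the five OST lines in the example above would report "No such process" instead of "Invalid argument", which at least tells the user that the JobID was understood but not found.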

      Better still, deleting stats that are already gone should not report an error at all. Otherwise it is hard to see that the parameter was set successfully on one device (myth-MDT0000 in the example above), because its single success line is easily lost among the error lines printed for every other device, especially when there are a large number of devices.
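      A sketch of the idempotent alternative, again as a self-contained userspace simulation under stated assumptions: struct job_table, jobstats_write(), JOBID_MAX and the trailing-newline handling are hypothetical and do not reflect the real Lustre data structures. It follows the usual procfs write convention of returning the number of bytes consumed on success, and treats a JobID that is already absent as success, so only genuinely malformed input produces an error line from lctl set_param.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define JOBID_MAX 32	/* hypothetical limit, for illustration only */

/* Hypothetical per-target job table; not the Lustre data structure. */
struct job_table {
	const char *job_ids[8];
	size_t njobs;
};

/*
 * Idempotent "clear" write handler sketch. Success returns the number
 * of bytes consumed (procfs write convention). A JobID that is already
 * absent is treated as success, because the desired end state
 * (no stats for that job) already holds.
 */
static ssize_t jobstats_write(struct job_table *t, const char *buf,
			      size_t count)
{
	char jobid[JOBID_MAX];
	size_t len = count, i;

	/* Assumption: tolerate a trailing newline, e.g. from "echo". */
	if (len > 0 && buf[len - 1] == '\n')
		len--;
	if (len == 0 || len >= sizeof(jobid))
		return -EINVAL;		/* malformed input is still an error */
	memcpy(jobid, buf, len);
	jobid[len] = '\0';

	for (i = 0; i < t->njobs; i++) {
		if (strcmp(t->job_ids[i], jobid) == 0) {
			/* Remove entry i by moving the last entry into its slot. */
			t->job_ids[i] = t->job_ids[--t->njobs];
			return (ssize_t)count;
		}
	}
	/* Nothing to clear on this target: report success, not an error. */
	return (ssize_t)count;
}
```

Under this policy the example command would print only the successful parameter lines, so a success on one device is no longer hidden by errors from the others.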


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Andreas Dilger (adilger)