Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.14.0
Labels: None
Severity: 3
Description
Writing a job name to the job_stats file is intended to clear the stats for that job:
# lctl get_param mdt.*.job_stats | grep -v " 0, unit:" | less
mdt.myth-MDT0000.job_stats=
job_stats:
- job_id: PT277.500
  snapshot_time: 1698471105
  getattr: { samples: 522743, unit: usecs, min: 2, max: 91662, sum: 2790673, sumsq: 23649961511 }
  statfs: { samples: 173907, unit: usecs, min: 2, max: 81, sum: 884179, sumsq: 5283201 }
- job_id: mythfrontend.500
  snapshot_time: 1698470505
  getattr: { samples: 1259, unit: usecs, min: 2, max: 466, sum: 9037, sumsq: 1241873 }
- job_id: Expire.500
  snapshot_time: 1698470998
  getattr: { samples: 3, unit: usecs, min: 5, max: 6, sum: 16, sumsq: 86 }
  statfs: { samples: 1, unit: usecs, min: 11, max: 11, sum: 11, sumsq: 121 }
# lctl set_param *.*.job_stats=Expire.500
mdt.myth-MDT0000.job_stats=Expire.500
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0000/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0001/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0002/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0003/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0004/job_stats=Expire.500: Invalid argument
The write clears the stats on targets whose job_stats file contains the specified JobID, but returns -EINVAL on targets that do not have this JobID. I think the "Invalid argument" error is not very helpful. Returning -ESRCH ("No such process") in this case would be an improvement, since lprocfs_jobstats_seq_write() already returns -EINVAL in several other cases and the caller cannot tell them apart.
However, if you are trying to delete some stats and they are already gone, then it shouldn't report an error at all. Otherwise it is difficult to see that one parameter was set successfully ("myth-MDT0000" in the above example), because that single result is buried among the errors from every target that did not have the JobID, especially if there are a large number of devices.
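Until the return code changes, one possible client-side workaround is to write the JobID only to the job_stats files that actually list it. The following is a minimal sketch, not a tested tool: the function names has_job_id and clear_job_stats are made up for this example, and it assumes the job_stats output format shown above (one "- job_id: NAME" line per job) and JobIDs without whitespace.

```shell
# has_job_id JOBID
# Reads job_stats text on stdin; succeeds only if an entry for JOBID exists.
# Hypothetical helper; assumes "- job_id: NAME" lines as in the output above.
has_job_id() {
    awk -v id="$1" '
        $1 == "-" && $2 == "job_id:" && $3 == id { found = 1 }
        END { exit !found }
    '
}

# clear_job_stats JOBID
# Writes JOBID only to the job_stats parameters that currently contain it,
# so targets without the JobID are skipped instead of returning an error.
clear_job_stats() {
    jobid=$1
    for param in $(lctl list_param 'mdt.*.job_stats' 'obdfilter.*.job_stats' 2>/dev/null); do
        if lctl get_param -n "$param" | has_job_id "$jobid"; then
            lctl set_param "$param=$jobid"
        fi
    done
}
```

With these defined, "clear_job_stats Expire.500" should print only the parameters that were actually cleared. A job could still expire between the check and the write, so this narrows the window for the spurious error rather than closing it.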