[LU-17236] clearing non-existent job from job_stats returns EINVAL Created: 28/Oct/23  Updated: 28/Oct/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

Writing a job name to the job_stats file is intended to clear the stats for that job:

# lctl get_param mdt.*.job_stats | grep -v " 0, unit:" | less
mdt.myth-MDT0000.job_stats=
job_stats:
- job_id: PT277.500
  snapshot_time:   1698471105
  getattr: { samples: 522743, unit: usecs, min:  2, max: 91662, sum: 2790673, sumsq: 23649961511 }
  statfs:  { samples: 173907, unit: usecs, min:  2, max:    81, sum:  884179, sumsq:     5283201 }
- job_id: mythfrontend.500
  snapshot_time:   1698470505
  getattr: { samples:    1259, unit: usecs, min:  2, max:   466, sum:    9037, sumsq:     1241873 }
- job_id: Expire.500
  snapshot_time:   1698470998
  getattr: { samples:      3, unit: usecs, min:  5, max:     6, sum:      16, sumsq:          86 }
  statfs:  { samples:      1, unit: usecs, min: 11, max:    11, sum:      11, sumsq:         121 }
# lctl set_param *.*.job_stats=Expire.500
mdt.myth-MDT0000.job_stats=Expire.500
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0000/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0001/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0002/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0003/job_stats=Expire.500: Invalid argument
error: set_param: setting /proc/fs/lustre/obdfilter/myth-OST0004/job_stats=Expire.500: Invalid argument

The clear succeeds for the job_stats files that contain the specified JobID, but returns -EINVAL for the targets that do not have this JobID. I think the "Invalid argument" error is not very helpful here. Returning -ESRCH ("No such process") in this case would be an improvement, since lprocfs_jobstats_seq_write() already has several other -EINVAL return cases and the caller cannot tell them apart.

However, if you are trying to delete some stats and they are already gone, then it shouldn't report an error at all. Otherwise it is difficult to see that one parameter was set successfully ("myth-MDT0000" in the above example) among the errors printed for the targets that failed, especially if there are a large number of devices.
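
To make the two proposals concrete, here is a minimal userspace sketch of the return-code policy. It is not the actual lprocfs_jobstats_seq_write() code: struct job_table, job_table_clear(), job_table_clear_idempotent(), JOBID_MAX and MAX_JOBS are hypothetical names invented for illustration. The strict variant returns -EINVAL only for malformed input and -ESRCH for a well-formed JobID that is not present; the idempotent wrapper treats "already gone" as success.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define JOBID_MAX 32	/* hypothetical limit, not Lustre's actual value */
#define MAX_JOBS  8	/* hypothetical table size for this sketch */

/* Hypothetical stand-in for a per-target table of tracked JobIDs. */
struct job_table {
	char   ids[MAX_JOBS][JOBID_MAX];
	size_t count;
};

/*
 * Strict variant: malformed input is -EINVAL, a well-formed JobID
 * that is not in the table is -ESRCH, a successful clear is 0.
 */
static int job_table_clear(struct job_table *t, const char *jobid)
{
	size_t len, i;

	if (t == NULL || jobid == NULL)
		return -EINVAL;

	len = strlen(jobid);
	if (len == 0 || len >= JOBID_MAX)
		return -EINVAL;

	for (i = 0; i < t->count; i++) {
		if (strcmp(t->ids[i], jobid) == 0) {
			/* Remove by moving the last entry into the freed slot. */
			t->count--;
			if (i != t->count)
				memcpy(t->ids[i], t->ids[t->count], JOBID_MAX);
			return 0;
		}
	}

	return -ESRCH;
}

/*
 * Idempotent variant: clearing a JobID that is already gone is not
 * an error, so only genuinely invalid input is reported.
 */
static int job_table_clear_idempotent(struct job_table *t, const char *jobid)
{
	int rc = job_table_clear(t, jobid);

	return rc == -ESRCH ? 0 : rc;
}
```

With the idempotent variant, a wildcard clear across many targets would only print errors for input that is actually invalid, so the targets that did hold the JobID are no longer lost among spurious failures.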

