[LU-694] Job Stats Created: 21/Sep/11  Updated: 22/Nov/14  Resolved: 04/Jun/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0
Fix Version/s: Lustre 2.3.0

Type: Improvement Priority: Minor
Reporter: Niu Yawei (Inactive) Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-2058 Job Stats for ZFS Closed
Rank (Obsolete): 4306

 Description   

This feature collects filesystem operation statistics for the jobs running on Lustre.

When a job scheduler (SLURM, for instance) is running on a Lustre client, the client packs the job ID into each request (open, unlink, write, ...), and the servers collect that information and expose it via procfs.
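
For illustration (a sketch, assuming a SLURM site): the job ID is read from an environment variable that the scheduler sets for every task of a job, which can be confirmed from inside a job, e.g.:

  srun -n1 env | grep SLURM_JOB_ID
  SLURM_JOB_ID=1234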



 Comments   
Comment by Niu Yawei (Inactive) [ 21/Sep/11 ]
  • Enable/Disable Jobstats feature

Jobstats is disabled by default; this can be verified by checking /proc/fs/lustre/jobid_var on the client, where
'jobid_var' should read 'disable' by default:

  lctl get_param jobid_var
  jobid_var=disable
  

To enable Jobstats, set 'jobid_var' to the value appropriate for the job scheduler in use.
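
For a quick test on a single client node, the variable can also be set locally and non-persistently with 'lctl set_param' (a sketch; this affects only that node and does not survive a remount):

  lctl set_param jobid_var=SLURM_JOB_ID
  jobid_var=SLURM_JOB_ID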

  • Configure 'jobid_var' for specified job scheduler

To enable Jobstats for a given job scheduler, 'jobid_var' should be configured with the proper value:
SLURM: jobid_var=SLURM_JOB_ID
SGE: jobid_var=JOB_ID
LSF: jobid_var=LSB_JOBID
Loadleveler: jobid_var=LOADL_STEP_ID
PBS: jobid_var=PBS_JOBID
Maui/MOAB: jobid_var=PBS_JOBID

For example, to enable Jobstats for SLURM on a filesystem named 'testfs':

  lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID
  
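Once the setting has propagated from the MGS to the clients, it can be verified on a client the same way as above:

  lctl get_param jobid_var
  jobid_var=SLURM_JOB_ID
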

Disable Jobstats on a filesystem named 'testfs':

  lctl conf_param testfs.sys.jobid_var=disable
  

If no job scheduler is running on the system, or the user just wants to collect stats per process name and UID (job IDs are then reported as '<procname>.<uid>', e.g. 'dd.0' for dd run as root):

  lctl conf_param testfs.sys.jobid_var=procname_uid
  
  • Check Job stats

Metadata operation stats are collected on the MDT and can be accessed via lctl get_param mdt.*.job_stats:

  lctl get_param mdt.testfs-MDT0000.job_stats
job_stats:
- job_id:          bash.0
  snapshot_time:   1352084992
  open:            { samples:           2, unit:  reqs }
  close:           { samples:           2, unit:  reqs }
  mknod:           { samples:           0, unit:  reqs }
  link:            { samples:           0, unit:  reqs }
  unlink:          { samples:           0, unit:  reqs }
  mkdir:           { samples:           0, unit:  reqs }
  rmdir:           { samples:           0, unit:  reqs }
  rename:          { samples:           0, unit:  reqs }
  getattr:         { samples:           3, unit:  reqs }
  setattr:         { samples:           0, unit:  reqs }
  getxattr:        { samples:           0, unit:  reqs }
  setxattr:        { samples:           0, unit:  reqs }
  statfs:          { samples:           0, unit:  reqs }
  sync:            { samples:           0, unit:  reqs }
  samedir_rename:  { samples:           0, unit:  reqs }
  crossdir_rename: { samples:           0, unit:  reqs }
- job_id:          dd.0
  snapshot_time:   1352085037
  open:            { samples:           1, unit:  reqs }
  close:           { samples:           1, unit:  reqs }
  mknod:           { samples:           0, unit:  reqs }
  link:            { samples:           0, unit:  reqs }
  unlink:          { samples:           0, unit:  reqs }
  mkdir:           { samples:           0, unit:  reqs }
  rmdir:           { samples:           0, unit:  reqs }
  rename:          { samples:           0, unit:  reqs }
  getattr:         { samples:           0, unit:  reqs }
  setattr:         { samples:           0, unit:  reqs }
  getxattr:        { samples:           0, unit:  reqs }
  setxattr:        { samples:           0, unit:  reqs }
  statfs:          { samples:           0, unit:  reqs }
  sync:            { samples:           2, unit:  reqs }
  samedir_rename:  { samples:           0, unit:  reqs }
  crossdir_rename: { samples:           0, unit:  reqs }
  

Data operation stats are collected on the OST and can be checked via lctl get_param obdfilter.*.job_stats:

  lctl get_param obdfilter.testfs-OST0000.job_stats
job_stats:
- job_id:          bash.0
  snapshot_time:   1352085025
  read:            { samples:           0, unit: bytes, min:       0, max:       0, sum:               0 }
  write:           { samples:           1, unit: bytes, min:       4, max:       4, sum:               4 }
  setattr:         { samples:           0, unit:  reqs }
  punch:           { samples:           0, unit:  reqs }
  sync:            { samples:           0, unit:  reqs }
  
  • Clear job stats for specified job (or all jobs)

One can clear the job stats for a given MDT or OST by writing to the 'job_stats' proc file.

Clear stats for all jobs on testfs-OST0001:

  lctl set_param obdfilter.testfs-OST0001.job_stats=clear
  

Clear stats for job "dd.0" on testfs-MDT0000:

  lctl set_param mdt.testfs-MDT0000.job_stats=dd.0
  
  • Configure cleanup interval

By default, if a job has no activity for 600 seconds, its stats will be cleared. This expiration value
is tunable via mdt.*.job_cleanup_interval and obdfilter.*.job_cleanup_interval.

For instance, to change the cleanup interval on the MDT to just over an hour (4000 seconds):

  lctl conf_param testfs.mdt.job_cleanup_interval=4000
  

The 'job_cleanup_interval' can be set to 0 to disable auto-cleanup.
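
For example, to disable auto-cleanup on the MDT (same pattern as above, again assuming the fsname is 'testfs'):

  lctl conf_param testfs.mdt.job_cleanup_interval=0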

Comment by Niu Yawei (Inactive) [ 21/Sep/11 ]

http://review.whamcloud.com/1397

Comment by Niu Yawei (Inactive) [ 10/Nov/11 ]

Follow-up patch which moves 'jobid_var' to a global scope: http://review.whamcloud.com/1683

Comment by Shuichi Ihara (Inactive) [ 30/Jan/12 ]

Hello, Niu,
Will the new patches for this feature be landed for 2.2?

Comment by Niu Yawei (Inactive) [ 30/Jan/12 ]

Hi, Ihara,

The patch will not be landed for 2.2; which version it will be landed for has not been decided yet.

Comment by Richard Henwood (Inactive) [ 27/Apr/12 ]

I have been advised that a filesystem name may not uniquely identify a lustre filesystem.

I am not sure what a better choice for the command you have above is, but some thought as to an alternative to fs name would be valuable.

Comment by Niu Yawei (Inactive) [ 29/Apr/12 ]

I have been advised that a filesystem name may not uniquely identify a lustre filesystem.

I am not sure what a better choice for the command you have above is, but some thought as to an alternative to fs name would be valuable.

Hi, Richard, the fs name should be unique within a single MGS namespace, and most 'lctl conf_param' commands use the fsname to identify a filesystem. Do you suggest that we set jobstats parameters per target server rather than per fs? I'm not sure I followed your comment correctly.

Comment by Nathan Rutman [ 30/Apr/12 ]

The intent of the MGS was to provide config info for all the filesystems at a site, so the fs name is unique. If multiple MGSes are being used, on different nodes, the filesystem name could overlap – but you'd have to be masochistic to use the same filesystem name for two different filesystems at a single site.
Masochistic to the point where I would say this should be disallowed by any and all configuration management systems, and if you do it by hand anyhow, you reap the unpleasant rewards.

Comment by Christopher Morrone [ 30/Apr/12 ]

Nathan, it boggles my mind as well. But I know for a fact that folks out there have done it, because they complained about LMT not being able to handle two filesystems having exactly the same name. They seemed to think it was Livermore's responsibility to factor in additional information like IP addresses to uniquely identify filesystems with the same name. I of course declined.

But one has to sympathize with the users. Configuring Lustre is so horribly bad that something like the filesystem name is completely non-obvious to most people. You set it once in some cryptic way, and it is none too clear from that point forward how it is used at all.

Which I suppose is a long-winded way of agreeing that filesystem names really need to be unique, and that bending over backwards to differentiate filesystems with the same name is a path to madness. But we also need to promote the name to a first-class object that is used in a sane way in the command-line tools and throughout Lustre. We also need to clearly document filesystem name usage.

Comment by Richard Henwood (Inactive) [ 02/May/12 ]

Thanks for the input. I agree that fs names are useful, for example:

  • In cases where there is only one MGS at a site, you need fs_name to distinguish the filesystems.
  • In cases where you have not mounted the fs, you can still identify the fs.

So, how about supporting: <mount point|fsname>?

If the mount point is not valid, then return an error.
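
(As a sketch of one way to map a mount point to an fsname today: 'lfs getname' reports the fsname and client instance for a mounted Lustre filesystem; the mount point and instance ID below are illustrative.)

  lfs getname /mnt/testfs
  testfs-ffff8800ca9f1000 /mnt/testfs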

Comment by Andreas Dilger [ 22/Nov/14 ]

Just updated the examples in this bug to be more clear, since it showed up in a Google search. I prefer not to use "lustre" as the fsname in examples, since it is very non-obvious that this needs to be replaced with the actual fsname and is not a fixed part of the parameter being specified (like the "sys.jobid_var" part is).
