[LU-16228] create lljobstats command Created: 09/Oct/22  Updated: 07/Feb/24  Resolved: 27/Jan/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.16.0

Type: New Feature Priority: Minor
Reporter: Feng Lei Assignee: Feng Lei
Resolution: Fixed Votes: 0
Labels: None

Attachments: HTML File glljobstat     HTML File lljobstat    
Issue Links:
Related
is related to LU-16231 Lustre stats header incorrectly using... Resolved
is related to LU-16251 Fill jobid in an atomic way Resolved
is related to LU-16110 Make output of jobs_stats and rename_... Resolved
is related to LU-17352 Enhance lljobstat to read existing jo... Resolved

 Description   

In DDN-3356, by Andreas Dilger:

We don't have a tool to do this today, but it would make sense to write a simple tool "lljobstat" to show the top jobs on a server in order to simplify debugging of high load problems, since this is a reasonably frequent request.

It should be included with the base Lustre RPMs, so it must not have any complex external dependencies that are not included in the base OS distro (el7, el8, sles15, ubuntu22).

It should read all of the local "*.*.job_stats" files (by default, or --ost or --mdt, or a specific job_stats file if given as an argument) at a 10s interval (configurable, either with "-i N" or as the last argument) and print the top jobs, e.g. 5 (configurable with "-c N"), one line per job, similar to "iostat -x -k -z 10". It should show something useful when run with minimal arguments (e.g. just the interval), so that users can easily determine which jobs are driving the most load.
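A minimal sketch of the polling loop described above, assuming the job_stats text is fetched by running `lctl get_param` on a parameter pattern. The helper names `read_job_stats` and `poll` and the default pattern are hypothetical, not taken from lljobstat:

```python
import subprocess
import time

# Hypothetical default: every local job_stats parameter (MDT and OST).
DEFAULT_PATTERN = "*.*.job_stats"


def read_job_stats(pattern=DEFAULT_PATTERN):
    """Return the raw 'name=value' text of all matching job_stats via lctl."""
    result = subprocess.run(["lctl", "get_param", pattern],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout


def poll(reader, interval=10, repeat=None):
    """Yield (unix_seconds, text) from reader() every `interval` seconds.

    repeat=None polls forever; otherwise stop after `repeat` samples.
    """
    done = 0
    while repeat is None or done < repeat:
        yield int(time.time()), reader()
        done += 1
        # Do not sleep after the final sample.
        if repeat is None or done < repeat:
            time.sleep(interval)
```

Usage would look like `for ts, text in poll(read_job_stats, interval=10): ...`. Sleeping a fixed interval after each read lets the period drift by the time the read takes, which seems acceptable for an interactive tool.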

Since job_stats has a large number of stats, they cannot all fit on a single 80-column line, so any operations with samples = 0 should not be shown. Display priority should be read and write (counts, if non-zero), then read_bytes and write_bytes (in MiB/s, if non-zero), then the top metadata ops by count. It probably makes sense to use abbreviated names, as llobdstat does, so that more fit onto the line (cx: create, dx: destroy, st: statfs, pu: punch, etc.). The newer llstat and llobdstat check whether the terminal is wider than 80 columns and show more fields, but this doesn't have to be in the first version.
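A minimal sketch of the per-job line formatting, assuming the counters have already been reduced to `{operation: samples}`. The abbreviation table reuses the two-letter names from the sample output later in this issue (with "ln" for link, as suggested in review); byte counters and the MiB/s conversion are left out, and `format_job` is a hypothetical name:

```python
# Two-letter names from the sample output in this issue ("ln" for link).
ABBREVIATIONS = {
    "read": "rd", "write": "wr", "create": "cr", "open": "op", "close": "cl",
    "mknod": "mn", "link": "ln", "unlink": "ul", "mkdir": "mk", "rmdir": "rm",
    "rename": "mv", "getattr": "ga", "setattr": "sa", "getxattr": "gx",
    "setxattr": "sx", "statfs": "st", "sync": "sy", "punch": "pu",
    "migrate": "mi", "fallocate": "fa", "destroy": "dt", "get_info": "gi",
    "set_info": "si", "quotactl": "qc", "prealloc": "pa",
}
IO_OPS = ("read", "write")


def format_job(job, counts):
    """Render '- job: {ops: N, ...}', omitting operations with 0 samples.

    read/write come first, then the remaining operations by descending
    count; ties keep the input order because sorted() is stable.
    """
    # Byte counters are not operations, so they are dropped here.
    nonzero = {k: v for k, v in counts.items()
               if v > 0 and not k.endswith("_bytes")}
    ordered = [k for k in IO_OPS if k in nonzero]
    ordered += sorted((k for k in nonzero if k not in IO_OPS),
                      key=lambda k: -nonzero[k])
    fields = [f"ops: {sum(nonzero.values())}"]
    fields += [f"{ABBREVIATIONS.get(k, k)}: {nonzero[k]}" for k in ordered]
    # Pad short job names so the fields line up, as in the sample output.
    return f"- {(job + ':').ljust(16)} {{{', '.join(fields)}}}"
```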

To determine the "top" jobs, it probably makes sense to sum the operations for the same job name across all watched job_stats files, then sort by total count of operations (read+write, but not bytes) and include this as the second item shown ("ops: N") after the job name ("job: name", with escaping/quoting if needed). The timestamp should be shown for each interval.
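A minimal sketch of that ranking, assuming each job_stats file has already been parsed into `{job_id: {operation: samples}}`. The helpers `aggregate`, `op_count` and `top_jobs` are hypothetical names:

```python
import heapq
from collections import Counter


def aggregate(per_file_stats):
    """Sum each job's counters over all watched job_stats files.

    per_file_stats: iterable of {job_id: {operation: samples}} dicts,
    one per job_stats file (e.g. one per MDT/OST).
    """
    totals = {}
    for file_stats in per_file_stats:
        for job, counts in file_stats.items():
            # Counter.update() adds counts instead of replacing them.
            totals.setdefault(job, Counter()).update(counts)
    return totals


def op_count(counts):
    """Total operations used for ranking; *_bytes counters are excluded."""
    return sum(v for k, v in counts.items() if not k.endswith("_bytes"))


def top_jobs(totals, count=5):
    """Return the `count` (job, counters) pairs with the most operations."""
    return heapq.nlargest(count, totals.items(),
                          key=lambda item: op_count(item[1]))
```

`heapq.nlargest` avoids sorting every job when only the top few are displayed, which may matter on servers with thousands of job entries.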

Given that the input is YAML, the output could also be YAML, but only if it can be formatted nicely for human readability (one line per job, no excessive quoting). The main users of this will be people, since monitoring tools will likely read and process all of the job_stats output directly.



 Comments   
Comment by Feng Lei [ 09/Oct/22 ]

adilger, what about an output format like this?

timestamp: 20221010090000
jobs: 
- {job: mkdir.100, ops: 3, cr: 1, dt: 2}
- {job: rm.101, ops: 1, dt: 1}
Comment by Andreas Dilger [ 10/Oct/22 ]

The timestamp should be in Unix seconds, like the other timestamps reported by Lustre. That avoids time zone issues and simplifies log correlation.

Comment by Feng Lei [ 11/Oct/22 ]

Command Synopsis:

lljobstat [-i|--interval NUM] [-c|--count NUM] [--mdt|--ost|--param PARAM_PATH] 
  -i NUM: interval in seconds, default 10
  -c NUM: how many jobs are displayed, default 5
  --mdt: check only mdt job_stats
  --ost: check only ost job_stats
  --param PARAM_PATH: check specified PARAM_PATH, e.g., *.lustre-*.job_stats
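The synopsis above maps onto Python's standard `argparse` module. A minimal sketch, assuming the hypothetical parameter patterns `mdt.*.job_stats` and `obdfilter.*.job_stats` for `--mdt` and `--ost` (the synopsis does not specify them):

```python
import argparse

# Assumed parameter patterns; the synopsis does not specify them.
DEFAULT_PATTERN = "*.*.job_stats"
MDT_PATTERN = "mdt.*.job_stats"
OST_PATTERN = "obdfilter.*.job_stats"


def build_parser():
    parser = argparse.ArgumentParser(
        prog="lljobstat", description="show the top jobs in Lustre job_stats")
    parser.add_argument("-i", "--interval", type=int, default=10, metavar="NUM",
                        help="interval in seconds, default 10")
    parser.add_argument("-c", "--count", type=int, default=5, metavar="NUM",
                        help="how many jobs are displayed, default 5")
    # --mdt, --ost and --param all set the same destination, so at most
    # one of them may be given.
    source = parser.add_mutually_exclusive_group()
    source.add_argument("--mdt", dest="param", action="store_const",
                        const=MDT_PATTERN, help="check only mdt job_stats")
    source.add_argument("--ost", dest="param", action="store_const",
                        const=OST_PATTERN, help="check only ost job_stats")
    source.add_argument("--param", dest="param", metavar="PARAM_PATH",
                        help="check specified PARAM_PATH")
    parser.set_defaults(param=DEFAULT_PATTERN)
    return parser
```

Sharing one `dest` keeps the rest of the tool simple: it only ever reads `args.param`, whichever option selected it.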
Comment by Feng Lei [ 11/Oct/22 ]

adilger, to confirm: is snapshot_time designed to be uptime (seconds since the last OS boot) rather than clock time? For example:

# lctl get_param *.*.job_stats | grep snapshot
  snapshot_time:   5754772.790688109 secs.nsecs

It is significantly different from epoch seconds:

# date +%s
1665461988

But similar to system uptime:

# cat /proc/uptime
5755466.00 22244003.59
Comment by Andreas Dilger [ 11/Oct/22 ]

No, the time should be the current Unix timestamp in seconds:

# lctl get_param llite.*.stats
llite.testfs-ffff89b1b9c27000.stats=
snapshot_time             1665476432.161461498 secs.nsecs
ioctl                     502 samples [reqs]
getattr                   290 samples [usec] 56 1059 48623 11761597
getxattr                  2 samples [usec] 975 30159 31134 910515906
inode_permission          298 samples [usec] 61 566 52783 11517621
opencount                 295 samples [reqs] 1 1 295 295
# date +%s
1665476439

There is a bug on master where the timestamp is incorrectly printed as boot-relative time instead of wallclock time. See LU-16231.
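The two examples above can be told apart by comparing snapshot_time with the current wallclock time. A minimal heuristic sketch; the function names and the one-hour tolerance are hypothetical:

```python
import time


def parse_snapshot_time(line):
    """Parse a snapshot_time line into float seconds.

    Accepts both 'snapshot_time:   X secs.nsecs' (job_stats) and
    'snapshot_time             X secs.nsecs' (stats) layouts.
    """
    fields = line.replace(":", " ").split()
    return float(fields[1])


def looks_boot_relative(snapshot, now=None, tolerance=3600):
    """Heuristic: a wallclock snapshot should be close to time.time()."""
    now = time.time() if now is None else now
    return abs(now - snapshot) > tolerance
```

With the values above, 5754772.79 is far from `date +%s` (1665461988), while 1665476432.16 is within seconds of 1665476439.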

Comment by Feng Lei [ 12/Oct/22 ]

adilger, is this output OK?


# ./lljobstat
# Abbr.:
# cr: create,    op: open,      cl: close,     mn: mknod,     lk: link,     
# ul: unlink,    mk: mkdir,     rm: rmdir,     mv: rename,    ga: getattr,  
# sa: setattr,   gx: getxattr,  sx: setxattr,  st: statfs,    sy: sync,     
# rd: read,      wr: write,     pu: punch,     mi: migrate,   fa: fallocate,
# dt: destroy,   gi: get_info,  si: set_info,  qc: quotactl,  pa: prealloc, 
timestamp: 1665557039
top jobs:
- touch.500:       {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
- rm.0:            {ops: 6, cl: 2, ul: 1, rm: 1, ga: 1, st: 1}
- chown.0:         {ops: 3, ga: 2, sa: 1}
- bash.0:          {ops: 2, ga: 2}
- mkdir.0:         {ops: 2, mk: 1, st: 1}
Comment by Andreas Dilger [ 12/Oct/22 ]

Feng Lei, this looks mostly good. The abbreviation comment is large enough that it shouldn't be printed each time; maybe just document the abbreviations in the man page, or print them when "-h" is used. I would suggest "ln" for link (to match the command name).

The "top_jobs:" key should have an underscore so that it is a single word, even though I know YAML does not require this, since it makes parsing easier with scripts (e.g. "awk '/keyname:/ { print $2 }'").
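A minimal sketch of the interval report with single-word keys and a Unix-seconds timestamp, plus a Python equivalent of the awk one-liner. `render_report` and `get_key` are hypothetical names, and each job's fields are assumed to be already abbreviated:

```python
import time


def render_report(top, timestamp=None):
    """Render one interval: 'timestamp:', 'top_jobs:', one line per job."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    lines = [f"timestamp: {timestamp}", "top_jobs:"]
    for job, fields in top:
        body = ", ".join(f"{k}: {v}" for k, v in fields.items())
        lines.append(f"- {(job + ':').ljust(16)} {{{body}}}")
    return "\n".join(lines)


def get_key(report, key):
    """Like awk '/key:/ { print $2 }', but matching only the first word."""
    for line in report.splitlines():
        parts = line.split()
        if parts and parts[0] == key + ":":
            return parts[1] if len(parts) > 1 else ""
    return None
```

With "top jobs:" written as two words, a script splitting on whitespace would see the key as just "top", which is the parsing problem the underscore avoids.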

Comment by Gerrit Updater [ 17/Oct/22 ]

"Feng Lei <flei@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/48888
Subject: LU-16228 utils: add lljobstat util
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 777dd8757b0a121daf22f275f7eeb5a2b00ea62f

Comment by Andreas Dilger [ 25/Jan/23 ]

It looks like the newly-added sanity.sh test_205e needs to add a version check for interop testing:

trevis-82vm3: sh: lljobstat: command not found

There is a version check in test_205d already.

Comment by Gerrit Updater [ 27/Jan/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48888/
Subject: LU-16228 utils: add lljobstat util
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e2812e877314bc101efdc5a235c7fae8f7424f96

Comment by Peter Jones [ 27/Jan/23 ]

Landed for 2.16

Comment by Andreas Dilger [ 17/Aug/23 ]

bolausson, I pushed the "simple" version of your patch, but it reports an error that is causing test failures:

 lljobstat -n 1 -i 0 -c 1000
  Traceback (most recent call last):
    File "/usr/bin/lljobstat", line 15, in 
       from yaml import CLoader as Loader, CDumper as Dumper
  ImportError: cannot import name 'CLoader'
Comment by Bjoern Olausson [ 17/Aug/23 ]

See link for solution:

https://github.com/yaml/pyyaml/issues/108#issuecomment-370459912

Essentially, libyaml-dev is missing on your system. It is required for CLoader (which replaces the slow pure-Python loader).

Greetings, Bjoern

Comment by Andreas Dilger [ 17/Aug/23 ]

Bjoern, is there a way to "try" loading the libyaml-dev CLoader, but fall back to the regular Loader if it is not installed?

Comment by Bjoern Olausson [ 17/Aug/23 ]

Yes, this is possible with a try/except construct.

The CLoader worked perfectly fine on a default EXAScaler 5.2.7 install.
It was not necessary to install any additional packages except:

python3 -m venv lljobstat
. ./lljobstat/bin/activate
python3 -m pip install pyyaml
python3 -m pip install paramiko
python3 -m pip install urllib3 

By the way, I added my enhanced version to the DDNeu GitHub repo:
https://github.com/DDNeu/global-lustre-jobstats

Cheers,
Bjoern

Comment by Bjoern Olausson [ 17/Aug/23 ]

Here are the lines you would need to change:

#!/bin/env python3
'''
lljobstat command. Read job_stats files, parse and aggregate data of every
job on multiple OSS/MDS, show top jobs
'''
import argparse
import errno
import subprocess
import sys
import time
import yaml
import signal
import urllib3
import warnings
import configparser
from multiprocessing import Process, Queue, Pool, Manager, active_children, Pipe
from subprocess import Popen, PIPE, STDOUT
from pprint import pprint
from os.path import expanduser
from pathlib import Path
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
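    # Fall back to the pure-Python loader/dumper if libyaml is not available
    from yaml import Loader, Dumper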
warnings.filterwarnings(action='ignore',module='.*paramiko.*')
urllib3.disable_warnings()
[...]
Comment by Feng Lei [ 18/Aug/23 ]

is there a way to "try" loading the libyaml-dev CLoader, but fall back to the regular Loader if it is not installed?

It can be checked at runtime:

if hasattr(yaml, "CLoader"):
    yaml_obj = yaml.load(output, Loader=yaml.CLoader)
else:
    yaml_obj = yaml.safe_load(output)
Comment by Bjoern Olausson [ 18/Aug/23 ]

That works as well, but has the disadvantage that you have to repeat the conditional check wherever you call yaml.load() in the code.

This is only required once:

try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader

and you could print a note once at each program start:

try:
    from yaml import CLoader as Loader
except ImportError:
    print("Install libyaml-dev for faster processing", file=sys.stderr)
    from yaml import Loader

Example:

(lljobstat) [root@n2admin1 bolausson]# ./glljobstat.py -n1 -c3 
Install libyaml-dev for faster processing
---
timestamp: 1692341521
top_jobs:
- .0@n2oss4:       {ops: 499163955, op: 11394216, cl: 41637516, mn: 9374215, ga: 191342407, sa: 88644483, gx: 6939749, sx: 146146, st: 2610083, sy: 36495657, rd: 65229069, wr: 42911419, pu: 2438995}
- .0@n2oss8:       {ops: 473355574, op: 7909593, cl: 31620149, mn: 6376866, ga: 82344877, sa: 97854466, gx: 6512529, sx: 29034, st: 51, sy: 39334661, rd: 130433638, wr: 66882172, pu: 4057538}
- .0@n2oss7:       {ops: 419629946, op: 7035889, cl: 27444959, mn: 5526838, ga: 78507580, sa: 94406102, gx: 5645268, sx: 20790, st: 34, sy: 37915437, rd: 93283959, wr: 66197236, pu: 3645854}
...
(lljobstat) [root@n2admin1 bolausson]# 

Attached is the modified lljobstat:
lljobstat

Cheers,
Bjoern

Comment by Andreas Dilger [ 18/Aug/23 ]

I think the best approach is to add the faster libyaml-dev as a "Suggests:" or "Recommends:" dependency in lustre.spec.in (for all distros except el7.9, which doesn't support this; see other similar checks there), and keep the try/except as a fallback in case it isn't installed.

However, I do not think it makes sense to print a message in that case, as it breaks the output, and I don't think users care so much if it "just works" for them.

Feng Lei, can you please also backport the "fix YAML printing of jobstats" patches to b_es5_2 (there are about 3 of them, but not the stats header or histogram patches), so that we get proper quoting of the jobid name in the job_stats output. While the "@" substitution will fix the one case running with DDN Insight, it will not handle all cases of bad jobid names.
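To illustrate why names like "@0@n2oss4" break YAML parsing: "@" is a reserved YAML indicator and cannot start a plain (unquoted) scalar. A minimal sketch of conservative single-quoting; `quote_jobid` is a hypothetical helper, not the Lustre patch:

```python
# YAML indicator characters; a plain (unquoted) scalar must not start with
# these ('@' and '`' are reserved, which is why "@0@n2oss4" needs quoting).
YAML_INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")


def quote_jobid(jobid):
    """Return jobid as a YAML scalar, single-quoted only when needed.

    Conservative: it also quotes some strings YAML would accept unquoted
    (e.g. a leading '-'), and it does NOT guard against type coercion of
    values such as "null", "true" or "123".
    """
    needs_quotes = (
        jobid == ""
        or jobid[0] in YAML_INDICATORS
        or jobid != jobid.strip()
        or ": " in jobid
        or " #" in jobid
        or jobid.endswith(":")
    )
    if not needs_quotes:
        return jobid
    # Inside single quotes, the only escape YAML defines is '' for '.
    return "'" + jobid.replace("'", "''") + "'"
```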

Comment by Bjoern Olausson [ 18/Aug/23 ]

Makes sense

Thanks Andreas!

Comment by Bjoern Olausson [ 19/Aug/23 ]

Okay, now we are getting to something that is actually pretty useful:

https://github.com/DDNeu/global-lustre-jobstats

It is several times faster!

If you don't want all the bells and whistles because of the additional modules (paramiko), you might want to try the naive parser with parallel parsing instead of yaml.load(). It is a drop-in replacement; no other code changes are required.
It is much faster, even compared to the parallel yaml CLoader! It is now fast enough that you can run it in a loop and watch the rates "live".

My naive parser:

(lljobstat) [root@n2oss1 bolausson]# time ./glljobstat_testing.py -n 1 -c 2
SSH time         : 0.837817907333374
Bjoern time      : 2.07401442527771
---
timestamp: 1692439321
servers_queried: 8
total_jobs: 2601
top_2_jobs:
- 4635385@46526@n2cn0225: {ops: 589959692, rd: 589959689, wr: 3}
- @0@n2oss4:              {ops: 485340474, op: 10540091, cl: 34838831, mn: 8118882, ga: 191221978, sa: 84975235, gx: 5400547, sx: 145756, st: 2610082, sy: 34832893, rd: 66827250, wr: 43403088, pu: 2425841}
...
real    0m4.603s
user    0m10.878s
sys     0m1.994s

yaml.load() with CLoader:

(lljobstat) [root@n2oss1 bolausson]# time ./glljobstat.py -n 1 -c 2
SSH time         : 0.8781006336212158
yaml CLoader time: 9.084490060806274
---
timestamp: 1692439328
servers_queried: 8
total_jobs: 2601
top_2_jobs:
- 4635385@46526@n2cn0225: {ops: 589957196, rd: 589957193, wr: 3}
- .0@n2oss4:       {ops: 485340452, op: 10540089, cl: 34838826, mn: 8118881, ga: 191221973, sa: 84975231, gx: 5400546, sx: 145756, st: 2610082, sy: 34832891, rd: 66827249, wr: 43403087, pu: 2425841}
...

real	0m11.095s
user	0m55.775s
sys	0m4.393s 
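The "several times faster" claim can be quantified from the two runs above. This only divides the reported single-run timings, so it is a rough comparison rather than a benchmark:

```python
# Timings copied from the two runs above (seconds).
naive_parse, cloader_parse = 2.07401442527771, 9.084490060806274
naive_real, cloader_real = 4.603, 11.095
naive_user, cloader_user = 10.878, 55.775

parse_speedup = cloader_parse / naive_parse  # parse step only
wall_speedup = cloader_real / naive_real     # whole command, wall clock
cpu_ratio = cloader_user / naive_user        # user CPU time consumed

print(f"parse: {parse_speedup:.1f}x, wall: {wall_speedup:.1f}x, "
      f"user CPU: {cpu_ratio:.1f}x")
# prints: parse: 4.4x, wall: 2.4x, user CPU: 5.1x
```

The wall-clock gain is smaller than the parse gain because the SSH collection time (about 0.8s in both runs) is unchanged, and user time exceeding real time in both runs reflects the parallel parsing.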

 

Comment by Gerrit Updater [ 07/Feb/24 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/doc/manual/+/53948
Subject: LU-16228 utils: update jobstats section
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: 39b650af5c4d52533d5bf7d388f179403f14693d

Comment by Gerrit Updater [ 07/Feb/24 ]

"Andreas Dilger <adilger@whamcloud.com>" merged in patch https://review.whamcloud.com/c/doc/manual/+/53948/
Subject: LU-16228 utils: update jobstats section
Project: doc/manual
Branch: master
Current Patch Set:
Commit: 58f5e8ac8970efcbcbf44889b14e9c6400c29e3d

Generated at Sat Feb 10 03:25:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.