[LU-4935] Collect job stats by both procname_uid and scheduler job ID Created: 21/Apr/14 Updated: 22/Nov/14 Resolved: 22/Nov/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Scott Nolin | Assignee: | WC Triage |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Rank (Obsolete): | 13639 |
| Description |
|
I have no insight into the code, so there may be some reasons this is unworkable, but here's what I'm thinking.

1) If you set jobid_var to procname_uid, you capture every process that is using the file system. This is nice for debugging, but not that useful for job statistics, as processes certainly can have similar names/uids across jobs.

2) If you set jobid_var to the scheduler of your choice, like SLURM_JOB_ID, you of course get those statistics. But if someone is, for example, sitting on a submit node and issuing commands, those aren't seen.

Would it be possible to enable collection on both? If every request has the job_id, process name, and UID packed in, why not get it all? So suppose you had a job with a scheduler jobid of "123", run from uid 555, and it runs two processes, process1 and process2. Could you then have jobstats report job_ids that look like:

123.process1.555

A process 'myscript' not run via the scheduler by uid 561 could be:

0.myscript.561

Besides not losing the statistics for "non-scheduler" Lustre requests, you have possibly a little more insight into your job if it's a multi-step type job.

Finally, to take it to the extreme - consider that we run filesystems which may be accessed by different schedulers, say SLURM and SGE on different systems (yes, this happens!). Why not include every possible scheduler scheme? So you end up with something like:

SLURM_JOB_ID.JOB_ID.LSB_JOBID.LOADL_STEP_ID.PBS_JOBID.ALPS_APP_ID.procname.uid

So the example above would be:

123.0.0.0.0.0.process1.555

I would not be surprised if this is potentially stupid. One thing is that it's overloading a variable to be an array of data. It's also using a character valid in filenames, ".", as the field separator.

Scott |
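A minimal sketch of how a job prolog might assemble such a composite identifier into a custom environment variable; the variable name LUSTRE_JOBID, the fallback value 0, and the use of the submitting shell's own name for procname are illustrative assumptions, not part of the proposal or of Lustre itself:

    # Hypothetical prolog snippet: join each scheduler's job id (0 if unset),
    # then a process name and the numeric UID, "."-separated as proposed above.
    # Note: $0 only reflects the submitting shell, not every child process.
    export LUSTRE_JOBID="${SLURM_JOB_ID:-0}.${JOB_ID:-0}.${LSB_JOBID:-0}.${LOADL_STEP_ID:-0}.${PBS_JOBID:-0}.${ALPS_APP_ID:-0}.$(basename "$0").$(id -u)"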
| Comments |
| Comment by Oleg Drokin [ 21/Apr/14 ] |
|
I think it's mostly possible in a less automated way. |
| Comment by Andreas Dilger [ 21/Apr/14 ] |
|
Note that it is possible to specify different jobid values to be collected on the login nodes and the compute nodes, and for that matter to specify different jobid sources on different compute nodes. It is not currently possible to collect both of these statistics at one time, nor, as your final proposal suggests, to collect a myriad of different identifiers at one time. The jobid is sent with every RPC from the client, and there is only limited space in the RPC in which to do so. At design time we surveyed the job schedulers and picked a maximum jobid size that would satisfy their requirements.

It doesn't make sense that a single node should be subject to different job schedulers at one time. Since it is possible to specify different jobid sources on different clients, and the resulting identifiers should still be unique and identifiable by their naming structure, this should meet most of the requirements here.

Since the jobid source is itself just the name of an environment variable in which to find the identifier, it is possible for your runtime environment to generate a new environment variable that could contain any identifier of your choice, subject to the size limit in struct ptlrpc_body (32 bytes, I believe). |
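A minimal sketch of that last point, assuming SLURM as the scheduler; the variable name MY_LUSTRE_JOBID and the particular identifier composed are assumptions for illustration only:

    # In a job prolog or shell profile, build any identifier you like
    # (kept short to respect the jobid size limit mentioned above):
    export MY_LUSTRE_JOBID="${SLURM_JOB_ID:-0}.$(id -u)"
    # Tell this client to take its jobid from that environment variable:
    lctl set_param jobid_var=MY_LUSTRE_JOBID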
| Comment by Scott Nolin [ 21/Apr/14 ] |
|
To be sure I understand this right, I could do something like this:

lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID

to set the filesystem to collect statistics assuming SLURM_JOB_ID. I can see how, on for example an SGE node, you could set SLURM_JOB_ID to the SGE job id for a job. But how do you set SLURM_JOB_ID to procname.uid for every random process on a system, say on your login node? This is really the case that's more useful for us, and mysterious to me. I apologize if I'm being particularly dense here.

Scott |
| Comment by Andreas Dilger [ 06/May/14 ] |
|
You can set lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID to set the global default jobid source (this is only needed once), and then lctl set_param jobid_var=procname_uid on the login nodes in a startup script. If you have other nodes that are running SGE you can run lctl set_param jobid_var=JOB_ID. On the nodes with SLURM_JOB_ID the jobids tracked by Lustre will be of the form NNNNNNNN (32-bit integer), on the SGE nodes they would be of the form NNNNN (5 decimal digits), and on the login nodes it would be process.NNNN. There would be a chance of conflict between SLURM and SGE, but unlikely. |
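Gathered into one place, a sketch of the setup described above; the filesystem name testfs comes from the earlier comment, the first command is run on the MGS, and the set_param commands go into a startup script on the relevant clients:

    # On the MGS, once: global default jobid source for the filesystem
    lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID
    # On the login nodes: track every process as procname.uid
    lctl set_param jobid_var=procname_uid
    # On nodes running SGE: use the SGE job id instead
    lctl set_param jobid_var=JOB_ID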
| Comment by Scott Nolin [ 06/May/14 ] |
|
Thank you Andreas, this is perfect. I assumed setting the jobid source parameter always set the global default, and didn't realize that subsequent settings to different values apply only to the client they are run on. So I assume that to reset the global you would set it to disable, then start over.

I read the man page for "lctl", and the "conf_param" section says "Set a permanent configuration parameter for any device via the MGS. This command must be run on the MGS node." The manual seemed to suggest the same, so I took that to mean that all settings are to be done on the MGS and are global.

Since no improvement is needed, this can simply be closed unless there's some documentation needed.

Thanks again, |
| Comment by Andreas Dilger [ 06/May/14 ] |
|
To clarify - the "lctl conf_param" (or in Lustre 2.5 and later "lctl set_param -P") settings on the MGS are global and persistent, while "lctl set_param" settings are local to the node on which they are run and only last until the filesystem unmounts. As yet there is no way to specify persistent settings for only a subset of nodes, but there are many other mechanisms for achieving this. |
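A short illustration of the distinction, assuming Lustre 2.5 or later for the -P form and reusing the values from earlier in this ticket:

    # Global and persistent (run on the MGS):
    lctl conf_param testfs.sys.jobid_var=SLURM_JOB_ID   # pre-2.5 style
    lctl set_param -P jobid_var=SLURM_JOB_ID            # 2.5 and later
    # Local to this node only, lost when the filesystem is unmounted:
    lctl set_param jobid_var=procname_uid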
| Comment by Andreas Dilger [ 16/May/14 ] |
|
It would probably be good to get something into the manual related to this, in case it comes up again in the future. |
| Comment by Scott Nolin [ 27/Aug/14 ] |
|
Andreas, I just had a client restart (rebooted) and the client-only /proc/fs/lustre/jobid_var setting persisted on the client. I expected it to go away and use the MGS value. Scott |
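A quick way to see which jobid source a given client is actually using (this reads the same value exposed under /proc/fs/lustre/jobid_var mentioned above):

    lctl get_param jobid_var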
| Comment by Andreas Dilger [ 22/Nov/14 ] |
|
|