[LU-12771] jobid "stat.0" Created: 16/Sep/19 Updated: 10/Oct/19 Resolved: 10/Oct/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.8 |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | Mahmoud Hanafi | Assignee: | Peter Jones |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We see JobID "stat.0" issuing a large number of RPCs. Is this part of the server code? |
| Comments |
| Comment by Andreas Dilger [ 16/Sep/19 ] |
|
The "stat.0" JobID looks like it is a shell script running as root on some client (using "jobid_var=procname_uid", so probably a login node) calling "stat(1)" on a lot of files. This is something specific to your site. You should be able to track it down to a specific client by checking "lctl get_param mdt.*.exports.*.stats" on the MDS to see which client is doing a lot of "getattr" operations. |
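One way to compare clients is to total the "getattr" samples per export and sort the result. The following is a minimal sketch, not a tested site tool: the function name `rank_getattr_clients` is hypothetical, and it assumes the usual `lctl get_param` layout, where each export starts with a header line of the form `mdt.<target>.exports.<NID>.stats=` followed by counter lines such as `getattr <count> samples [reqs]`. Check the actual output on your MDS before relying on it.

```shell
# Hypothetical helper: reads "lctl get_param mdt.*.exports.*.stats" output
# on stdin and prints "<getattr count> <client NID>", highest count first.
# Assumes header lines look like "mdt.<target>.exports.<NID>.stats=".
rank_getattr_clients() {
    awk '
        # Header line: remember which client NID the counters belong to.
        /\.exports\..*\.stats=/ {
            nid = $0
            sub(/^.*\.exports\./, "", nid)   # drop everything up to the NID
            sub(/\.stats=.*$/, "", nid)      # drop the trailing ".stats="
            next
        }
        # Counter line: sum getattr samples per NID across all MDTs.
        $1 == "getattr" { count[nid] += $2 }
        END { for (n in count) print count[n], n }
    ' | sort -k1,1nr
}

# Usage on the MDS (shown as a comment so that sourcing this file is harmless):
#   lctl get_param mdt.*.exports.*.stats | rank_getattr_clients | head
```

The counters are cumulative, so comparing two snapshots taken a short time apart gives a better picture of which clients are currently active than a single reading does.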
| Comment by Mahmoud Hanafi [ 19/Sep/19 ] |
|
What is unusual is that this shows up on all OSSes and all OSTs. I tried to track down the clients, and they are all over the place: not specific to a single job, user, or node type. |
| Comment by Andreas Dilger [ 20/Sep/19 ] |
|
It isn't unusual that this would be seen on all OSTs if the script is traversing the filesystem and calling stat(1) on every file. It seems unlikely that the "stat" command itself would run very long, so it is likely part of some larger shell script. If you have at least some idea of which client it is running on, you could try running a search on each client for the stat process, like "while sleep 0.5; do ps auxwwf | grep -B5 stat | grep -v grep; done" to search for "stat" in the process table. |
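A variation on that loop, sketched below, matches only processes named exactly "stat" that run as uid 0 (the combination that "jobid_var=procname_uid" would report as "stat.0") and prints each one's chain of parent processes, which is where the calling script should appear. The function names `print_ancestry` and `watch_root_stat` are hypothetical, and the sketch assumes the procps versions of `pgrep` and `ps` found on typical Linux clients.

```shell
# Hypothetical helper: print "pid ppid user args" for a PID and each of
# its ancestors, stopping before init (PID 1) or when a process has exited.
print_ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 1 ]; do
        ps -o pid=,ppid=,user=,args= -p "$pid" || break
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Hypothetical helper: poll twice a second for root-owned processes whose
# name is exactly "stat" and show where each one was launched from.
watch_root_stat() {
    while sleep 0.5; do
        for p in $(pgrep -x -u 0 stat); do
            date
            print_ancestry "$p"
        done
    done
}

# Usage on a suspect client (runs until interrupted):
#   watch_root_stat
```

Because stat(1) exits almost immediately, a polling loop like this can still miss individual invocations; it only needs to catch one to reveal the parent script.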
| Comment by Mahmoud Hanafi [ 10/Oct/19 ] |
|
We can close this. |
| Comment by Peter Jones [ 10/Oct/19 ] |
|
ok - thanks |