[LU-12771] jobid "stat.0" Created: 16/Sep/19  Updated: 10/Oct/19  Resolved: 10/Oct/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.8
Fix Version/s: None

Type: Question/Request Priority: Minor
Reporter: Mahmoud Hanafi Assignee: Peter Jones
Resolution: Fixed Votes: 0
Labels: None

Rank (Obsolete): 9223372036854775807

 Description   

We see jobid "stat.0" doing lots of rpcs. Is this part of the server code?



 Comments   
Comment by Andreas Dilger [ 16/Sep/19 ]

The "stat.0" JobID looks like it is a shell script running as root on some client (using "jobid_var=procname_uid", so probably a login node) calling "stat(1)" on a lot of files. This is something specific to your site.
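As an illustration of how a "procname_uid" JobID such as "stat.0" breaks down, here is a minimal shell sketch. The helper name decode_jobid is hypothetical, and it assumes the JobID has the form <process name>.<numeric uid>, with the uid after the last dot.

```sh
#!/bin/sh
# Hypothetical helper: split a procname_uid style JobID into its two parts.
# Assumption: the uid is everything after the LAST dot, so process names
# that themselves contain dots (e.g. "python3.6") are kept intact.
decode_jobid() {
    jobid=$1
    procname=${jobid%.*}   # strip the shortest ".suffix" from the end
    uid=${jobid##*.}       # keep only what follows the last dot
    printf 'procname=%s uid=%s\n' "$procname" "$uid"
}

decode_jobid "stat.0"

# On a client with the Lustre utilities installed, the active setting can be
# checked with lctl; guarded here so the sketch also runs where lctl is absent.
if command -v lctl >/dev/null 2>&1; then
    lctl get_param jobid_var
fi
```

For "stat.0" this yields the process name "stat" and uid 0, i.e. root.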

You should be able to track it down to a specific client by checking "lctl get_param mdt.*.exports.*.stats" on the MDS to see which client is doing a lot of "getattr" operations.
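One way to turn that output into a ranking is sketched below. The function name rank_getattr is hypothetical, and the parsing assumes each client's block starts with a header line of the form "mdt.<target>.exports.<NID>.stats=" followed by counter lines of the form "<name> <count> samples [<unit>]"; the exact layout may differ between Lustre versions, so treat this as a starting point rather than a definitive tool.

```sh
#!/bin/sh
# Hypothetical helper: read "lctl get_param mdt.*.exports.*.stats" output on
# stdin and print "<getattr count> <client NID>", busiest clients first.
# Counts for the same client are summed across all MDTs.
rank_getattr() {
    awk '
        # Header line: remember which client the following counters belong to.
        /\.exports\..*\.stats=/ {
            client = $0
            sub(/^.*\.exports\./, "", client)
            sub(/\.stats=.*$/, "", client)
            next
        }
        # Counter line: the second field is assumed to be the sample count.
        $1 == "getattr" && client != "" { count[client] += $2 }
        END { for (c in count) print count[c], c }
    ' | sort -k1,1nr | head -n "${1:-10}"
}

# Run on the MDS; guarded so the sketch is harmless where lctl is absent.
if command -v lctl >/dev/null 2>&1; then
    lctl get_param 'mdt.*.exports.*.stats' | rank_getattr 10
fi
```

Because these counters are cumulative, running the ranking twice a few minutes apart and comparing the counts gives a better picture of which clients are active right now.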

Comment by Mahmoud Hanafi [ 19/Sep/19 ]

What is unusual is that this shows up on all OSSes and all OSTs. I tried to track down the clients and they are all over, not specific to a single job, user, or node type.

Comment by Andreas Dilger [ 20/Sep/19 ]

It isn't unusual that this would be seen on all OSTs if the script is traversing the filesystem and calling stat(1) on every file. It seems unlikely that the "stat" command itself would run very long, so it is likely part of some larger shell script.

If you have at least some idea of which client it is running on, you could try running a search on each client for the stat process, like "while sleep 0.5; do ps auxwwf | grep -B5 stat | grep -v grep; done" to search for "stat" in the process table.
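Since the goal is to find the script that launches "stat", a variant of that loop can print the parent chain of each match instead of the surrounding lines of the process tree. This is only a sketch under stated assumptions: the function names ancestry and watch_stat are hypothetical, and it relies on the procps versions of ps and pgrep found on typical Linux clients.

```sh
#!/bin/sh
# Hypothetical helper: print a process and each of its ancestors (pid, user,
# command line), walking up the parent chain until it runs out.
ancestry() {
    pid=$1
    while [ -n "$pid" ] && [ "$pid" -gt 0 ]; do
        # Stop if the process has already exited.
        ps -o pid=,user=,args= -p "$pid" || break
        pid=$(ps -o ppid= -p "$pid" | tr -d ' ')
    done
}

# Hypothetical helper: poll twice a second for processes named exactly "stat"
# and show where each one came from. Runs until interrupted with Ctrl-C.
watch_stat() {
    while sleep 0.5; do
        for p in $(pgrep -x stat); do
            ancestry "$p"
            echo "----"
        done
    done
}

# Usage on a suspect client (not started automatically):
#   watch_stat
```

Matching the exact process name with pgrep -x avoids hits on unrelated commands that merely contain "stat", such as "vmstat", but a short-lived stat process can still exit between polls, so several minutes of sampling may be needed.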

Comment by Mahmoud Hanafi [ 10/Oct/19 ]

We can close this.

Comment by Peter Jones [ 10/Oct/19 ]

ok - thanks
