[LU-17343] mechanism to resolve 'lctl get_param' parameters to pathnames Created: 07/Dec/23  Updated: 29/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.16.0
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: easy

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The "lctl get_param" and "lctl list_param" commands allow resolving and dumping Lustre parameters and statistics in a simple manner, as they handle the internal details of where the parameters are stored.

Due to ongoing changes in the kernel policies on where parameters are located, the actual pathname of a parameter/stats file may change by kernel and Lustre release. Originally, all parameters were under "/proc/fs/lustre" and "/proc/net/lnet". When "/proc" usage was deprecated, many parameters moved to "/sys/fs/lustre", but this is constrained to simple "name=value" pairs, so "complex" (multi-value, multi-line) parameters were moved to "/sys/kernel/debug/lustre" and "/sys/kernel/debug/lnet". Unfortunately, the kernel changed the access policy for "/sys/kernel/debug" so that it is accessible only to the root user. As a result, any data collection tools must also run as root in order to access these statistics, until such time as we create our own "lprocfs" to hold statistics and allow them to be accessed by non-root users.
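
The locations listed above can be probed with a sketch like the following. This is only an illustration, not how lctl does it: the function name candidate_paths, the fs_root argument (present only so the sketch can be tried against a scratch directory), the search order, and the plain dot-to-slash conversion are all assumptions, and a real tool must also cope with parameter names that themselves contain dots. For brevity only the "lustre" trees are listed.

```python
import os

# Directory trees named in the description. The order in which they are
# searched is an assumption of this sketch.
PARAM_ROOTS = (
    "sys/fs/lustre",
    "sys/kernel/debug/lustre",
    "proc/fs/lustre",
)


def candidate_paths(param, fs_root="/"):
    """Return the existing files that a dotted parameter name could map to."""
    relative = param.replace(".", "/")  # naive: assumes no dots inside names
    found = []
    for root in PARAM_ROOTS:
        path = os.path.join(fs_root, root, relative)
        if os.path.exists(path):
            found.append(path)
    return found
```

Doing this probing on every sample is the overhead that a monitoring tool wants to avoid.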

In the meantime, for tools that monitor Lustre statistics, it is desirable to avoid the overhead of "lctl get_param" trying to do multiple pathname resolutions each time a parameter is accessed, which might be once per second or more. This drives the tools to hard-code the direct parameter pathnames into the tool instead of using "lctl get_param" to access the parameters, which is fragile and may break between releases.

It is desirable to have a mechanism to "resolve" parameter/stats pathnames for the currently-running Lustre version so that they can be used directly, without hard coding them. To do this, it would be useful to have an option "lctl get_param [-p|--path] PARAM" and "lctl list_param [-p|--path] PARAM" that prints the actual pathname(s) for PARAM instead of the value.
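
For a monitoring tool, the benefit would be that resolution happens once and every later sample is a plain open/read. A minimal sketch, assuming a hypothetical resolve callable that returns the pathname for a parameter (for example by running the proposed "lctl get_param -p", which does not exist yet); the class name CachedParamReader is likewise invented for illustration:

```python
class CachedParamReader:
    """Resolve each parameter once, then read its file directly."""

    def __init__(self, resolve):
        self._resolve = resolve   # hypothetical: maps param name -> pathname
        self._paths = {}          # param name -> cached pathname

    def read(self, param):
        path = self._paths.get(param)
        if path is None:
            path = self._paths[param] = self._resolve(param)
        try:
            with open(path) as f:
                return f.read()
        except FileNotFoundError:
            # The file moved or disappeared since it was cached:
            # resolve again once and retry.
            path = self._paths[param] = self._resolve(param)
            with open(path) as f:
                return f.read()
```

The retry on FileNotFoundError is a design choice of this sketch, so that a cached pathname that has gone stale is resolved again instead of failing permanently.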

There is already a function "llapi_param_get_paths()" that is available to resolve the parameter name to one or more pathnames, but it needs the "[-p|--path]" option implemented to allow printing these pathnames to stdout for use by the monitoring tools.



 Comments   
Comment by Andreas Dilger [ 07/Dec/23 ]

I think it would be a relatively simple change to the lctl tool - basically just adding the parsing of the "[-p|--path]" option that works mostly like "lctl get_param -N" and "lctl list_param", but having it print the pathname paths.gl_pathv[i] of the parameters directly instead of calling display_name() to convert the parameters to the dot-separated format.
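
The difference between the two output modes can be illustrated outside of lctl. The sketch below is Python rather than the C of the actual tool, and its display_name is only a simplified stand-in for the lctl function of the same name; list_params, root and show_path are hypothetical names used for illustration.

```python
import glob
import os


def display_name(path, root):
    """Simplified stand-in: strip the root and use dots as separators."""
    return os.path.relpath(path, root).replace(os.sep, ".")


def list_params(pattern, root, show_path=False):
    """Expand a dotted wildcard pattern under root.

    With show_path=False the result mimics "lctl list_param" (dotted names);
    with show_path=True the matched pathnames are returned unchanged, which
    is what the proposed "-p" option would print.
    """
    matches = sorted(glob.glob(os.path.join(root, pattern.replace(".", "/"))))
    if show_path:
        return matches
    return [display_name(m, root) for m in matches]
```

In both modes the wildcard expansion is identical; only the final formatting step differs, which is why the change to lctl is expected to be small.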

There are some cleanups in the code that should be done, such as renaming "po_only_path" and "po_show_path" to "po_only_name" and "po_show_name" and adding a new "po_only_pathname" (so it doesn't get confused with the old "po_only/show_path"). Similarly, a new LIST_PATHNAME should be added to enum parameter_operation to differentiate this from LIST_PARAM.

Comment by James A Simmons [ 07/Dec/23 ]

Our tools never get love. The plan for stats is to implement a Netlink / libyaml approach to allow non-root to get stats. That doesn't have a path in the classic sense. I have something kind of working that I can push.

Comment by Steve Crusan [ 11/Dec/23 ]

This would be very useful for us.

A long time ago when we wrote our tools, we looked at strace and the code and saw lctl was just doing its own path resolution to open these files anyway, so we now just skip all of that and try to dynamically find the "proc" files directly outside of lctl.

Being able to do "lctl list_param -p -R ..." with some caching and simple path heuristics would mean a lot fewer evil /proc and /sys path-like calls, which would certainly save some CPU time and be a lot less messy.

Comment by Andreas Dilger [ 29/Jan/24 ]

"The plan for stats is to implement a Netlink / libyaml approach to allow non-root to get stats. That doesn't have a path in the classic sense. I have something kind of working that I can push."

I think that is a much bigger pill than most developers are willing to swallow. That requires adding a huge amount of complexity to the tools to manage Netlink and YAML, instead of "open/read". I think this change to dynamically discover the pathnames on the currently running system would be fairly easy to implement, either as an installation step (run a script and generate a config file that holds the pathnames for the stats files) or as a few-line change to run "lctl list_param -p PARAM" to generate the pathname at runtime instead of hard-coding the absolute pathname into the binary.
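
The install-time variant could look roughly like this. It is a sketch under stated assumptions: "lctl list_param -p" is the option proposed in this ticket and is not available yet, the one-pathname-per-line output format is a guess, and the function name write_path_config and the JSON layout are invented for illustration. The command is passed in as an argument so that the sketch does not depend on any particular lctl version.

```python
import json
import subprocess


def write_path_config(command, config_file):
    """Run a path-listing command and save its output as a JSON list.

    command is an argv list, e.g. the proposed (not yet implemented)
    ["lctl", "list_param", "-p", "osc.*.stats"]; one pathname per
    output line is assumed.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    paths = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    with open(config_file, "w") as f:
        json.dump(paths, f, indent=2)
    return paths
```

The monitoring tool would then load the config file at startup and open the listed files directly, and the same function could be called at runtime instead if the pathnames are expected to change.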

Generated at Sat Feb 10 03:34:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.