[LU-17343] mechanism to resolve 'lctl get_param' parameters to pathnames Created: 07/Dec/23 Updated: 29/Jan/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | easy | ||
| Issue Links: |
|
| Severity: | 3 | ||||
| Description |
|
The "lctl get_param" and "lctl list_param" commands make it simple to resolve and dump Lustre parameters and statistics, since they hide the internal details of where the parameters are stored. Because kernel policies on where parameters may be located keep changing, the actual pathname of a parameter or stats file may differ between kernel and Lustre releases. Originally, all parameters were under "/proc/fs/lustre" and "/proc/net/lnet". When "/proc" usage was deprecated, many parameters moved to "/sys/fs/lustre", but sysfs is restricted to simple "name=value" pairs, so "complex" (multi-value, multi-line) parameters were moved to "/sys/kernel/debug/lustre" and "/sys/kernel/debug/lnet". Unfortunately, the kernel then restricted access to "/sys/kernel/debug" to the root user, so any data collection tools must also run as root to access these statistics, until such time as we create our own "lprocfs" to hold statistics and allow non-root users to access them.
In the meantime, tools that monitor Lustre statistics want to avoid the overhead of "lctl get_param" performing multiple pathname resolutions every time a parameter is accessed, which may be once per second or more often. This drives tool authors to hard-code the parameter pathnames into the tool instead of using "lctl get_param", which is fragile and may break between releases.
It is therefore desirable to have a mechanism to resolve parameter/stats pathnames for the currently running Lustre version, so that tools can use them directly without hard-coding them. To do this, it would be useful to add the options "lctl get_param [-p|--path] PARAM" and "lctl list_param [-p|--path] PARAM", which print the actual pathname(s) for PARAM instead of its value. 
There is already a function "llapi_param_get_paths()" that resolves a parameter name to one or more pathnames, but the "[-p|--path]" option still needs to be implemented so that these pathnames can be printed to stdout for use by monitoring tools. |
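A minimal sketch of the resolution idea, assuming a simplified rule where each "." in the parameter name becomes a "/" under a candidate root such as "/sys/fs/lustre" or "/sys/kernel/debug/lustre". This is not how llapi_param_get_paths() is actually implemented; param_to_glob() and resolve_param() are hypothetical helpers that only illustrate expanding a parameter name with glob(3) into the list of matching pathnames:

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: turn a dotted parameter name such as "osc.*.stats"
// into a glob pattern under ROOT ("/sys/fs/lustre/osc/*/stats"). Every '.'
// after ROOT is naively replaced by '/'. Returns 0 on success, or -1 if
// the pattern does not fit in BUF.
static int param_to_glob(const char *root, const char *param,
			 char *buf, size_t len)
{
	int n;
	char *p;

	assert(root != NULL && param != NULL && buf != NULL);
	n = snprintf(buf, len, "%s/%s", root, param);
	if (n < 0 || (size_t)n >= len)
		return -1;
	// only rewrite the part after "ROOT/", so any dots in ROOT are kept
	for (p = buf + strlen(root) + 1; *p != '\0'; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

// Hypothetical helper: expand PARAM under each of the NROOTS candidate
// roots with glob(3) and collect every match in PATHS. Returns the number
// of matches; the caller must call globfree(PATHS) afterwards.
static size_t resolve_param(const char *const *roots, size_t nroots,
			    const char *param, glob_t *paths)
{
	char pattern[4096];
	int flags = 0;
	size_t i;

	memset(paths, 0, sizeof(*paths));
	for (i = 0; i < nroots; i++) {
		if (param_to_glob(roots[i], param, pattern, sizeof(pattern)))
			continue;
		// after the first successful glob(), append further matches
		if (glob(pattern, flags, NULL, paths) == 0)
			flags |= GLOB_APPEND;
	}
	return paths->gl_pathc;
}
```

Called with the roots listed in the description and "osc.*.stats", this would collect any matching ".../osc/<device>/stats" files into paths.gl_pathv[], which is the kind of list the proposed option would print. Because the sketch turns every "." into "/", a name component that itself contains a dot would be mangled.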
| Comments |
| Comment by Andreas Dilger [ 07/Dec/23 ] |
|
I think this would be a relatively simple change to the lctl tool: basically just add parsing of the "[-p|--path]" option, which works mostly like "lctl get_param -N" and "lctl list_param", but prints the pathnames of the parameters (paths.gl_pathv[i]) directly instead of calling display_name() to convert them to the dot-separated format. Some cleanups should also be done in the code, such as renaming "po_only_path" and "po_show_path" to "po_only_name" and "po_show_name", and adding a new "po_only_pathname" (so it doesn't get confused with the old "po_only/show_path"). Similarly, a new LIST_PATHNAME should be added to enum parameter_operation to differentiate this from LIST_PARAM. |
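A rough sketch of the output side of the change described above. The enum and struct here are hypothetical, trimmed-down stand-ins (the real lctl definitions have more members), dotted_name() is a simplified stand-in for display_name() that just strips the root and turns "/" into ".", and list_mode()/format_entry() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical subset of the lctl enum, with the proposed new member.
enum parameter_operation {
	LIST_PARAM,
	GET_PARAM,
	LIST_PATHNAME,	// proposed: print pathnames instead of names
};

// Hypothetical subset of the option flags, using the proposed names.
struct param_opts {
	unsigned int po_only_name:1;	 // renamed from po_only_path
	unsigned int po_show_name:1;	 // renamed from po_show_path
	unsigned int po_only_pathname:1; // new flag set by -p/--path
};

// Pick the listing mode from the parsed options.
static enum parameter_operation list_mode(const struct param_opts *popt)
{
	return popt->po_only_pathname ? LIST_PATHNAME : LIST_PARAM;
}

// Simplified stand-in for display_name(): strip ROOT, turn '/' into '.'.
static void dotted_name(const char *root, const char *path,
			char *out, size_t len)
{
	size_t rlen = strlen(root);
	char *p;

	if (strncmp(path, root, rlen) == 0 && path[rlen] == '/')
		path += rlen + 1;
	snprintf(out, len, "%s", path);
	for (p = out; *p != '\0'; p++)
		if (*p == '/')
			*p = '.';
}

// Format one glob match (paths.gl_pathv[i]): the raw pathname for
// LIST_PATHNAME, otherwise the dot-separated parameter name.
static void format_entry(enum parameter_operation mode, const char *root,
			 const char *path, char *out, size_t len)
{
	if (mode == LIST_PATHNAME)
		snprintf(out, len, "%s", path);
	else
		dotted_name(root, path, out, len);
}
```

Keeping the pathname case as a separate mode, rather than overloading the existing name flags, mirrors the suggestion to avoid confusion with the old "po_only/show_path".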
| Comment by James A Simmons [ 07/Dec/23 ] |
|
Our tools never get love |
| Comment by Steve Crusan [ 11/Dec/23 ] |
|
This would be very useful for us. A long time ago, when we wrote our tools, we looked at strace output and the code and saw that lctl was just doing its own path resolution to open these files anyway, so we now skip all of that and try to find the "proc" files dynamically outside of lctl. Being able to run "lctl list_param -p -R ..." plus some caching and simple path heuristics would mean far fewer evil /proc and /sys path-probing calls, which would certainly save some CPU time and be a lot less messy. |
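A minimal sketch of the caching side, assuming the pathnames have already been resolved once (for example from the proposed "lctl list_param -p -R ..." output). struct stats_cache, cache_add() and cache_read() are hypothetical names; each poll then costs only an open/read/close of a known path:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_STATS 16	// hypothetical cache size

// Hypothetical cache of stats pathnames resolved once at startup.
struct stats_cache {
	char path[MAX_STATS][4096];
	int count;
};

// Remember one resolved pathname; returns -1 if the cache is full or the
// pathname does not fit.
static int cache_add(struct stats_cache *c, const char *path)
{
	if (c->count >= MAX_STATS || strlen(path) >= sizeof(c->path[0]))
		return -1;
	strcpy(c->path[c->count++], path);
	return 0;
}

// Read the current contents of cached entry IDX into BUF and NUL-terminate
// it. Returns the number of bytes read, or -1 on error.
static ssize_t cache_read(const struct stats_cache *c, int idx,
			  char *buf, size_t len)
{
	ssize_t total = 0, n = 0;
	int fd;

	assert(idx >= 0 && idx < c->count && len > 0);
	fd = open(c->path[idx], O_RDONLY);
	if (fd < 0)
		return -1;
	// loop, since a large stats file may come back in several chunks;
	// anything beyond LEN - 1 bytes is silently truncated
	while ((size_t)total < len - 1 &&
	       (n = read(fd, buf + total, len - 1 - (size_t)total)) > 0)
		total += n;
	close(fd);
	if (n < 0)
		return -1;
	buf[total] = '\0';
	return total;
}
```

Reopening the file on every poll keeps the sketch simple and avoids depending on how a particular procfs, sysfs or debugfs file behaves when re-read through a descriptor that is kept open.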
| Comment by Andreas Dilger [ 29/Jan/24 ] |
|
I think that is a much bigger pill than most developers are willing to swallow. It requires adding a huge amount of complexity to the tools to manage Netlink and YAML, instead of a simple "open/read". I think this change to dynamically discover the pathnames on the currently running system would be fairly easy to implement, either as an installation step that runs a script to generate a config file holding the pathnames of the stats files, or as a few-line change that runs "lctl list_param -p PARAM" to generate the pathname at runtime instead of hard-coding the absolute pathname into the binary. |
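A sketch of the "few-line change" option, assuming the "-p" option proposed in this ticket has been implemented. load_paths() is a hypothetical helper that runs the command once with popen(3) and keeps one pathname per output line; a non-zero exit status is reported as -1, so a caller could detect, for example, an lctl that does not understand "-p":

```c
#define _POSIX_C_SOURCE 200809L	// for popen()/pclose()
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: run CMD once, e.g.
//   "lctl list_param -p 'osc.*.stats'"
// (using the proposed -p option), and store one pathname per non-empty
// output line in PATHS, keeping at most MAX of them. Returns the number
// stored, or -1 if CMD could not be run or exited unsuccessfully.
static int load_paths(const char *cmd, char paths[][4096], int max)
{
	char line[4096];
	FILE *fp;
	int n = 0;

	assert(cmd != NULL && paths != NULL && max >= 0);
	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;
	// keep reading to EOF even when PATHS is full, so the child is not
	// killed by SIGPIPE and its exit status stays meaningful
	while (fgets(line, sizeof(line), fp) != NULL) {
		line[strcspn(line, "\n")] = '\0';	// drop trailing newline
		if (line[0] != '\0' && n < max)
			strcpy(paths[n++], line);
	}
	if (pclose(fp) != 0)
		return -1;
	return n;
}
```

The same list could instead be written to a config file by an installation script; either way the monitoring loop only ever opens the resolved pathnames.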