[LU-16931] add lctl mechanism (list_param?) to report changes to tunable parameters Created: 28/Jun/23  Updated: 20/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: medium

Issue Links:
Related
is related to LU-8066 Move lustre procfs handling to sysfs ... Open
is related to LU-11077 Client-specific tunable parameter con... Open
is related to LU-17237 option for 'lctl list_param' to print... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

It would be useful to have a generic mechanism that reports when tunable parameters are modified. For example, "lctl list_param -FR '*'" already marks each entry with a trailing "/" if it is a directory, or a trailing "=" if the parameter is writable:

$ lctl list_param -FR '*'
:
jobid_name=
jobid_this_session=
jobid_var=
lbug_on_eviction=
ldlm/
ldlm.cancel_unused_locks_before_replay=
ldlm.namespaces/
ldlm.namespaces.MGC192.168.20.1@tcp/
ldlm.namespaces.MGC192.168.20.1@tcp.dirty_age_limit=
ldlm.namespaces.MGC192.168.20.1@tcp.early_lock_cancel=
ldlm.namespaces.MGC192.168.20.1@tcp.lock_count
:

Firstly, being able to list only writable parameters with "lctl list_param -w PATTERN" would be useful for separating tunables from read-only parameters.
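
Until such an option exists, the same list can probably be approximated by filtering the "-F" output shown above, since writable parameters carry a trailing "=". This is only a minimal sketch that assumes the output format matches the example above; the function name writable_params is illustrative and not part of lctl:

```shell
# Keep only the entries that "lctl list_param -F" marks as writable
# (trailing "=") and strip the marker so only the parameter name remains.
# Directories (trailing "/") and read-only parameters are dropped.
writable_params() {
    sed -n 's/=$//p'
}

# Usage on a node with Lustre loaded:
#   lctl list_param -FR '*' | writable_params
```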

Secondly, and more importantly, being able to easily find/report parameters that were actually modified on the client or server for whatever reason (either from saved parameters on the MGS or manually with "lctl set_param") would make debugging system configuration issues much easier.

One limitation is that these parameters are exposed to userspace via /proc and /sys files, so just storing an internal flag for the modified parameter itself does not give us any way to report them directly to userspace.

One option would be to change the mtime of the parameter inode when it is modified. This might make it possible to identify modified parameters as those whose mtime differs from the ctime. Currently the timestamps are always the same:

# find /{proc,sys}/fs/lustre /sys/kernel/debug/lustre -type f | grep lru_size | xargs ls -lc
-rw-r--r--. 1 root root 4096 Mar  9 11:33 /sys/fs/lustre/ldlm/namespaces/MGC192.168.20.1@tcp/lru_size
-rw-r--r--. 1 root root 4096 Apr 16 02:09 /sys/fs/lustre/ldlm/namespaces/myth-MDT0000-mdc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:31 /sys/fs/lustre/ldlm/namespaces/myth-OST0000-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Apr 16 02:09 /sys/fs/lustre/ldlm/namespaces/myth-OST0001-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:32 /sys/fs/lustre/ldlm/namespaces/myth-OST0002-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:32 /sys/fs/lustre/ldlm/namespaces/myth-OST0003-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:31 /sys/fs/lustre/ldlm/namespaces/myth-OST0004-osc-ffff979380fc1800/lru_size
# find /{proc,sys}/fs/lustre /sys/kernel/debug/lustre -type f | grep lru_size | xargs ls -lu
-rw-r--r--. 1 root root 4096 Mar  9 11:33 /sys/fs/lustre/ldlm/namespaces/MGC192.168.20.1@tcp/lru_size
-rw-r--r--. 1 root root 4096 Apr 16 02:09 /sys/fs/lustre/ldlm/namespaces/myth-MDT0000-mdc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:31 /sys/fs/lustre/ldlm/namespaces/myth-OST0000-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Apr 16 02:09 /sys/fs/lustre/ldlm/namespaces/myth-OST0001-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:32 /sys/fs/lustre/ldlm/namespaces/myth-OST0002-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:32 /sys/fs/lustre/ldlm/namespaces/myth-OST0003-osc-ffff979380fc1800/lru_size
-rw-r--r--. 1 root root 4096 Mar  9 11:31 /sys/fs/lustre/ldlm/namespaces/myth-OST0004-osc-ffff979380fc1800/lru_size

I believe the different timestamps in this case are because the MDT0000 and OST0001 devices were remounted after the filesystem was first mounted. I haven't investigated whether the mtime/ctime can be set differently for sysfs/procfs inodes, but I suspect they use a regular struct inode and the timestamps could be modified appropriately.
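
If the kernel did update the mtime on a parameter write while leaving the ctime alone, modified parameters could be found from userspace by comparing the two timestamps. The following is a minimal sketch under that assumption, using GNU find and stat; the function name list_mtime_changed is illustrative, and on current kernels, where the timestamps are identical, it would print nothing:

```shell
# Print every regular file under the given directories whose mtime differs
# from its ctime. Assumes GNU stat, where %Y is the mtime and %Z is the
# ctime, both in seconds since the epoch.
list_mtime_changed() {
    find "$@" -type f -exec stat -c '%Y %Z %n' {} + 2>/dev/null |
    while read -r mtime ctime name; do
        if [ "$mtime" != "$ctime" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Usage:
#   list_mtime_changed /proc/fs/lustre /sys/fs/lustre /sys/kernel/debug/lustre
```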

Another option would be to record the modified parameter names in an internal table and have a new parameter file that prints them. This could potentially be tricky to implement, because it isn't always clear from the parameter handler itself what the full pathname to the modified parameter is. However, this has the advantage that the list could easily be dumped (e.g. "lctl get_param modified_params") and would itself be included in a "dump all parameters" output like "lctl get_param -R '*'".

In addition to dumping the names of the modified parameters, it would be possible to actually keep a log of the original/new parameter values, what time they were changed, etc. However, at that point we might just consider logging all parameter changes to /var/log/messages so that they are readily available without any additional effort. The main drawback of this is on large clusters with many clients, where it may produce a lot of log spam if 10000 clients all report "parameter max_dirty_mb changed from 2048 to 4096" for dozens of parameters, especially if the contents are large.



 Comments   
Comment by Andreas Dilger [ 08/Jan/24 ]

I was testing on an el8 client and it looks like the timestamps for parameters under /proc/fs/lustre do not change when written to:

# ls -l /proc/fs/lustre/lov/*
total 0
0 dr-xr-xr-x. 2 root root 0 Dec 21 00:23 pools/
0 -rw-r--r--. 1 root root 0 Dec 19 14:16 stripesize
0 -r--r--r--. 1 root root 0 Dec 21 00:23 target_obd
# lctl set_param lov.*.stripesize=8M
lov.myth-clilov-ffff9799f5c86000.stripesize=8M
# ls -l /proc/fs/lustre/lov/*
total 0
0 dr-xr-xr-x. 2 root root 0 Dec 21 00:23 pools/
0 -rw-r--r--. 1 root root 0 Dec 19 14:16 stripesize
0 -r--r--r--. 1 root root 0 Dec 21 00:23 target_obd
# stat /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
:
Access: 2023-12-19 14:16:05.558204389 -0700
Modify: 2023-12-19 14:16:05.558204389 -0700
Change: 2023-12-19 14:16:05.558204389 -0700
 Birth: -

I suspect all the timestamps are set when the procfs file is first created. However, that doesn't mean the mtime can't be changed, just that it doesn't happen automatically today.

As an aside, it isn't clear why "stripesize" is still under /proc/fs/lustre and not under /sys/fs/lustre with the other stripe* parameters.

The same is true for sysfs files:

# ls -l /sys/fs/lustre/lov/stripe*
total 0
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripecount
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripeoffset
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripetype
# lctl set_param lov.*.stripecount=2
lov.myth-clilov-ffff9799f5c86000.stripecount=2
# ls -l /sys/fs/lustre/lov/stripe*
total 0
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripecount
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripeoffset
0 -rw-r--r--. 1 root root 4096 Dec 21 00:23 stripetype

and also debugfs files:

# ls -l /sys/kernel/debug/lustre/mdc/myth-MDT0000-mdc-ffff9799f5c86000/stats
0 -rw-r--r--. 1 root root 0 Dec 19 14:16 stats
# lctl set_param mdc.*.stats=0
mdc.myth-MDT0000-mdc-ffff9799f5c86000.stats=0
# ls -l /sys/kernel/debug/lustre/mdc/myth-MDT0000-mdc-ffff9799f5c86000/stats
0 -rw-r--r--. 1 root root 0 Dec 19 14:16 stats

However, it does seem possible to change the timestamps of these files:

# touch /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
# ls -l /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
0 -rw-r--r--. 1 root root 0 Jan  8 15:12 /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize
# touch /sys/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripecount
# ls -l /sys/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripecount
0 -rw-r--r--. 1 root root 4096 Jan  8 15:13 /sys/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripecount
# touch /sys/kernel/debug/lustre/mdc/myth-MDT0000-mdc-ffff9799f5c86000/stats
# ls -l /sys/kernel/debug/lustre/mdc/myth-MDT0000-mdc-ffff9799f5c86000/stats
0 -rw-r--r--. 1 root root 0 Jan  8 15:13 stats

And it appears that there are (as expected) separate timestamps stored for the atime, mtime, and ctime:

# touch -m /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
# stat /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
Access: 2024-01-08 15:15:57.938710185 -0700
Modify: 2024-01-08 15:16:28.310231110 -0700
Change: 2024-01-08 15:16:28.310231110 -0700
# touch -a /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
# stat /proc/fs/lustre/lov/myth-clilov-ffff9799f5c86000/stripesize 
Access: 2024-01-08 15:15:57.938710185 -0700
Modify: 2024-01-08 15:15:21.587283587 -0700
Change: 2024-01-08 15:15:57.938710185 -0700

so it looks like the kernel code that handles setting the parameters should update the mtime+ctime on writes and only the atime on reads.
