[LU-14590] add "dshbak" output aggregation equivalent for "lctl get_param" Created: 08/Apr/21  Updated: 08/Apr/21

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Andreas Dilger Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-14442 lctl get_param '-w' option to dump on... Open

 Description   

When running "lctl get_param" on a system with hundreds of OSTs and MDTs, and thousands of clients, many lines of the output are often identical for all devices. For example:

# lctl get_param ldlm.namespaces.*.*
ldlm.namespaces.testfs-OST0000-osc-MDT0000.contended_locks=32
ldlm.namespaces.testfs-OST0000-osc-MDT0000.contention_seconds=2
ldlm.namespaces.testfs-OST0000-osc-MDT0000.ctime_age_limit=10
ldlm.namespaces.testfs-OST0000-osc-MDT0000.dirty_age_limit=10
ldlm.namespaces.testfs-OST0000-osc-MDT0000.early_lock_cancel=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.lock_count=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.lock_timeouts=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.lock_unused_count=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.lru_max_age=3900000
ldlm.namespaces.testfs-OST0000-osc-MDT0000.lru_size=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.max_nolock_bytes=0
ldlm.namespaces.testfs-OST0000-osc-MDT0000.max_parallel_ast=1024
ldlm.namespaces.testfs-OST0000-osc-MDT0001.contended_locks=32
ldlm.namespaces.testfs-OST0000-osc-MDT0001.contention_seconds=2

:
:
ldlm.namespaces.testfs-OST0102-osc-MDT0013.contended_locks=32
ldlm.namespaces.testfs-OST0102-osc-MDT0013.contention_seconds=2
ldlm.namespaces.testfs-OST0102-osc-MDT0013.ctime_age_limit=10
ldlm.namespaces.testfs-OST0102-osc-MDT0013.dirty_age_limit=10
ldlm.namespaces.testfs-OST0102-osc-MDT0013.early_lock_cancel=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.lock_count=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.lock_timeouts=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.lock_unused_count=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.lru_max_age=3900000
ldlm.namespaces.testfs-OST0102-osc-MDT0013.lru_size=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.max_nolock_bytes=0
ldlm.namespaces.testfs-OST0102-osc-MDT0013.max_parallel_ast=1024

but it would be much more convenient if the output were aggregated with wildcards, in a manner similar to how multiple parameters are specified with "lctl set_param":

ldlm.namespaces.*.contended_locks=32
ldlm.namespaces.*.contention_seconds=2
ldlm.namespaces.*.ctime_age_limit=10
ldlm.namespaces.*.dirty_age_limit=10
ldlm.namespaces.*.early_lock_cancel=0
ldlm.namespaces.*.lock_count=0
ldlm.namespaces.*.lock_timeouts=0
ldlm.namespaces.*.lock_unused_count=0
ldlm.namespaces.testfs-OST00[00-14]-osc-MDT00[00-13].lru_max_age=3900000
ldlm.namespaces.testfs-OST00[15-63]-osc-MDT00[00-11,13].lru_max_age=3900000
ldlm.namespaces.testfs-OST00[15-63]-osc-MDT0012.lru_max_age=100000
ldlm.namespaces.*.lru_size=0
ldlm.namespaces.*.max_nolock_bytes=0
ldlm.namespaces.*.max_parallel_ast=1024

or something similar (it would be implementation dependent how disjoint regions of identifiers would be shown). This would not only allow reducing the amount of output, but also make it much more obvious in cases where there are differences in the settings (e.g. lru_max_age above).
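As a rough illustration of how disjoint regions of identifiers might be shown, the sketch below collapses a set of target indices into a bracketed range list. The function name compress_indices is hypothetical, and the sketch assumes that indices are printed as 4-digit hexadecimal numbers; it does not factor out a common prefix the way the "OST00[15-63]" example above does.

```python
def compress_indices(indices, width=4):
    """Collapse integer indices into a bracketed list of ranges.

    Hypothetical helper: assumes indices are rendered as zero-padded
    hexadecimal numbers of the given width.
    """
    ordered = sorted(set(indices))
    if not ordered:
        return ""

    # Walk the sorted indices and record each run of consecutive values.
    runs = []
    start = prev = ordered[0]
    for idx in ordered[1:]:
        if idx == prev + 1:
            prev = idx
            continue
        runs.append((start, prev))
        start = prev = idx
    runs.append((start, prev))

    def fmt(lo, hi):
        # A run of length one is printed as a single index.
        if lo == hi:
            return f"{lo:0{width}x}"
        return f"{lo:0{width}x}-{hi:0{width}x}"

    return "[" + ",".join(fmt(lo, hi) for lo, hi in runs) + "]"


if __name__ == "__main__":
    # MDT indices 0x00-0x11 plus 0x13, i.e. everything except 0x12.
    print("MDT" + compress_indices(list(range(0x12)) + [0x13]))
```

Whether ranges should be shown per target type, or combined across OST and MDT indices in one name, is left open here, as in the description above.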

Something like "lctl merge_param" would take the output of "lctl get_param" as input and merge the lines. This could work in two ways: "lctl get_param" could optionally (with a new '-m' option) fork a process and pipe its own output into it, or "lctl merge_param" could aggregate previously captured "get_param" output (possibly collected by dsh or clush running on multiple nodes at once).
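A minimal sketch of the merge step, operating on previously captured "name=value" lines, might look like the following. This is not an existing lctl feature: the function name merge_params and the dev_field parameter are hypothetical, and the sketch assumes that the device name is the third dot-separated component (as in ldlm.namespaces.<device>.<param>) and that each parameter prints on a single line. It only handles the simplest case: a parameter is collapsed to "*" when every device reports the same value, and is otherwise left unmerged.

```python
from collections import defaultdict


def merge_params(lines, dev_field=2):
    """Merge 'name=value' lines whose value is identical on every device.

    Hypothetical helper: dev_field is the index of the dot-separated
    component assumed to hold the device name.
    """
    values = defaultdict(set)  # template -> distinct values seen
    order = []                 # (template, line, value) in input order

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "=" not in line:
            # Not a simple name=value line; pass it through untouched.
            order.append((None, line, None))
            continue
        name, value = line.split("=", 1)
        parts = name.split(".")
        if len(parts) > dev_field:
            parts[dev_field] = "*"
        template = ".".join(parts)
        values[template].add(value)
        order.append((template, line, value))

    merged, emitted = [], set()
    for template, line, value in order:
        if template is None:
            merged.append(line)
        elif len(values[template]) == 1:
            # Same value everywhere: emit the wildcard form once.
            if template not in emitted:
                emitted.add(template)
                merged.append(f"{template}={value}")
        else:
            # Values differ between devices: keep the original line.
            merged.append(line)
    return merged


if __name__ == "__main__":
    import sys
    print("\n".join(merge_params(sys.stdin)))
```

Saved as a script, this could be fed captured output on standard input. Keeping differing lines unmerged means that a parameter such as lru_max_age above would still stand out, at the cost of less compact output than the range notation shown in the description.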

There would have to be some implementation-specific smarts in the aggregation, for example to understand that client instance identifiers can be aggregated.
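One such rule could be sketched as below. It assumes that a client instance identifier appears as a trailing 16-digit lowercase hexadecimal suffix on the device name; that format, the function name wildcard_client_instance, and the sample identifier in the usage example are assumptions made for illustration, not something stated in this ticket.

```python
import re

# Assumed format: "-" followed by 16 hex digits at the end of a
# dot-separated component of the parameter name.
_INSTANCE_RE = re.compile(r"-[0-9a-f]{16}(?=\.|$)")


def wildcard_client_instance(line):
    """Replace an assumed client instance suffix in the name with '-*'."""
    # Only rewrite the parameter name, never the value after '='.
    name, sep, value = line.partition("=")
    return _INSTANCE_RE.sub("-*", name) + sep + value


if __name__ == "__main__":
    # Hypothetical client-side line; the identifier is made up.
    print(wildcard_client_instance(
        "ldlm.namespaces.testfs-OST0000-osc-0123456789abcdef.lru_size=0"))
```

Applying a rewrite like this before aggregation would let lines from different clients compare equal, so they can be merged in the same way as lines from different targets.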

