Details
- Bug
- Resolution: Fixed
- Medium
- Lustre 2.17.0, Lustre 2.18.0
- 3
Description
The new lctl find_param sub-command added in patch https://review.whamcloud.com/57675 ("LU-18418 utils: new lctl find_param option") appears to open and read every parameter, even those whose names do not match the name being requested. The find_param sub-command only matches against parameter names, not their values, as can be seen by passing read_bytes as the argument: that string is definitely present in the contents of many stats parameters, yet nothing is returned:
# lctl find_param read_bytes
#
Searching for the memused parameter returns 3 entries, all of which can be accessed directly from memory, but the command can still take a long time to complete:
# lctl find_param memused
memused=508001847
memused_max=513329167
lnet_memused=99452430
Running under strace shows that the code is actually opening and reading the values from every parameter in the filesystem, including some that trigger disk reads and are very large, for no reason at all:
# strace lctl find_param memused |& less
:
lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/brw_stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/brw_stats", O_RDONLY) = 3
read(3, "snapshot_time: 177438"..., 4096) = 1050
read(3, "", 4096) = 0
close(3) = 0
lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/io_latency_stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/io_latency_stats", O_RDONLY) = 3
read(3, "io_latency_by_size:\nsnapshot_tim"..., 4096) = 127
read(3, "", 4096) = 0
close(3) = 0
lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/oi_scrub", {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/oi_scrub", O_RDONLY) = 3
read(3, "name: OI_scrub\nmagic: 0x4c5fe253"..., 4096) = 516
read(3, "", 4096) = 0
close(3) = 0
lstat("/sys/kernel/debug/lustre/mgs/MGS/osd/stats", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
openat(AT_FDCWD, "/sys/kernel/debug/lustre/mgs/MGS/osd/stats", O_RDONLY) = 3
read(3, "snapshot_time 177438"..., 4096) = 170
read(3, "", 4096) = 0
close(3) = 0
:
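To put a number on the excerpt above, a saved strace log can be scanned for parameter files that were opened even though their names do not contain the search string. The following is only a rough sketch: the helper name summarize_opens is hypothetical, the default root is just the /sys/kernel/debug/lustre/ prefix seen in the trace (other parameter roots would need to be passed in), and plain substring matching on the file name is an assumption about how find_param compares names.

```python
import re

# Matches successful openat() calls as printed by strace, for example:
#   openat(AT_FDCWD, "/sys/.../stats", O_RDONLY) = 3
# Failed calls ("= -1 ENOENT ...") are intentionally not matched.
OPENAT_RE = re.compile(r'^openat\(AT_FDCWD, "([^"]+)", [^)]*\) = (\d+)')


def summarize_opens(strace_lines, name, roots=("/sys/kernel/debug/lustre/",)):
    """Split opened parameter files into (matching, unrelated) lists.

    `roots` holds the path prefixes treated as Lustre parameter trees.
    Only the prefix visible in the trace is used by default (assumption).
    """
    matching, unrelated = [], []
    for line in strace_lines:
        m = OPENAT_RE.match(line.strip())
        if m is None:
            continue
        path = m.group(1)
        if not path.startswith(tuple(roots)):
            continue  # not a parameter file, e.g. a shared library
        basename = path.rsplit("/", 1)[-1]
        # Assumption: a parameter "matches" if its file name contains `name`.
        (matching if name in basename else unrelated).append(path)
    return matching, unrelated
```

Feeding it the output of something like `strace -o trace.log lctl find_param memused` and comparing the lengths of the two lists gives a quick measure of how many unrelated parameters were read.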
The code should only read parameters whose names match the requested argument, and skip all of the others without opening them. That should speed up the command significantly, reduce overhead/impact on a running system, and avoid other side effects (e.g. reading from job_stats clears old job statistics, and reading from kbytesfree, filesfree, etc. triggers OBD_STATFS RPCs to be sent to all of the targets).
Issue Links
- is related to LU-18418 add "lctl find_param" option to search for a parameter name (Resolved)