Details
-
Bug
-
Resolution: Fixed
-
Major
-
Lustre 2.13.0, Lustre 2.12.3
-
None
-
3
-
9223372036854775807
Description
I've identified two bugs or deficiencies in the l_tunedisk utility in Lustre 2.12/2.13
The first is a regression introduced by "LU-11736 utils: don't set max_sectors_kb on MDT/MGT".
The l_tunedisk path does provide sufficient information to the tune_block_dev() function for it to ascertain whether the device being tuned is in-fact an OST. The result is that tuning is not performed for any device. I believe the solution is, after we've determined that the device has been formatted for Lustre, we should read the "lustre_disk_data" (ldd) from the device. This information is stored in the "struct mount_opts" that is passed through to the underlying tuning functions.
diff --git a/lustre/utils/l_tunedisk.c b/lustre/utils/l_tunedisk.c index 3321fd1f0f..feecdeea77 100644 --- a/lustre/utils/l_tunedisk.c +++ b/lustre/utils/l_tunedisk.c @@ -45,6 +45,8 @@ int main(int argc, char *const argv[]) struct mount_opts mop = { .mo_max_sectors_kb = -1 }; + struct lustre_disk_data *ldd = &mop.mo_ldd; + char real_path[PATH_MAX] = {'\0'}; unsigned int mount_type; static struct option long_opts[] = { @@ -95,6 +97,13 @@ int main(int argc, char *const argv[]) if (ret == 0) goto out; + ret = osd_read_ldd(mop.mo_source, ldd); + if (ret != 0) { + fprintf(stderr, "Failed to read previous Lustre data from %s " + "(%d)\n", mop.mo_source, ret); + goto out; + } + ret = osd_tune_lustre(mop.mo_source, &mop); out:
The second issue has to do with how tunings are emplaced for slave devices when the device that was initially passed to l_tunedisk is a multipath device.
static int tune_block_dev(const char *src, struct mount_opts *mop) { <snip> /* If device is multipath device then tune its slave * devices. */ rc = tune_block_dev_slaves(real_sys_path, mop);
Here "real_sys_path" is something like
/sys/devices/virtual/block/dm-2
tune_block_dev_slaves() then looks in the slaves subdirectory.
static int tune_block_dev_slaves(const char *sys_path, struct mount_opts *mop) { <snip> snprintf(slaves_path, sizeof(slaves_path), "%s/slaves", sys_path); slaves_dir = opendir(slaves_path);
e.g.
# ls /sys/devices/virtual/block/dm-2/slaves sdc sdh sdm sdr #
And it recursively calls tune_block_dev() on each symlink in that directory:
while ((d = readdir(slaves_dir)) != NULL) { char path[PATH_MAX]; int rc2; if (d->d_type != DT_LNK) continue; snprintf(path, sizeof(path), "%s/%s", slaves_path, d->d_name); rc2 = tune_block_dev(path, mop);
The problem is that tune_block_dev() performs a stat on its "src" argument which is supposed to be the device path that we are tuning. And if the stat does not indicate that the file is a block device then we return without performing any tuning:
static int tune_block_dev(const char *src, struct mount_opts *mop) { <snip> rc = stat(src, &st); if (rc < 0) { if (verbose) fprintf(stderr, "warning: cannot stat '%s': %s\n", src, strerror(errno)); return errno; } if (!S_ISBLK(st.st_mode)) return 0;
stat will resolve the symbolic link, however these links point at directories. Directories are not block devices to tune_block_dev() will not tune them.
I'm not sure what is going to come out of LU-12029.
That ticket suggests that we might be getting rid of l_tunedisk altogether? If that's the case then this ticket could be closed. But as long as we're providing this utility we may as well fix it up.