Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12387

l_tunedisk does not properly handle multipath devices

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.13.0, Lustre 2.12.3
    • None
    • 3
    • 9223372036854775807

    Description

      I've identified two bugs or deficiencies in the l_tunedisk utility in Lustre 2.12/2.13

      The first is a regression introduced by "LU-11736 utils: don't set max_sectors_kb on MDT/MGT".

      The l_tunedisk path does provide sufficient information to the tune_block_dev() function for it to ascertain whether the device being tuned is in-fact an OST. The result is that tuning is not performed for any device. I believe the solution is, after we've determined that the device has been formatted for Lustre, we should read the "lustre_disk_data" (ldd) from the device. This information is stored in the "struct mount_opts" that is passed through to the underlying tuning functions.

      diff --git a/lustre/utils/l_tunedisk.c b/lustre/utils/l_tunedisk.c
      index 3321fd1f0f..feecdeea77 100644
      --- a/lustre/utils/l_tunedisk.c
      +++ b/lustre/utils/l_tunedisk.c
      @@ -45,6 +45,8 @@ int main(int argc, char *const argv[])
              struct mount_opts mop = {
                      .mo_max_sectors_kb = -1
              };
      +       struct lustre_disk_data *ldd = &mop.mo_ldd;
      +
              char real_path[PATH_MAX] = {'\0'};
              unsigned int mount_type;
              static struct option long_opts[] = {
      @@ -95,6 +97,13 @@ int main(int argc, char *const argv[])
              if (ret == 0)
                      goto out;
      
      +       ret = osd_read_ldd(mop.mo_source, ldd);
      +       if (ret != 0) {
      +               fprintf(stderr, "Failed to read previous Lustre data from %s "
      +                       "(%d)\n", mop.mo_source, ret);
      +               goto out;
      +       }
      +
              ret = osd_tune_lustre(mop.mo_source, &mop);
      
       out:
      

      The second issue has to do with how tunings are emplaced for slave devices when the device that was initially passed to l_tunedisk is a multipath device.

      static int tune_block_dev(const char *src, struct mount_opts *mop)
      {
      <snip>
                      /* If device is multipath device then tune its slave
                       * devices. */
                      rc = tune_block_dev_slaves(real_sys_path, mop);
      

      Here "real_sys_path" is something like

      /sys/devices/virtual/block/dm-2
      

      tune_block_dev_slaves() then looks in the slaves subdirectory.

      static int tune_block_dev_slaves(const char *sys_path, struct mount_opts *mop)
      {
      <snip>
              snprintf(slaves_path, sizeof(slaves_path), "%s/slaves", sys_path);
              slaves_dir = opendir(slaves_path);
      

      e.g.

      # ls /sys/devices/virtual/block/dm-2/slaves
      sdc  sdh  sdm  sdr
      #
      

      And it recursively calls tune_block_dev() on each symlink in that directory:

              while ((d = readdir(slaves_dir)) != NULL) {
                      char path[PATH_MAX];
                      int rc2;
      
                      if (d->d_type != DT_LNK)
                              continue;
      
                      snprintf(path, sizeof(path), "%s/%s", slaves_path, d->d_name);
                      rc2 = tune_block_dev(path, mop);
      

      The problem is that tune_block_dev() performs a stat on its "src" argument which is supposed to be the device path that we are tuning. And if the stat does not indicate that the file is a block device then we return without performing any tuning:

      static int tune_block_dev(const char *src, struct mount_opts *mop)
      {
      <snip>
              rc = stat(src, &st);
              if (rc < 0) {
                      if (verbose)
                              fprintf(stderr, "warning: cannot stat '%s': %s\n",
                                      src, strerror(errno));
                      return errno;
              }
      
              if (!S_ISBLK(st.st_mode))
                      return 0;
      

      stat will resolve the symbolic link, however these links point at directories. Directories are not block devices to tune_block_dev() will not tune them.

      I'm not sure what is going to come out of LU-12029.
      That ticket suggests that we might be getting rid of l_tunedisk altogether? If that's the case then this ticket could be closed. But as long as we're providing this utility we may as well fix it up.

      Attachments

        Issue Links

          Activity

            People

              hornc Chris Horn
              hornc Chris Horn
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: