[LU-4810] Mount command unexpectedly sets block device parameter "max_sectors_kb"

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.7.0, Lustre 2.5.4
    • Lustre 2.5.0
    • None
    • RHEL 6.4/ distro IB
      Kernel 2.6.32-358.23.2.el6.atlas.x86_64
      lustre-2.5.0 head
    • 3
    • 13229

    Description

      We've killed our MDT more than once because we were caught off guard by this behavior of the mount command. The array with the MDT devices doesn't support transfers larger than 2kB. The mpt2sas driver sets max_hw_sectors_kb to 16383, and then this code sets max_sectors_kb to 16383 as well. We run into problems when we scan the inodes (reading directly from the block device) as part of our purge process.

      We would prefer that the command just emit a warning instead of making the change.

      http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/mount_utils_ldiskfs.c;h=dfb9e6cba750a2a5a20e2d28d7814612522ab3a4;hb=HEAD#l1039

              snprintf(real_path, sizeof(real_path), "%s/%s", path,
                       MAX_HW_SECTORS_KB_PATH);
              rc = read_file(real_path, buf, sizeof(buf));
              if (rc) {
                      if (verbose)
                              fprintf(stderr, "warning: opening %s: %s\n",
                                      real_path, strerror(errno));
                      /* No MAX_HW_SECTORS_KB_PATH isn't necessary an
                       * error for some device. */
                      rc = 0;
              }
      
              if (strlen(buf) - 1 > 0) {
                      snprintf(real_path, sizeof(real_path), "%s/%s", path,
                               MAX_SECTORS_KB_PATH);
                      rc = write_file(real_path, buf);
                      if (rc) {
                              if (verbose)
                                      fprintf(stderr, "warning: writing to %s: %s\n",
                                              real_path, strerror(errno));
                              /* No MAX_SECTORS_KB_PATH isn't necessary an
                               * error for some device. */
                              rc = 0;
                      }
              }
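
A minimal sketch of the warn-only behaviour requested above, assuming the caller passes the device's sysfs queue directory (for example /sys/block/sdb/queue). read_ulong() and warn_max_sectors_kb() are hypothetical names, not functions from mount_utils_ldiskfs.c. Parsing the value as a number also avoids a quirk in the excerpt: strlen() returns an unsigned size_t, so `strlen(buf) - 1 > 0` is true when buf is empty.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse one unsigned integer from a sysfs-style file.
 * Returns 0 on success, -errno if the file cannot be opened, or -EINVAL
 * if the file is empty or does not start with a number. */
static int read_ulong(const char *path, unsigned long *val)
{
	char buf[64];
	char *end;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -errno;
	if (fgets(buf, sizeof(buf), fp) == NULL) {
		fclose(fp);
		return -EINVAL;		/* empty file */
	}
	fclose(fp);

	errno = 0;
	*val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;		/* no digits, e.g. a bare newline */
	return 0;
}

/* Hypothetical warn-only variant: compare max_sectors_kb with
 * max_hw_sectors_kb in queue_dir and report a mismatch without writing.
 * Returns 1 if a warning was printed, 0 if the values already match,
 * or a negative errno if either file cannot be read. */
static int warn_max_sectors_kb(const char *queue_dir)
{
	char hw_path[4096], cur_path[4096];
	unsigned long hw, cur;
	int rc;

	assert(queue_dir != NULL);
	snprintf(hw_path, sizeof(hw_path), "%s/max_hw_sectors_kb", queue_dir);
	snprintf(cur_path, sizeof(cur_path), "%s/max_sectors_kb", queue_dir);

	rc = read_ulong(hw_path, &hw);
	if (rc)
		return rc;
	rc = read_ulong(cur_path, &cur);
	if (rc)
		return rc;

	if (cur == hw)
		return 0;

	/* Leave the tunable alone: the administrator may have lowered it on purpose. */
	fprintf(stderr, "warning: %s is %lu but max_hw_sectors_kb is %lu; leaving it unchanged\n",
		cur_path, cur, hw);
	return 1;
}
```

Calling warn_max_sectors_kb("/sys/block/sdb/queue") on a device whose max_sectors_kb was lowered by hand would print one warning and leave the setting as it is.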
      

          Activity

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12553/
            Subject: LU-4810 utils: print messages when set tunables
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set:
            Commit: ef5c7bb4a638d50628e91af0c19eb10aab6ec1f5

            adilger Andreas Dilger added a comment - - edited

            I needed to fix up the 9865 patch because it was printing messages about setting max_sectors_kb for every mount, along with an annoying extra linefeed, even if this tunable was not being changed. Note that sda is being set repeatedly even though it couldn't possibly need it. In fact, all of these devices already have max_sectors_kb=128 and don't need any tuning:

            Setup mgs, mdt, osts
            Starting mgs:   /dev/vg_sookie/lvmgs /mnt/mgs
            mount.lustre: set /sys/block/dm-18/queue/max_sectors_kb to 128
            
            mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128
            
            Started MGS
            Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
            mount.lustre: set /sys/block/dm-12/queue/max_sectors_kb to 128
            
            mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128
            
            Started testfs-MDT0000
            Starting mds2:   /dev/vg_sookie/lvmdt2 /mnt/mds2
            mount.lustre: set /sys/block/dm-13/queue/max_sectors_kb to 128
            
            mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128
            Started testfs-MDT0001
            

            My patch in LU-5888 http://review.whamcloud.com/12723 fixes this to only print the message for devices that are actually being changed (devices manually set to have max_sectors_kb=64 for testing purposes):

            Setup mgs, mdt, osts
            Starting mgs:   /dev/vg_sookie/lvmgs /mnt/mgs
            mount.lustre: increased /sys/block/dm-18/queue/max_sectors_kb from 64 to 128
            mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 64 to 128
            Started MGS
            Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
            mount.lustre: increased /sys/block/dm-12/queue/max_sectors_kb from 64 to 128
            Started testfs-MDT0000
            Starting mds2:   /dev/vg_sookie/lvmdt2 /mnt/mds2
            Started testfs-MDT0001
            

            If the 12553 patch is landed to b2_5, the 12723 patch should also be backported to b2_5 or these messages will be equally annoying there.
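
The behaviour described for the LU-5888 change (write and report only when a value actually changes) can be sketched as follows. This is an illustration, not the code from http://review.whamcloud.com/12723: tune_max_sectors_kb(), its target_kb argument, and the "only ever increase" rule are assumptions chosen to match the "increased ... from 64 to 128" messages in the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse one unsigned integer from a sysfs-style file. */
static int read_ulong(const char *path, unsigned long *val)
{
	char buf[64];
	char *end;
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -errno;
	if (fgets(buf, sizeof(buf), fp) == NULL) {
		fclose(fp);
		return -EINVAL;
	}
	fclose(fp);

	errno = 0;
	*val = strtoul(buf, &end, 10);
	if (errno != 0 || end == buf)
		return -EINVAL;
	return 0;
}

/* Hypothetical helper: overwrite a sysfs-style file with one number. */
static int write_ulong(const char *path, unsigned long val)
{
	FILE *fp = fopen(path, "w");
	int rc = 0;

	if (fp == NULL)
		return -errno;
	if (fprintf(fp, "%lu\n", val) < 0)
		rc = -EIO;
	/* With stdio buffering, a value the kernel rejects is reported
	 * when fclose() flushes it. */
	if (fclose(fp) != 0 && rc == 0)
		rc = -errno;
	return rc;
}

/* Hypothetical sketch: raise queue_dir/max_sectors_kb to target_kb only if it
 * is currently smaller, and print a message only when a write happened.
 * Returns 1 if the value was raised, 0 if no change was needed, or a
 * negative errno on failure. */
static int tune_max_sectors_kb(const char *queue_dir, unsigned long target_kb)
{
	char path[4096];
	unsigned long cur;
	int rc;

	assert(queue_dir != NULL);
	snprintf(path, sizeof(path), "%s/max_sectors_kb", queue_dir);

	rc = read_ulong(path, &cur);
	if (rc)
		return rc;
	if (cur >= target_kb)
		return 0;	/* already large enough: no write, no message */

	rc = write_ulong(path, target_kb);
	if (rc)
		return rc;

	fprintf(stderr, "mount.lustre: increased %s from %lu to %lu\n",
		path, cur, target_kb);
	return 1;
}
```

Skipping both the write and the message when nothing changes is what keeps repeated mounts quiet for devices like sda in the first log.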

            jamesanunez James Nunez (Inactive) added a comment -

            Landed to master (pre-2.7)

            jamesanunez James Nunez (Inactive) added a comment - Patch for b2_5 is at http://review.whamcloud.com/#/c/12553/2
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/9865

            blakecaldwell Blake Caldwell added a comment -

            Setting the mpt2sas module parameter solves our problem. Thanks, Matt!

            However, I would propose that this be logged to dmesg when the mount command changes the block device tunables.
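
One way a userspace tool could follow this proposal is to write a record to /dev/kmsg, which the kernel appends to the log shown by dmesg. A minimal sketch, assuming the tool may open /dev/kmsg for writing (normally root only); log_to_kmsg() is a hypothetical name, and the path is a parameter so the function can be exercised against a regular file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: add one line to the kernel log by writing a single
 * record to kmsg_path (normally "/dev/kmsg"). The "<5>" prefix requests
 * KERN_NOTICE priority. Returns 0 on success or a negative errno. */
static int log_to_kmsg(const char *kmsg_path, const char *msg)
{
	char rec[512];
	ssize_t n;
	int fd, len, rc = 0;

	assert(kmsg_path != NULL && msg != NULL);
	/* Simplification of this sketch: one call, one single-line record. */
	if (strchr(msg, '\n') != NULL)
		return -EINVAL;

	len = snprintf(rec, sizeof(rec), "<5>mount.lustre: %s\n", msg);
	if (len < 0 || (size_t)len >= sizeof(rec))
		return -EINVAL;		/* refuse to log a truncated message */

	fd = open(kmsg_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	/* For /dev/kmsg, each write() call becomes one log record. */
	n = write(fd, rec, (size_t)len);
	if (n < 0)
		rc = -errno;
	else if (n != len)
		rc = -EIO;
	close(fd);
	return rc;
}
```

A caller would pass "/dev/kmsg" as kmsg_path right after changing a tunable, so the change shows up next to any later I/O errors in dmesg.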
            ezell Matt Ezell added a comment -

            I don't think the driver does anything to probe the actual device for its maximum transfer size; it just uses some default. That should be changeable with a module parameter:

            # modinfo mpt2sas | grep max_sectors
            parm:           max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)
            

            https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt2sas/mpt2sas_scsih.c#L102

            Unfortunately, this setting is driver-wide. So if you have multiple SAS-connected devices, you have to use the "largest" value for the module parameter. But then Lustre just overwrites any settings you pick for max_sectors_kb. Right now our MDSs only have a single type of SAS device.
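
The 16383 figure follows from the module default if, as in the Linux block layer, transfer limits are counted in 512-byte sectors and sysfs reports max_hw_sectors_kb as the sector count halved: 32767 sectors is 16383.5 KiB, truncated to 16383. A small sketch of both directions; mpt2sas_max_sectors_for_kb() is a hypothetical helper that clamps to the "range 64 to 32767" shown by modinfo above.

```c
#include <assert.h>

/* Block-layer limits are in 512-byte sectors; sysfs reports
 * max_hw_sectors_kb as the count shifted right by one
 * (two sectors per KiB, odd remainder dropped). */
static unsigned int sectors_to_kb(unsigned int sectors)
{
	return sectors >> 1;
}

/* Hypothetical helper: mpt2sas max_sectors value for a desired limit in KiB,
 * clamped to the range the module parameter accepts. */
static unsigned int mpt2sas_max_sectors_for_kb(unsigned int kb)
{
	unsigned long sectors = (unsigned long)kb * 2;

	if (sectors < 64)
		sectors = 64;
	if (sectors > 32767)
		sectors = 32767;
	assert(sectors >= 64 && sectors <= 32767);
	return (unsigned int)sectors;
}
```

With this conversion the driver default of 32767 sectors lines up with the 16383 in the description, and the smallest accepted value, 64 sectors, corresponds to 32 KiB.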

            niu Niu Yawei (Inactive) added a comment -

            Hi, do you know why mpt2sas sets max_hw_sectors_kb to 16383? Is this an mpt2sas defect? Thanks.

            pjones Peter Jones added a comment -

            Niu

            Could you please look into this issue?

            Thanks

            Peter


            People

              Assignee: niu Niu Yawei (Inactive)
              Reporter: blakecaldwell Blake Caldwell
              Votes: 0
              Watchers: 8
