[LU-4810] Mount command unexpectedly sets block device parameter "max_sectors_kb"
Created: 24/Mar/14  Updated: 04/Dec/14  Resolved: 06/Nov/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug
Priority: Major
Reporter: Blake Caldwell
Assignee: Niu Yawei (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment:

RHEL 6.4/ distro IB
Kernel 2.6.32-358.23.2.el6.atlas.x86_64
lustre-2.5.0 head


Issue Links:
Related
is related to LU-5888 mount.lustre: set max_sectors_kb to 2... Resolved
Severity: 3
Rank (Obsolete): 13229

 Description   

We've killed our MDT more than once because we were caught off guard by this behavior of the mount command. The array holding the MDT devices doesn't support transfers larger than 2kB. The mpt2sas driver sets max_hw_sectors_kb to 16383, and the code below then sets max_sectors_kb to 16383 as well. We run into problems when we scan the inodes (reading directly from the block device) as part of our purge process.

We would prefer that the command just emit a warning instead of making the change.

http://git.whamcloud.com/?p=fs/lustre-release.git;a=blob;f=lustre/utils/mount_utils_ldiskfs.c;h=dfb9e6cba750a2a5a20e2d28d7814612522ab3a4;hb=HEAD#l1039

        snprintf(real_path, sizeof(real_path), "%s/%s", path,
                 MAX_HW_SECTORS_KB_PATH);
        rc = read_file(real_path, buf, sizeof(buf));
        if (rc) {
                if (verbose)
                        fprintf(stderr, "warning: opening %s: %s\n",
                                real_path, strerror(errno));
                /* No MAX_HW_SECTORS_KB_PATH isn't necessary an
                 * error for some device. */
                rc = 0;
        }

        if (strlen(buf) - 1 > 0) {
                snprintf(real_path, sizeof(real_path), "%s/%s", path,
                         MAX_SECTORS_KB_PATH);
                rc = write_file(real_path, buf);
                if (rc) {
                        if (verbose)
                                fprintf(stderr, "warning: writing to %s: %s\n",
                                        real_path, strerror(errno));
                        /* No MAX_SECTORS_KB_PATH isn't necessary an
                         * error for some device. */
                        rc = 0;
                }
        }


 Comments   
Comment by Peter Jones [ 24/Mar/14 ]

Niu

Could you please look into this issue?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 25/Mar/14 ]

Hi, do you know why mpt2sas sets max_hw_sectors_kb to 16383? Is this an mpt2sas defect? Thanks.

Comment by Matt Ezell [ 25/Mar/14 ]

I don't think the driver does anything to probe the actual device for its maximum supported transfer size; it just uses a default. That should be changeable with a module parameter:

# modinfo mpt2sas | grep max_sectors
parm:           max_sectors:max sectors, range 64 to 32767  default=32767 (ushort)

https://github.com/torvalds/linux/blob/master/drivers/scsi/mpt2sas/mpt2sas_scsih.c#L102

Unfortunately, this setting is driver-wide. So if you have multiple SAS-connected devices, you have to use the "largest" value for the module parameter. But then Lustre just overwrites any settings you pick for max_sectors_kb. Right now our MDSs only have a single type of SAS device.
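
The 16383 reported in the description appears consistent with the driver default of 32767 shown above, assuming the max_sectors parameter counts 512-byte sectors and the kB value is the sector count divided by two. A minimal sketch of that arithmetic follows; sectors_to_kb is a hypothetical name used only for illustration and is not part of the driver or of Lustre.

```c
/* Assumption: one sector is 512 bytes, so two sectors make 1 kB.
 * sectors_to_kb is a hypothetical helper, not a kernel or Lustre API. */
unsigned int sectors_to_kb(unsigned int sectors)
{
        /* Integer division drops the trailing half kB of an odd count. */
        return sectors / 2;
}
```

Under these assumptions, sectors_to_kb(32767) gives 16383, so lowering the module parameter lowers max_hw_sectors_kb, and therefore the value mount.lustre copies into max_sectors_kb.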

Comment by Blake Caldwell [ 25/Mar/14 ]

Setting the mpt2sas module parameter solves our problem. Thanks Matt!

However, I would propose that this be logged to dmesg when the mount command changes the block device tunables.

Comment by Niu Yawei (Inactive) [ 02/Apr/14 ]

http://review.whamcloud.com/9865

Comment by James Nunez (Inactive) [ 06/Nov/14 ]

Patch for b2_5 is at http://review.whamcloud.com/#/c/12553/2

Comment by James Nunez (Inactive) [ 06/Nov/14 ]

Landed to master (pre-2.7)

Comment by Andreas Dilger [ 14/Nov/14 ]

I needed to fix up the 9865 patch because it was printing messages about setting max_sectors_kb for every mount, along with an annoying linefeed, even if this tunable was not being changed. Note sda is being set repeatedly even though it couldn't possibly need it. In fact, all of these devices have max_sectors_kb=128 already and don't need any tuning:

Setup mgs, mdt, osts
Starting mgs:   /dev/vg_sookie/lvmgs /mnt/mgs
mount.lustre: set /sys/block/dm-18/queue/max_sectors_kb to 128

mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128

Started MGS
Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
mount.lustre: set /sys/block/dm-12/queue/max_sectors_kb to 128

mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128

Started testfs-MDT0000
Starting mds2:   /dev/vg_sookie/lvmdt2 /mnt/mds2
mount.lustre: set /sys/block/dm-13/queue/max_sectors_kb to 128

mount.lustre: set /sys/block/sda/queue/max_sectors_kb to 128
Started testfs-MDT0001

My patch in LU-5888 http://review.whamcloud.com/12723 fixes this to only print the message for devices that are actually being changed (devices manually set to have max_sectors_kb=64 for testing purposes):

Setup mgs, mdt, osts
Starting mgs:   /dev/vg_sookie/lvmgs /mnt/mgs
mount.lustre: increased /sys/block/dm-18/queue/max_sectors_kb from 64 to 128
mount.lustre: increased /sys/block/sda/queue/max_sectors_kb from 64 to 128
Started MGS
Starting mds1:   /dev/vg_sookie/lvmdt1 /mnt/mds1
mount.lustre: increased /sys/block/dm-12/queue/max_sectors_kb from 64 to 128
Started testfs-MDT0000
Starting mds2:   /dev/vg_sookie/lvmdt2 /mnt/mds2
Started testfs-MDT0001

If the 12553 patch is landed to b2_5, the 12723 patch should also be backported to b2_5 or these messages will be equally annoying there.
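
The behavior described in this comment (stay silent unless a value is actually raised) can be sketched as a small decision function. This is an illustration only, not the code from patch 12723: parse_kb and plan_max_sectors_change are hypothetical names, the inputs are assumed to be sysfs-style decimal strings such as "128\n", and the message format simply mirrors the log lines quoted above.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a sysfs-style decimal value such as "128\n".
 * Returns 0 on success, -1 if the text is empty or not a plain number. */
int parse_kb(const char *text, unsigned long *value)
{
        char *end;

        if (text == NULL || text[0] < '0' || text[0] > '9')
                return -1;
        *value = strtoul(text, &end, 10);
        /* Allow a single trailing newline after the digits. */
        if (*end == '\n')
                end++;
        return *end == '\0' ? 0 : -1;
}

/* Hypothetical helper: decide whether max_sectors_kb must be raised to the
 * wanted value. On a change, format a message into msg.
 * Returns 1 if a change is needed, 0 if not, -1 on unparsable input. */
int plan_max_sectors_change(const char *sysfs_path,
                            const char *current_text,
                            const char *wanted_text,
                            char *msg, size_t msg_len)
{
        unsigned long current, wanted;

        if (parse_kb(current_text, &current) != 0 ||
            parse_kb(wanted_text, &wanted) != 0)
                return -1;
        if (current >= wanted)
                return 0;       /* already large enough: print nothing */
        snprintf(msg, msg_len, "mount.lustre: increased %s from %lu to %lu",
                 sysfs_path, current, wanted);
        return 1;
}
```

A caller would write the new value and print msg only when the function returns 1, which keeps repeated mounts quiet for devices such as sda that are already tuned.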

Comment by Gerrit Updater [ 04/Dec/14 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/12553/
Subject: LU-4810 utils: print messages when set tunables
Project: fs/lustre-release
Branch: b2_5
Current Patch Set:
Commit: ef5c7bb4a638d50628e91af0c19eb10aab6ec1f5
