
[LU-12530] udev add/change rule loads zfs module on clients

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Affects Version/s: Lustre 2.10.5, Lustre 2.12.2
    • Severity: 3

    Description

      Lustre loads the zfs module whenever a block device is added or changed, if Lustre was built against zfs. It unintentionally does this on clients as well as servers.

      Commit 0d11a314787bc795797a016262e9bcfe86e2193e for LU-11563 added a udev rule:
      lustre/conf/99-lustre-server.rules

      When a block device is added or changed:
      udev -> l_tunedisk
      l_tunedisk -> mount_utils.c:osd_init()
      osd_init() -> backfs_ops[]->init(), i.e. zfs_init()
      zfs_init() -> "/sbin/modprobe -q zfs"

      This occurs if the Lustre rpms installed are a full build of Lustre, instead of just a client build.
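
      For reference, a minimal sketch of what such a broad rule looks like (illustrative only, not the exact shipped rule; the l_tunedisk path is an assumption):

        # Fires on every block device add/change, so l_tunedisk - and through
        # it "modprobe zfs" - runs even on client-only nodes.
        SUBSYSTEM=="block", ACTION=="add|change", RUN+="/usr/sbin/l_tunedisk /dev/%k"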

      Activity

            pjones Peter Jones added a comment -

            Landed for 2.14


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36599/
            Subject: LU-12530 utils: narrow l_tunedisk udev rule
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7b2cb54858daa60d560fd6374c4ecba552a10d27


            ofaaland Olaf Faaland added a comment -

            After this lands, please consider backporting to b2_12. Thanks.


            gerrit Gerrit Updater added a comment -

            Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/36599
            Subject: LU-12530 utils: narrow l_tunedisk udev rule
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: daa0f160640a6a3e5581e1f10ad33047a8a44ae6


            simmonsja James A Simmons added a comment -

            One day ext4 will be what ldiskfs is today, so Nathaniel's approach is the best idea.

            utopiabound Nathaniel Clark added a comment - edited

            For the ldiskfs type, I think the closest you can get is checking that the FS_LABEL matches one of these:

            MGS
            *-MDT*
            *-OST*
            

            If it doesn't match one of those, then it definitely isn't ldiskfs.
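
            A sketch of what that label match could look like as a udev rule (hedged: ID_FS_LABEL is set by udev's blkid builtin, and udev match values accept "|"-separated glob patterns; the l_tunedisk path is an assumption):

              # Only run l_tunedisk when the filesystem label looks like a Lustre target.
              SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_LABEL}=="MGS|*-MDT*|*-OST*", RUN+="/usr/sbin/l_tunedisk /dev/%k"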

            ofaaland Olaf Faaland added a comment -

            I don't see another udev property that would help distinguish ext4-formatted devices from ldiskfs-formatted devices, but if you know of one that would be even better.

            ofaaland Olaf Faaland added a comment - edited

            >> 3. zfs_tune_lustre() doesn't do anything
            >
            > But it might someday...

            That's fair, but it shouldn't do anything with the block devices in the pool in any case. If tunables need to be set at that level, that should be handled within ZFS so that non-lustre ZFS users get the benefit as well. ZFS already has to deal properly with block devices appearing and disappearing. For example, ZFS currently sets the elevator to "noop" for the block devices in the pool.

            zpool and dataset properties are persistent, so they don't need this treatment. One could also set module parameters upon module load, but this isn't the place for that.
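
            For illustration (hypothetical pool/dataset names; both mechanisms are standard ZFS, not Lustre-specific):

              # Pool/dataset properties persist with the pool - no udev hook needed:
              zfs set recordsize=1M tank/ost0
              # Module parameters can be applied at module load time instead:
              echo "options zfs zfs_prefetch_disable=1" > /etc/modprobe.d/zfs.conf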

            So I think this rule should match on

            ENV{ID_FS_TYPE}=="ext4"
            

            in addition to the ACTION and SUBSYSTEM already specified. That doesn't actually solve my problem, but I think it's a step in the right direction.
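
            A sketch of the combined match (assuming the ACTION and SUBSYSTEM keys already in the rule; the l_tunedisk path is an assumption):

              # ldiskfs targets probe as ext4, so zfs-backed pools and unrelated
              # devices no longer trigger l_tunedisk (and its "modprobe zfs").
              SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="ext4", RUN+="/usr/sbin/l_tunedisk /dev/%k"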


            utopiabound Nathaniel Clark added a comment -

            > 2. The rule seems overly broad.

            I agree the rule is probably overly broad, and we can use udev itself to limit it more than it is now.

            > 3. zfs_tune_lustre() doesn't do anything

            But it might someday...

            > 4. From what I read this seems to be working around a kernel bug

            It's not so much a kernel bug as kernel behavior. There's a possible fix for this issue in device-mapper-multipath, but only in fairly recent releases (maybe only el7.6?): the max_sectors_kb setting. Even that needs to be tuned per device, and I'm not positive it picks up the case that LU-11563 is fixing; I only assume it does.

            Yes, I think making the udev rule more narrowly tailored would be a good thing.
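
            For reference, a hedged multipath.conf sketch of that setting (assuming a device-mapper-multipath build that supports max_sectors_kb; vendor/product strings are placeholders):

              devices {
                  device {
                      vendor  "EXAMPLE"
                      product "EXAMPLEDISK"
                      # cap I/O request size consistently across all paths
                      max_sectors_kb 1024
                  }
              }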


            ofaaland Olaf Faaland added a comment -

            Nathaniel,

            Correct, it's lustre that is installed, not lustre-client, and yes, I'm aware of that commit - see the Description. I'd read the ticket about the kernel crashes due to differing device tunables, and I understand why the rule was created. But here are the reasons I think this is incorrect behavior. Maybe I'm mistaken.

            1. The "lustre" rpm has both server and client bits. It's not just server bits, and therefore not an indication, by itself, that the system on which it's installed is a lustre server - just that it might be.
            2. The rule seems overly broad. For example, blkid can tell us if the device is ldiskfs or zfs; if neither then no point in running tunedisk(). Can we use that?
            3. zfs_tune_lustre() doesn't do anything anyway, so no benefit to loading the zfs kernel module for that
            4. From what I read this seems to be working around a kernel bug, which exists only in some kernels and isn't specific to Lustre - so it seems like the wrong place for the fix. I guess some Lustre users are stuck (ie their distro didn't fix the issue or they can't upgrade, etc.) but then I think this should be available to them but not enabled by default.
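
            A hedged illustration of that blkid check (the device path is a placeholder; ldiskfs targets probe as "ext4", zfs pool members as "zfs_member"):

              # Print the probed filesystem type for a device.
              blkid -o value -s TYPE /dev/sdb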

            What do you think?


            utopiabound Nathaniel Clark added a comment -

            ofaaland,

            Are you using lustre-client or lustre rpms on your clients?  If you are using the lustre rpms, then yes, the modules will be loaded when a disk add is triggered in udev via that rule.

            https://review.whamcloud.com/33466/ split out that udev rule to be server only so it should only affect your clients if they are using lustre instead of lustre-client.


            People

              Assignee: utopiabound Nathaniel Clark
              Reporter: ofaaland Olaf Faaland
              Votes: 0
              Watchers: 5
