[LU-12530] udev add/change rule loads zfs module on clients Created: 09/Jul/19  Updated: 14/Jul/20  Resolved: 12/Nov/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.5, Lustre 2.12.2
Fix Version/s: Lustre 2.14.0, Lustre 2.12.4

Type: Bug Priority: Minor
Reporter: Olaf Faaland Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-9551 I/O errors when lustre uses multipath... Resolved
is related to LU-11736 Do not apply bulk IO tuning on MDT or... Resolved
is related to LU-10599 Possible for ID_FS_TYPE to be lustre ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Lustre loads the zfs module whenever a block device is added or changed, if Lustre was built against zfs. It unintentionally does this on clients as well as servers.

Commit 0d11a314787bc795797a016262e9bcfe86e2193e for LU-11563 added a udev rule,
lustre/conf/99-lustre-server.rules

When a block device is added or changed:
udev -> l_tunedisk
l_tunedisk -> mount_utils.c:osd_init()
osd_init() > backfs_ops[]>init() AKA zfs_init()
zfs_init() -> "/sbin/modprobe -q zfs"

This occurs if the Lustre rpms installed are a full build of Lustre, instead of just a client build.



 Comments   
Comment by Olaf Faaland [ 09/Jul/19 ]

For our tracking, our local ticket is TOSS-4559

Comment by Peter Jones [ 10/Jul/19 ]

Nathaniel

Can you please assist with this one?

Thanks

Peter

Comment by James A Simmons [ 10/Jul/19 ]

You have ZFS installed on your clients? Interesting setup

Comment by Olaf Faaland [ 10/Jul/19 ]

> You have ZFS installed on your clients? Interesting setup

Yep, it's the kitchen sink approach. But of course we don't intend to actually load the kernel module .

Comment by Nathaniel Clark [ 11/Jul/19 ]

ofaaland,

Are you using lustre-client or lustre rpms on your clients?  If you are using the lustre rpms, then yes, the modules will be loaded when a disk add is triggered in udev via that rule.

https://review.whamcloud.com/33466/ split out that udev rule to be server only so it should only affect your clients if they are using lustre instead of lustre-client.

Comment by Olaf Faaland [ 20/Jul/19 ]

Nathaniel,

Correct, It's lustre that is installed, not lustre-client, and yes I'm aware of that commit - see the Description . I'd read the ticket about the kernel crashes due to differing device tunables, and I understand why the rule was created. But there are the reasons I think this is incorrect behavior. Maybe I'm mistaken.

  1. The "lustre" rpm has both server and client bits. It's not just server bits, and therefore not an indication, by itself, that the system on which it's installed is a lustre server - just that it might be.
  2. The rule seems overly broad. For example, blkid can tell us if the device is ldiskfs or zfs; if neither then no point in running tunedisk(). Can we use that?
  3. zfs_tune_lustre() doesn't do anything anyway, so no benefit to loading the zfs kernel module for that
  4. From what I read this seems to be working around a kernel bug, which exists only in some kernels and isn't specific to Lustre - so it seems like the wrong place for the fix. I guess some Lustre users are stuck (ie their distro didn't fix the issue or they can't upgrade, etc.) but then I think this should be available to them but not enabled by default.

What do you think?

Comment by Nathaniel Clark [ 24/Jul/19 ]

> 2. The rule seems overly broad.

I agree the rule is probably overly broad, and we can use udev itself to limit it more than it is now.

> 3. zfs_tune_lustre() doesn't do anything

But it might someday...

> 4. From what I read this seems to be working around a kernel bug

It's not so much a kernel bug as kernel behavior.  There's a possible fix for this issue in device-mapper-multipath, but only in fairly recent releases (maybe only el7.6?) (the max_sectors_kb setting, but even that needs to be tuned per device, and I'm not positive it picks up the case that LU-11563 is fixing, I only assume it does).

 

Yes, I think making the udev rule more narrowly tailored would be a good thing.

Comment by Olaf Faaland [ 25/Oct/19 ]

>> 3. zfs_tune_lustre() doesn't do anything
>
> But it might someday...

That's fair, but it shouldn't do anything with the block devices in the pool in any case. If tunables need to be set at that level, that should be handled within ZFS so that non-lustre ZFS users get the benefit as well. ZFS already has to deal properly with block devices appearing and disappearing. For example, ZFS currently sets the elevator to "noop" for the block devices in the pool.

zpool and dataset properties are persistent, so they don't need this treatment. One could also set module parameters upon module load, but this isn't the place for that.

So I think this rule should match on

ENV{ID_FS_TYPE}=="ext4"

in addition to the ACTION and SUBSYSTEM already specified. That doesn't actually solve my problem, but I think it's a step in the right direction.

Comment by Olaf Faaland [ 25/Oct/19 ]

I don't see another udev property that would help distinguish ext4-formatted devices from ldiskfs-formatted devices, but if you know of one that would be even better.

Comment by Nathaniel Clark [ 28/Oct/19 ]

For ldiskfs type, I think, the closest you can get is checking the FS_LABEL, that it matches one of these:

MGS
*-MDT*
*-OST*

If it doesn't match one of those, then it definitely isn't ldiskfs.

Comment by James A Simmons [ 28/Oct/19 ]

One day ext4 will be what is ldiskfs today. So Nathaniel approach is the best idea.

Comment by Gerrit Updater [ 28/Oct/19 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/36599
Subject: LU-12530 utils: narrow l_tunedisk udev rule
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: daa0f160640a6a3e5581e1f10ad33047a8a44ae6

Comment by Olaf Faaland [ 30/Oct/19 ]

After this lands, please consider backporting to b2_12. Thanks.

Comment by Gerrit Updater [ 12/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36599/
Subject: LU-12530 utils: narrow l_tunedisk udev rule
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7b2cb54858daa60d560fd6374c4ecba552a10d27

Comment by Peter Jones [ 12/Nov/19 ]

Landed for 2.14

Comment by Gerrit Updater [ 18/Nov/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36776
Subject: LU-12530 utils: narrow l_tunedisk udev rule
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: f5f3f9388aa9fa2c8eb7f2e83b3eb1912afea1fd

Comment by Gerrit Updater [ 05/Dec/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36776/
Subject: LU-12530 utils: narrow l_tunedisk udev rule
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: e365cf2b001af7ca0883d293b7c148bcaa8e1c61

Generated at Sat Feb 10 02:53:24 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.