[LU-4800] no automatic module load in newer kernels Created: 21/Mar/14  Updated: 06/Jul/19  Resolved: 16/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
Fix Version/s: Lustre 2.6.0, Lustre 2.5.3

Type: Bug Priority: Major
Reporter: Bob Glossman (Inactive) Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4416 support for 3.12 linux kernel Resolved
is related to LU-5186 code cleanups in module startups Resolved
is related to LU-12514 separate out the lustre mount code fo... Open
is related to LU-6013 Separate mount helpers for client and... Resolved
Severity: 3
Rank (Obsolete): 13208

 Description   

This problem has been seen in several test environments with newer linux kernel versions than the 2.6.x or 3.0.x we currently support.
For lustre clients built against the unpatched, pristine kernel sources with default .config files, the lustre modules are not autoloaded at mount time. Example:

# mount -t lustre -o flock,user_xattr centos2:/lustre /mnt/lustre
mount.lustre: mount centos2:/lustre at /mnt/lustre failed: No such device
Are the lustre modules loaded?
Check /etc/modprobe.conf and /proc/filesystems

If I explicitly preload lustre modules with a modprobe command like "modprobe lustre" then the mount works fine.

I got the following commentary in email from Andreas:

This is a problem known to me. If the obdclass module is loaded, then it
will register the "lustre" filesystem type, so it will appear in
/proc/filesystems and mount will not modprobe the "lustre" filesystem
module.

This has been true since Lustre 1.6 or so (when "mountconf" was first
added).

Two options exist:

  • modify the mount.lustre binary to always modprobe the "lustre" module if
    it isn't already loaded
  • change the "lustre" filesystem type registration to be in the "lustre"
    module (as most other filesystems do). That has the problem that the
    servers do not need the "lustre.ko" module loaded, since that is really
    the client VFS interface. It would help if there was a second filesystem
    type "lustre_srv" or similar, that could be used to register server
    mountpoints, and possibly simplify the mount internals (which are
    convoluted because they have to do completely different things for client
    and server mounts).

Obviously, #1 is easier, but #2 would simplify the code in the long term.
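As a rough illustration of option #1, the userspace sketch below checks a /proc/modules-style listing for the "lustre" module itself (rather than checking /proc/filesystems for the "lustre" fstype, which obdclass already registers) and falls back to running modprobe. This is only a sketch under assumptions: the function names module_loaded() and ensure_lustre_loaded() are hypothetical and are not taken from mount.lustre or from the patch under review.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return 1 if `module` is the first field of some line in a
 * /proc/modules-style stream, else 0. Matching only the first field
 * avoids a false hit when the name merely appears in another
 * module's dependency list.
 */
int module_loaded(FILE *fp, const char *module)
{
    char line[512];
    int at_line_start = 1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        size_t len = strlen(line);
        int complete = (len > 0 && line[len - 1] == '\n');

        if (at_line_start) {
            size_t n = strcspn(line, " \t\n");

            if (n == strlen(module) && strncmp(line, module, n) == 0)
                return 1;
        }
        /* A chunk without a trailing newline is the middle of a long line. */
        at_line_start = complete;
    }
    return 0;
}

/*
 * Hypothetical helper: load "lustre" only when it is not loaded yet.
 * Returns 0 on success, -1 if modprobe could not be run or failed.
 */
int ensure_lustre_loaded(void)
{
    FILE *fp = fopen("/proc/modules", "r");
    int loaded = 0;

    if (fp != NULL) {
        loaded = module_loaded(fp, "lustre");
        fclose(fp);
    }
    if (loaded)
        return 0;
    return system("modprobe lustre") == 0 ? 0 : -1;
}
```

Note that this approach depends on a modprobe binary being available on the node at mount time.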



 Comments   
Comment by Bob Glossman (Inactive) [ 21/Mar/14 ]

I plan to push a prototype patch along the lines of Andreas' method #1, using an explicit modprobe in mount.lustre. I'm a bit worried it's not the best solution and may have unexpected side effects like LU-1279 or LU-3948. However it's something I can figure out how to do and it should at least serve to get some discussion going.

Comment by Bob Glossman (Inactive) [ 21/Mar/14 ]

http://review.whamcloud.com/9754

Comment by Oleg Drokin [ 24/Apr/14 ]

I feel that we should aim at the solution #2 as otherwise it's LU-1279 all over again.

Comment by Bob Glossman (Inactive) [ 24/Apr/14 ]

I could figure out how to do method #1 so that's what I prototyped & pushed for review. solution #2 is a bit beyond me.

Comment by James A Simmons [ 24/Apr/14 ]

I agree about doing option #2. Also, on Cray compute nodes the module tools are removed after the modules are initialized, to save memory, so there is no modprobe later on.

Comment by Bob Glossman (Inactive) [ 20/May/14 ]

Since reviewers don't like option #1 as a solution as I proposed in http://review.whamcloud.com/9754 I've been working on an implementation of option #2.

Leveraging the existing manifest constant HAVE_SERVER_SUPPORT I've relocated the fs type registration to llite/super25.c in lustre.ko for client builds while leaving it right where it always was in obdclass/obd_mount.c in obdclass.ko for server builds. This does indeed fix the problem for client builds, forcing the load of lustre.ko during mounts. Other modules get autoloaded properly by dependencies. However this doesn't fix the problem at all for server builds. If I do a client mount, e.g. 'mount -t lustre hostid:/name /mnt/lustre', on an install of server rpms it still fails. obdclass.ko gets loaded, but lustre.ko never does. syslog says

May 20 08:28:28 rhel7rc3 kernel: LustreError: 165-2: Nothing registered for client mount! Is the 'lustre' module loaded?
May 20 08:28:28 rhel7rc3 kernel: LustreError: 4338:0:(obd_mount.c:1337:lustre_fill_super()) Unable to mount  (-19)

I think this is the error expected when lustre.ko isn't loaded. If I do a manual 'modprobe lustre' after seeing this error, then the client mount cmd works fine.

So at the present time I have an implementation that improves matters for client builds but does nothing for server builds. I can push what I have for review, but I don't think it's a production worthy solution. Any suggestions?

Comment by Bob Glossman (Inactive) [ 20/May/14 ]

I have a possible addition that might work. This falls into the category of "harebrained scheme". Nothing to recommend it other than the fact that it might work.

Suppose I put a dummy routine in lustre.ko for server builds, does nothing but is EXPORT_SYMBOL(). Then suppose I call that routine in obd_mount.c just before checking client_fill_super, the local pointer filled in by the lustre.ko init routine. Calling the dummy entry point in lustre.ko should force that module to load. Having loaded, the client_fill_super pointer in obd_mount.c should be set by the time I look at it.

It's still not clear to me why such shenanigans should suddenly be necessary in new kernels, but it could work (I think).

Comment by Robert Read (Inactive) [ 20/May/14 ]

Andreas also suggested changing the server fs name to lustre_srv, which seems like a better solution.

Comment by Bob Glossman (Inactive) [ 20/May/14 ]

I don't see that changing the name is good. For one thing it would mean visible changes to user cmds. Users would have to say 'mount -t lustre_srv' instead of 'mount -t lustre' when mounting OSTs, MDSs, MGS. As a user I wouldn't want to have to know that difference.

Comment by Robert Read (Inactive) [ 20/May/14 ]

Yes, it will definitely be a user visible change, but it really should have been this way from the beginning. It doesn't make any sense that the server and client filesystems are called the same thing - they are two completely different filesystems. (It also doesn't make sense to me that we use mount to start services but I digress... )

I don't know if we have any constraints on when we can do these kinds of changes, so I don't know if this change is appropriate at this time or not, and it's not my call.

Comment by James A Simmons [ 20/May/14 ]

To me mount -t ldiskfs or -t zfs would have been more sane server side. I also agree it is strange to run mount for a service.

Comment by Robert Read (Inactive) [ 20/May/14 ]

It's definitely not a zfs filesystem, but I suppose ldiskfs is whatever we define it to be, so that could work. I think the idea though is we're "mounting" a lustre service, so the underlying disk filesystem doesn't really matter. If that's the case, then mount -t mdt and -t ost would make sense. Perhaps -t lu_target?

Comment by Bob Glossman (Inactive) [ 20/May/14 ]

running mkfs & mount to setup & start services is much more sane than the very old, obscure, and arcane way lustre used to be done before mountconf. Also less mind bending to traditional Unix guys.

Comment by Robert Read (Inactive) [ 20/May/14 ]

Well, I consider myself a pretty traditional Unix guy, and I find it rather mind bending, but I agree what we had before was even more confusing.

Comment by Christopher Morrone [ 20/May/14 ]

I would argue that our current method of using the mount command to start a service has little in common with what a traditional Unix guy would expect. If a Unix guy wants to start an NFS service, and we told him to "mount -t nfs blah blah" he would reply, "hey, wait, that is what you do on the client side, not on the server side".

There were certainly problems with the commands that Lustre used before it changed to using mount, but we should be careful not to conflate the issue of poor user interface with the issue of whether-or-not to use mount to start services.

I would be in favor of moving to a separate filesystem mount type for Lustre servers, mostly because it moves us in the direction of separating the client and server startup. Eventually that might lead to a well designed user interface that allows Lustre service startup without using the mount command.

How about "lustre_osd"?

Comment by Robert Read (Inactive) [ 20/May/14 ]

sure, "lustre_osd" sounds good too.

Comment by James A Simmons [ 21/May/14 ]

Agree we are going off topic. I have no problem with this change. Bike shed moment - could we reverse the order to osd_lustre? I can see admins pulling their hair out, not understanding why they can't mount anything server side with their scripts.

Comment by Bob Glossman (Inactive) [ 04/Jun/14 ]

implementation of option #2
http://review.whamcloud.com/10587

Since nobody liked option #1 I'm abandoning http://review.whamcloud.com/9754

Comment by James A Simmons [ 10/Jun/14 ]

I have been taking a much closer look at how llite handles being loaded since there were a lot of comments against the patch. After looking at the patch I see why. It is a total mess that is totally hard to understand. The last few days I have started to play with the code to see what can be done to make it logical.

Comment by Bob Glossman (Inactive) [ 10/Jun/14 ]

James, if you can manage to make mount execution more sane that will be great. I don't like the way the execution path bounces between code in obd_mount.c in obdclass.ko and code in super25.c in lustre.ko. My solution works, but just adds more ugly to the way it was. I couldn't quite figure out how to make it nice.

Whatever you do it needs to be fairly near term I think. We need a working solution to do client support of rhel7 and sles12.

Comment by Andreas Dilger [ 11/Jun/14 ]

I think it makes sense to start registering a new "lustre_osd" or "lustre_tgt" filesystem type (with the same methods as the "lustre" fstype) in the server code so that it will be available for new installations and as a migration method for existing filesystems. Since this patch will hopefully also be backported to b2_5 and maybe b2_4 then we could update the documentation and eventually test scripts to use this new fstype.

That will give us a path forward to cleaning up the twisty maze that is the shared mount path between clients and servers, even if we don't do it in this release. We would want to keep compatibility with the old "lustre" type for at least through to the next major release anyway so that users are not forced to change their startup scripts in lockstep as they upgrade or downgrade the system.

Comment by Bob Glossman (Inactive) [ 11/Jun/14 ]

Does it make sense to add additional registrations now in this mod? I don't object to the idea in general, but don't want to complicate things right now. Need a solution soon for client support. This became more urgent suddenly since rhel7 officially released yesterday and something is needed just for that alone.

Could additional registrations maybe be done in a follow on patch?

Comment by Andreas Dilger [ 12/Jun/14 ]

I don't mind putting the registration of the "lustre_osd" fstype in a separate patch, but I think it makes sense to land that into 2.6 if at all possible.

Comment by James A Simmons [ 12/Jun/14 ]

Okay I cleaned up the patch. I'm going to push it as a separate patch in case no one likes it. I merged Bob's changes into my changes, which allows us to go back to using just one struct file_system_type. The patch is at

http://review.whamcloud.com/#/c/10699

Comment by Bob Glossman (Inactive) [ 12/Jun/14 ]

Looking at both mine & James' mods I'm coming to the conclusion that there's a much simpler solution possible. Now investigating the possibility that the only essential feature is the added runtime request_module("lustre") in obd_mount.c. The improved error exit cleanups in init_lustre_lite are valuable and in fact I think init_obdclass needs similar fixes, but that isn't a necessary part of this mod. Should be followed up in another mod entirely (IMHO). James' symbol_get() calls may also be useful, but on the theory that simplest is best I'm seeing if they are really necessary.

If my latest theory is correct I will push a much simpler mod that does no relocation of the fs register/unregister calls at all.
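The control flow of that simpler idea can be modelled in plain userspace C, as below. This is not the kernel patch: model_request_module() stands in for the kernel's request_module("lustre"), the module_available flag and all model_* names are hypothetical, and the only details taken from this ticket are the client_fill_super pointer that lustre.ko fills in at init time and the -19 (ENODEV) error seen in syslog.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*fill_super_fn)(void *sb);

/* In the real code this pointer is set by the lustre.ko init routine. */
fill_super_fn client_fill_super = NULL;

/* Model-only knobs: whether a load can succeed, and how many were tried. */
int module_available = 1;
int load_requests = 0;

/* Stand-in for the client's real fill_super implementation. */
int model_client_fill_super(void *sb)
{
    (void)sb;
    return 0;
}

/*
 * Stand-in for request_module(): loading the module runs its init
 * routine, which registers the client callback.
 */
int model_request_module(const char *name)
{
    (void)name;
    load_requests++;
    if (!module_available)
        return -1;
    client_fill_super = model_client_fill_super;
    return 0;
}

/*
 * Model of the mount path: if nothing is registered for client
 * mounts, ask for the module once, then re-check the pointer.
 */
int model_lustre_fill_super(void *sb)
{
    if (client_fill_super == NULL)
        model_request_module("lustre");
    if (client_fill_super == NULL)
        return -ENODEV; /* the -19 reported in the syslog above */
    return client_fill_super(sb);
}
```

In this model the load is requested only on the first client mount; later mounts find the pointer already set and skip the request.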

Comment by Bob Glossman (Inactive) [ 12/Jun/14 ]

Preliminary testing seems to confirm my theory. All that fancy handwaving both James and I did looks unneeded. Please see my latest iteration at http://review.whamcloud.com/#/c/10587/6

Comment by James A Simmons [ 12/Jun/14 ]

With 2.6 so close to being released I'm happy if that lands. I'd really like to see the cleanup Andreas and I did eventually go in another time. Sorry I got carried away with the idea of cleanup.

Comment by Bob Glossman (Inactive) [ 12/Jun/14 ]

I agree about following through on those cleanups. I plan to enter a new ticket for that. James, I will be sure to make you a Watcher there.

Comment by Christopher Morrone [ 12/Jun/14 ]

I'd really like to see the cleanup Andreas and I did eventually go in another time.

It would probably be best to start a new ticket to track that so it does not get lost.

Comment by James A Simmons [ 16/Jun/14 ]

Patch has landed to master. This ticket can be closed.

Comment by Peter Jones [ 16/Jun/14 ]

Landed for 2.6

Generated at Sat Feb 10 01:45:57 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.