[LU-6013] Separate mount helpers for client and server Created: 10/Dec/14  Updated: 26/Feb/21  Resolved: 18/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.5.3
Fix Version/s: Lustre 2.8.0

Type: Improvement Priority: Minor
Reporter: Olaf Faaland Assignee: Olaf Faaland
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-5851 Handle LU-4606 packaging changes for ... Resolved
is related to LU-4800 no automatic module load in newer ker... Resolved
is related to LU-12514 separate out the lustre mount code fo... Open
Rank (Obsolete): 16757

 Description   

Currently a single mount helper, mount.lustre, is used to mount lustre on a client, as well as to start a lustre server (e.g. OST).

This makes it harder to see bugs in which client-side mount code is executed in a server context, or vice versa, and causes undesirable side effects.

An example seen at LLNL recently was with osd_init(). Client nodes could not mount a lustre filesystem because mount called osd_init(), which attempted to load the mount_osd_{zfs,ldiskfs} backfs modules. One of the modules failed to load, causing the mount to fail, even though the client does not use those modules at all.

That specific bug is being corrected by change http://review.whamcloud.com/#/c/12550/ for LU-5851, but there may well be other similar issues that have not yet surfaced, and creating more is not difficult with the current code structure.

The proposal here is to move server-side code from mount_lustre.c to mount_lustre_server.c, modify the build system to generate two separate binaries, mount.lustre and mount.lustre_server, and then update startup scripts, spec file, and documentation appropriately.



 Comments   
Comment by Andreas Dilger [ 10/Dec/14 ]

This would also require that all Lustre server filesystems change their type from "lustre" to "lustre_server", and use "mount -t lustre_server" everywhere. That is a pretty impactful change for only potential future problems, since it will cause all existing filesystems to fail mount on upgrade or downgrade, and invalidate all existing documentation/tutorials/presentations on using Lustre.

A very short-term fix (possibly for 2.7.0 and suitable for 2.5.x) would be to patch mount_lustre.c to skip osd_init() entirely if a client device name (with ":/" in it) is given on the command line. The osd_init() call should move to somewhere after parse_options() is called, maybe right before parse_ldd(), since that appears to be the first place that uses the osd_*() functions. That avoids even trying to access the shared libraries on clients and avoids a whole class of bugs easily.
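The device-name test described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: is_client_device() is a hypothetical helper name, and the example device strings are assumptions chosen to show the ":/" convention.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from mount_lustre.c): report whether a
 * device name looks like a client mount source, i.e. it contains ":/"
 * as in "mgs01@tcp:/testfs". A local block device such as "/dev/sdb1"
 * does not contain that sequence.
 */
bool is_client_device(const char *dev)
{
	return dev != NULL && strstr(dev, ":/") != NULL;
}
```

A caller would then run osd_init() only when is_client_device() returns false, so a client mount never tries to load the backing filesystem modules.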

It is also worthwhile to read some of the discussion in LU-4800. Chris M. suggested "lustre_osd", or "lustre_tgt" as the server filesystem type.

I think a reasonable step beyond that is to allow filesystems to be mounted with type "lustre_server" (i.e. create a link from mount.lustre_server to mount.lustre, and register the "lustre_server" filesystem type in the kernel). I'm also not at all against separating the client and server mount code and creating separate binaries, as I expect that there isn't a lot of overlap in terms of mount options or handling between clients and servers. It should still be possible to have mount.lustre fork/exec /sbin/mount.lustre_server (or whatever it is called) if it detects the device name doesn't have ":/" in it, but is instead a local block device. That allows the separation of code and the ability to mount "lustre_server" filesystems directly, without the immediate requirement to move over to a separate filesystem type.
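A minimal sketch of that hand-off, under stated assumptions: the helper path /sbin/mount.lustre_server is the placeholder name used above ("or whatever it is called"), the function names are hypothetical, and argv[1] is assumed to hold the device name, as is usual when mount(8) invokes a mount helper. The sketch uses a plain execv() that replaces the current process rather than a fork/exec pair.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Placeholder path; the real binary name was still under discussion. */
#define SERVER_HELPER "/sbin/mount.lustre_server"

/*
 * Hypothetical: return the path of the server helper for a local
 * device, or NULL when the name contains ":/" (a client mount that
 * mount.lustre would handle itself).
 */
const char *server_helper_for(const char *dev)
{
	if (dev == NULL || strstr(dev, ":/") != NULL)
		return NULL;
	return SERVER_HELPER;
}

/*
 * Hypothetical: hand a server mount over to the server helper.
 * Returns 0 if the caller should continue with the client path,
 * or -1 if execv() failed. On success execv() does not return.
 */
int maybe_exec_server_helper(int argc, char *const argv[])
{
	const char *helper;

	if (argc < 2)
		return 0;

	helper = server_helper_for(argv[1]);
	if (helper == NULL)
		return 0;

	execv(helper, argv);
	return -1; /* reached only if execv() failed; errno is set */
}
```

Passing argv through unchanged means the server helper sees the same device, mount point, and options that mount(8) gave to mount.lustre, so existing "mount -t lustre" invocations on servers would keep working during the transition.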

Once we have the ability to mount "lustre_server" filesystems in new releases, and possibly backported to maintenance releases (2.5.5 maybe?) for a couple of years, then we can think about removing the old mount support completely.

Comment by Gerrit Updater [ 10/Dec/14 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13019
Subject: LU-6013 utils: don't initialize OSD code for client mount
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a2e08bf5d7f68f6947a8b1dfd8a8cfbfdc03a50a

Comment by Olaf Faaland [ 11/Dec/14 ]

I agree re: changing the code so that osd_init() is not called on clients. I created an equivalent patch, then found that Bruno F. also has a patch in review that does that, http://review.whamcloud.com/#/c/12550/. So that is covered.

Thanks for the pointer to LU-4800. I understand your point about needing a transition period where "mount -t lustre" still works on the server.

I like the idea of the mount helper for the server being a different binary than the one for the client, both because it puts one step in place towards changing the mount type on the server, and because using separate binaries makes it more convincing that the client and server mount code are really operating independently (no subtle inter-dependencies between client and server mount code on global variables, for example).

Since the same package is used for a client or server install, I can't think of a better way to allow that than your suggestion of exec'ing the server binary from within the client one, based on the device name. So I'll proceed that way for now, and change if a better idea comes up.

Comment by Andreas Dilger [ 11/Dec/14 ]

Note that there is a separate client-only package that could include just the mount.lustre binary and not the server-only mount.lustre_server binary. The server package should contain both.

Comment by Gerrit Updater [ 11/Dec/14 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13029
Subject: LU-6013 utils: don't initialize OSD code for client mount
Project: fs/lustre-release
Branch: b2_5
Current Patch Set: 1
Commit: ab077512c1b10e2f5d3f215d6bb4573f638c844c

Comment by Andreas Dilger [ 11/Dec/14 ]

I abandoned the master version of my osd_init() patch, but it probably still makes sense for b2_5 if Bruno's patch doesn't land there, to avoid problems mounting the client.

Comment by Gerrit Updater [ 13/Nov/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13019/
Subject: LU-6013 utils: don't initialize OSD code for client mount
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce85021cbb4b2e02a56321cfff8c03a50d9d4704

Comment by Joseph Gmitter (Inactive) [ 18/Nov/15 ]

Landed for 2.8

Comment by Gerrit Updater [ 26/Feb/21 ]

Neil Brown (neilb@suse.de) uploaded a new patch: https://review.whamcloud.com/41767
Subject: LU-6013 lustre: make ldlm and target file lists
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 32be10f42fb1c998c2237d343e3d8e8cd6a9ea06

Generated at Sat Feb 10 01:56:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.