[LU-4606] Lustre hard codes libzfs.so.1 in lustre/utils/mount_utils_zfs.c Created: 10/Feb/14  Updated: 03/Nov/14  Resolved: 21/May/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0, Lustre 2.6.0, Lustre 2.4.2
Fix Version/s: Lustre 2.6.0, Lustre 2.5.3

Type: Bug Priority: Major
Reporter: Anthony Alba Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: zfs
Environment:

RHEL 6.x Lustre 2.4.2 with ZFS


Issue Links:
Duplicate
duplicates LU-5060 mkfs.lustre --backfstype=zfs fails to... Closed
Related
is related to LU-4944 build fails with latest zfs source Resolved
is related to LU-2976 ZFS upstream sonames are versioned Resolved
is related to LU-5091 LU-4606 breaks --with-zfs-devel option Resolved
is related to LU-5102 Loading plugin for a given mount_type... Resolved
is related to LU-5851 Handle LU-4606 packaging changes for ... Resolved
is related to LU-5501 new e2fsprogs causes llverfs build fa... Resolved
is related to LU-5096 Linker cannot find custom libzfs loca... Resolved
is related to LU-5643 Flubbed merging of two patches in cha... Resolved
Epic/Theme: ZFS
Severity: 3
Rank (Obsolete): 12600

 Description   

Lustre hard codes libzfs.so.1 in mount_utils_zfs.c
ZoL git tree (after v0.6.2 tag) has bumped the so version to 2:0:0 so
libzfs is now libzfs.so.2.

This breaks user space utilities.
Linking libzfs.so.2.0.0 to libzfs.so.1 works so there doesn't seem to be
any ABI breakage.
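
To illustrate the failure mode, here is a minimal sketch (not the code in mount_utils_zfs.c) of a loader that tries several sonames in turn instead of one fixed string. The function name `open_first_available` and the candidate list are assumptions made for this example. It papers over the soname bump in the same spirit as the symlink workaround, and it does nothing to guarantee ABI compatibility between versions.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Try each candidate soname in order and return the first handle that
 * dlopen() can load, or NULL if none is available. On success, *chosen
 * (if non-NULL) is set to the name that worked.
 */
void *open_first_available(const char *const *candidates, const char **chosen)
{
    assert(candidates != NULL);

    for (size_t i = 0; candidates[i] != NULL; i++) {
        void *handle = dlopen(candidates[i], RTLD_LAZY);
        if (handle != NULL) {
            if (chosen != NULL)
                *chosen = candidates[i];
            return handle;
        }
    }
    return NULL;
}

/* Newest soname first; this list is an assumption for illustration only. */
const char *const libzfs_candidates[] = {
    "libzfs.so.2",
    "libzfs.so.1",
    NULL,
};
```

A caller would pass `libzfs_candidates` and treat a NULL return as "ZFS support not available" rather than as a fatal error.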



 Comments   
Comment by Nathaniel Clark [ 10/Feb/14 ]

I do have a patch related to this http://review.whamcloud.com/#/c/8979/

This will be updated when the next stable release of zfs/spl happens.

Comment by Alexey Shvetsov [ 13/Feb/14 ]

It's better to add a configure-time check for the libzfs version and link it directly into mkfs.lustre instead of dlopening it.

Comment by Richard Yao [ 13/Feb/14 ]

ZFSOnLinux imported a change to its internal library API from Illumos, which required a SONAME bump. It would be advisable to evaluate a switch to the new libzfs_core library, which is meant to provide a public library API with a stable interface.

Comment by Andreas Dilger [ 19/Mar/14 ]

The reason that mkfs.lustre is using dlopen() instead of linking to the ZFS library directly is so that the same user tools can be used on systems with or without ZFS packages installed.
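
A rough sketch of that trade-off: with dlopen(), a missing library becomes a runtime condition the tool can report and skip, whereas a directly linked binary would fail to start at all. The helper name `backend_available` is hypothetical and is not taken from the Lustre tree.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/*
 * Return 1 if soname can be loaded and exports symbol, 0 otherwise.
 * A missing library is reported on stderr but is never fatal.
 */
int backend_available(const char *soname, const char *symbol)
{
    assert(soname != NULL && symbol != NULL);

    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "backend disabled: %s\n", dlerror());
        return 0;
    }

    int found = dlsym(handle, symbol) != NULL;
    dlclose(handle);
    return found;
}
```

For example, `backend_available("libzfs.so.2", "libzfs_init")` would return 0 on an ldiskfs-only system, and the tool could carry on with ZFS support switched off.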

See ORI-425: http://review.whamcloud.com/1740 and http://review.whamcloud.com/1742

Comment by Isaac Huang (Inactive) [ 19/Mar/14 ]

It seemed that the dynamic linker would only try to resolve symbols when they're first used. From ld.so(8):
LD_BIND_NOW
If present, causes the dynamic linker to resolve all symbols at program startup instead of when they are first referenced.

If true, then it might work if we just delay the zfs_init() call until the first real zfs operation is needed, e.g. in zfs_make_lustre(). Then we'd be able to get rid of the dlopen() without causing headaches for ldiskfs systems. I'm no expert on the dynamic linker, but it seems worthwhile to give it a shot.
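
The deferred-initialisation half of this suggestion can be sketched on its own, independent of the linker question. The names `lazy_backend`, `lazy_backend_ensure` and `demo_init` are hypothetical; in the real tool the init callback would be the existing zfs_init(). The sketch only shows the "initialise on first use" pattern. It does not show whether lazy symbol binding is enough to drop dlopen().

```c
#include <assert.h>
#include <stddef.h>

typedef int (*backend_init_fn)(void);

struct lazy_backend {
    backend_init_fn init;   /* run at most once, on first use */
    int initialized;        /* 0 until init has been attempted */
    int status;             /* cached return value of init */
};

/* Run the backend's init on the first call only; later calls reuse the result. */
int lazy_backend_ensure(struct lazy_backend *b)
{
    assert(b != NULL && b->init != NULL);

    if (!b->initialized) {
        b->status = b->init();
        b->initialized = 1;
    }
    return b->status;
}

/* Stand-in for a real init routine; counts how often it is invoked. */
int demo_init_calls;

int demo_init(void)
{
    demo_init_calls++;
    return 0;
}
```

Each backend-specific entry point would call `lazy_backend_ensure()` first, so a run that never touches ZFS never executes the init routine.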

Comment by Nathaniel Clark [ 23/Apr/14 ]

Ran with the latest masters of lustre, zfs and spl (http://review.whamcloud.com/#/c/8979/6) but ran into ZFS bug 1891 in sanity/65ia.

Comment by Christopher Morrone [ 23/Apr/14 ]

The ZFS bug need not hold up fixing this bug correctly.

Comment by Brian Behlendorf [ 23/Apr/14 ]

Does sanity/65ia consistently reproduce this issue?

Comment by Christopher Morrone [ 24/Apr/14 ]

LLNL completely agrees with the previous commenters in this ticket that argue for the removal of the hardcoded zfs library versions in dlopen() calls in lustre. I commented to that effect in my review of patch 8697.

The issue was raised that rpm will add a dependency on zfs if you compile with ZFS support. Frankly, I don't see the problem. If you don't want a dependency on zfs, don't compile against zfs. Same with ldiskfs.

But ok, fine, I'll play along. So you want to make both OSD rpms available, but neither required. Then obviously we need to have any direct dependencies on the contents of those OSD rpms contained in those OSD rpms, not present in more general shared RPMs.

So how about fixing that issue? Modularize the mount command. Each OSD rpm offers a dynamically loadable component for the main mount command. The mount command can then load the osd-specific shared library as needed, and only that osd shared library has dependencies on the backend filesystem. At rpm build time, the zfs osd rpm would have the dependencies on zfs, and the ldiskfs osd rpm would have the dependencies on ldiskfs. No leakage of dependencies into the more general rpms.
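
A minimal sketch of what such a loader might look like, assuming one shared object per backend. The file naming scheme `mount_osd_<fstype>.so`, the plugin directory and both function names are assumptions made for illustration, not the design that was eventually committed.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/mount_osd_<fstype>.so" into buf.
 * Returns 0 on success, -1 if fstype is unsafe or buf is too small.
 */
int osd_plugin_path(char *buf, size_t len, const char *dir, const char *fstype)
{
    assert(buf != NULL && dir != NULL && fstype != NULL);

    /* Refuse empty names and path separators so fstype cannot escape dir. */
    if (fstype[0] == '\0' || strchr(fstype, '/') != NULL)
        return -1;

    int n = snprintf(buf, len, "%s/mount_osd_%s.so", dir, fstype);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Load the plugin for fstype and return its handle, or NULL if the plugin
 * (or a backend library it links against) is not installed.
 */
void *osd_plugin_load(const char *dir, const char *fstype)
{
    char path[4096];

    if (osd_plugin_path(path, sizeof(path), dir, fstype) != 0)
        return NULL;

    /* RTLD_NOW: fail here, not later, if the plugin's own deps are missing. */
    return dlopen(path, RTLD_NOW);
}
```

Only the plugin itself would link against libzfs, so the versioned soname is recorded by the linker at build time and the general-purpose mount binary never names the library.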

I'm just throwing this out off the top of my head. Other options are possible.

Granted, our current hard-code-the-library-string strategy was easy the first time, but it is also half-assed and is causing maintenance problems. Let's take the time to do it right this time around.

Comment by Nathaniel Clark [ 01/May/14 ]

This splits zfs functionality out of mount/mkfs into a loadable module and puts that module in lustre-osd-zfs

http://review.whamcloud.com/10193

Comment by Nathaniel Clark [ 21/May/14 ]

Patch landed to master. Opened LU-5096 to address Brian's issues with a custom libzfs location, which I missed.

Generated at Sat Feb 10 01:44:15 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.