[LU-14927] osd-zfs could not be loaded on 4.14+ and 5.9+ (gpl issue) Created: 11/Aug/21  Updated: 25/Jan/24  Resolved: 20/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.7
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Aurelien Degremont (Inactive) Assignee: James A Simmons
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The osd-zfs module no longer loads on kernels that include a commit merged in 5.9 and now backported to the 4.14 branch. This prevents using ZFS with Lustre, from 2.12 up to master.

This is due to a stricter module license check around EXPORT_SYMBOL_GPL. ZFS is licensed under the CDDL, and because osd-zfs depends on it, osd-zfs inherits its taint, which prevents it from using GPL-only symbols.

Here is the original commit: https://github.com/torvalds/linux/commit/262e6ae7081df304fc625cf368d5c2cbba2bb991

Backported to the 4.14 and 5.4 stable branches (at least).
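
As a rough illustration of the rule described above, the following userspace C sketch models how a module that binds to a symbol from a non-GPL module picks up the taint and is then refused GPL-only symbols. It is not kernel code: the names model_symbol, model_module and model_count_unresolved are hypothetical, and the real check in the kernel module loader handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the license check; not kernel code. */
struct model_symbol {
	const char *name;
	bool gpl_only;          /* exported with EXPORT_SYMBOL_GPL */
	bool owner_proprietary; /* exporting module is tainted as non-GPL */
};

struct model_module {
	const char *name;
	bool proprietary_taint; /* set by own license or inherited */
};

/*
 * Counts the symbols in syms[] that the module fails to resolve.
 * Simplification: the taint is computed up front, so the result does
 * not depend on the order of the symbols.
 */
size_t model_count_unresolved(struct model_module *mod,
			      const struct model_symbol *syms, size_t n)
{
	size_t failed = 0;

	assert(mod != NULL && (syms != NULL || n == 0));

	/* Pass 1: using any symbol from a tainted module taints the user. */
	for (size_t i = 0; i < n; i++)
		if (syms[i].owner_proprietary)
			mod->proprietary_taint = true;

	/* Pass 2: a tainted module is refused every GPL-only symbol. */
	for (size_t i = 0; i < n; i++)
		if (syms[i].gpl_only && mod->proprietary_taint)
			failed++;

	return failed;
}
```

Feeding this model one symbol owned by a tainted module plus the three GPL-only symbols named in the reproduction log gives three unresolved symbols, which matches the three "Unknown symbol" lines reported by dmesg.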

 

Reproduced with Lustre master branch and ZFS 2.0.5:

  • 4.14.232: OK
  • 4.14.238: Error
    $ sudo LOAD=: FSTYPE=zfs ./lustre/tests/llmount.sh
    Loading modules from /home/ec2-user/lustre/lustre/tests/..
    detected 16 online CPUs by sysfs
    libcfs will create CPU partition based on online CPUs
    libkmod: kmod_module_get_holders: could not open '/sys/module/pcc_cpufreq/holders': No such file or directory
    libkmod: kmod_module_get_holders: could not open '/sys/module/amd64_edac_mod/holders': No such file or directory
    quota/lquota options: 'hash_lqs_cur_bits=3'
    $ dmesg | tail -n5
    [ 1844.388948] ZFS: Loaded module v2.0.5-1, ZFS pool version 5000, ZFS filesystem version 5
    [ 1844.400002] osd_zfs: module uses symbols from proprietary module zfs, inheriting taint.
    [ 1844.403770] osd_zfs: Unknown symbol ktime_get_real_seconds (err 0)
    [ 1844.406319] osd_zfs: Unknown symbol init_user_ns (err 0)
    [ 1844.408548] osd_zfs: Unknown symbol ktime_get_seconds (err 0)
    

     

Lustre 2.10 is not affected, likely because the patch which introduced this symbol was merged in 2.11: https://review.whamcloud.com/#/c/29857/

 



 Comments   
Comment by James A Simmons [ 11/Aug/21 ]

This is going to become a really huge problem in the future. The kernel has been moving ALL its core functionality to GPL-only. I could see hello_world as the only non-GPL supported module in the future. We are going to have to really trim down the OSD modules to the bare minimum.

Comment by Aurelien Degremont (Inactive) [ 11/Aug/21 ]

You work with the upstream kernel far more than I do, so I assume you're right, but I don't see how we could fix that with minimal OSD modules.

Comment by James A Simmons [ 11/Aug/21 ]

We do a lot of state management for the OSD drivers, which currently touches core kernel functionality: things like scrub, which needs the kernel time API, and procfs / sysfs for our tunables. To avoid this taint issue we need to approach this like NVIDIA does with its shim layer, which has no core kernel functionality itself and interfaces with another module that does. This way osd-zfs ends up being just ZFS-specific handling, and higher-layer modules do the rest of the state management.
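
To make the split concrete, here is a minimal userspace C sketch of that layering, under the assumption that the backend only supplies a callback table while a shared layer owns the state and every call into the time API. All names (backend_ops, scrub_state, shared_scrub_run, demo_backend_scrub) are hypothetical and do not correspond to actual Lustre interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback table: everything a backend has to provide. */
struct backend_ops {
	int (*scrub_one)(void *backend_data);
};

/* State owned by the shared layer, never touched by the backend. */
struct scrub_state {
	time_t last_start;  /* timestamp taken by the shared layer */
	unsigned long runs; /* number of scrub passes started */
};

/* Shared layer: does the bookkeeping, then calls into the backend. */
int shared_scrub_run(struct scrub_state *st, const struct backend_ops *ops,
		     void *backend_data)
{
	assert(st != NULL && ops != NULL && ops->scrub_one != NULL);

	st->last_start = time(NULL); /* the only time API call is here */
	st->runs++;
	return ops->scrub_one(backend_data);
}

/* Stand-in backend: only backend-specific work, no time or stats calls. */
int demo_backend_scrub(void *backend_data)
{
	int *objects_checked = backend_data;

	(*objects_checked)++;
	return 0;
}
```

In this arrangement the backend side never references the time API directly, which is the property the tainted module would need.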

Comment by James A Simmons [ 11/Aug/21 ]

This is going to take a while to work out.

Comment by Aurelien Degremont (Inactive) [ 11/Aug/21 ]

Do you have a rough plan, even if it means a lot of work, or does everything still have to be designed?

Also, given the philosophy behind the kernel commits, it is possible that even a shim module will eventually be the target of new patches... We need to be careful. Things like the NVIDIA driver are definitely a target for this kind of patch.

Comment by James A Simmons [ 11/Aug/21 ]

I have been looking at the osd-zfs code. From what I see, I think we are looking at ~5 patches to get everything working again. Most don't look too difficult. The tricky one is the scrub kthread defined in the OSD drivers. We will need to move all the thread handling to scrub.c in obdclass.

Comment by Andreas Dilger [ 12/Aug/21 ]

It is trivial to replace ktime_get_real_seconds() with ktime_get_real_ts64() for the timestamps. These are only coarse stats (seconds), so using tv_sec is enough.

However, I'm not sure where init_user_ns is being used. Probably via some other inline kernel function/macro?
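
The timestamp replacement suggested above could look roughly like the sketch below. The helper name sketch_real_seconds is hypothetical. The non-kernel branch only provides stand-ins for struct timespec64 and ktime_get_real_ts64() so the snippet can be compiled in userspace; it is an assumption for illustration, not how the kernel implements the call.

```c
#ifdef __KERNEL__
#include <linux/ktime.h>
#include <linux/timekeeping.h>
#else
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Userspace stand-ins so the sketch builds outside the kernel. */
struct timespec64 {
	long long tv_sec;
	long tv_nsec;
};

static void ktime_get_real_ts64(struct timespec64 *ts)
{
	assert(ts != NULL);
	ts->tv_sec = (long long)time(NULL);
	ts->tv_nsec = 0;
}
#endif

/*
 * Hypothetical helper: wall-clock seconds obtained through
 * ktime_get_real_ts64() instead of ktime_get_real_seconds().
 * Only tv_sec is used, since the stats are coarse (seconds).
 */
long long sketch_real_seconds(void)
{
	struct timespec64 ts;

	ktime_get_real_ts64(&ts);
	return ts.tv_sec;
}
```

This only covers the wall-clock case; it does not address init_user_ns or callers that need a monotonic clock.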

Comment by Aurelien Degremont (Inactive) [ 12/Aug/21 ]

Nice, I didn't spot that some calls in time.h are not GPL-exported. Some rely on CLOCK_MONOTONIC, so we need to check whether we can get that with some other calls. I'm not very familiar with the GPL/non-GPL export story. I don't know why some calls are GPL-only and others are not.

Comment by James A Simmons [ 12/Aug/21 ]

Also, the project quota stuff is being used. While ktime_get_real_ts64() is a stopgap, I suspect it will only be a temporary solution.

Comment by Gerrit Updater [ 17/Aug/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44689
Subject: LU-14927 scrub: create shared scrub_needs_check() function.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d3b04a01506a5e08c80dcd81fdb630d0338f7048

Comment by Gerrit Updater [ 17/Aug/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44690
Subject: LU-14927 osd: share brw_stats code between OSD back ends.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7f50c62bb81682d036f9a058487c5675b42c8576

Comment by Gerrit Updater [ 18/Aug/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44705
Subject: LU-14927 scrub: share osd_scrub[prep|post] code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7006b1c5d076fa99b109117c37232a5b19c7f5ce

Comment by James A Simmons [ 19/Aug/21 ]

Do you know which distros have been impacted, and at which kernel version things started to break for each of them?

Comment by Aurelien Degremont (Inactive) [ 20/Aug/21 ]

Amazon Linux uses the latest upstream longterm kernels, and these patches have been backported to the 4.14.x and 5.4.x branches (see the description for the kernel versions). Ubuntu also uses these backports, but after some discussion with Greg KH, they decided to revert them. So Ubuntu is not impacted yet.

I don't know about RHEL, but they are usually conservative, so I expect these patches to take more time to reach RHEL kernels.

 

Comment by Gerrit Updater [ 23/Aug/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/44735
Subject: LU-14927 quota: move qsd_transfer to lquota module
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c514c6e881ae831a3650381bb741221449218784

Comment by Gerrit Updater [ 31/Aug/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44705/
Subject: LU-14927 scrub: share osd_scrub[prep|post] code
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: dd505aafd833f2dda274d83332e415d739da1d52

Comment by Peter Jones [ 31/Aug/21 ]

Landed for 2.15

Comment by James A Simmons [ 31/Aug/21 ]

Only one patch landed. We have two more to go.

Comment by Gerrit Updater [ 29/Sep/21 ]

"James Simmons <jsimmons@infradead.org>" uploaded a new patch: https://review.whamcloud.com/45093
Subject: LU-14927 zfs: debug code
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 736c53baa13277424de5be86ef174f5faf741c38

Comment by Gerrit Updater [ 10/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44689/
Subject: LU-14927 scrub: create shared scrub_needs_check() function.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 882a9f784de27bb6043345316ed3f5f4835a9bbf

Comment by James A Simmons [ 10/Oct/21 ]

One more patch

Comment by Gerrit Updater [ 27/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44735/
Subject: LU-14927 quota: move qsd_transfer to lquota module
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d2e8208e22f21bb7354a9207f381217c222d3df3

Comment by James A Simmons [ 27/Oct/21 ]

Still one more patch.

Comment by Gerrit Updater [ 20/Nov/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44690/
Subject: LU-14927 osd: share brw_stats code between OSD back ends.
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 8a84c7f9c7d65f6f880be6fe4d94fca26a405d81

Comment by James A Simmons [ 20/Nov/21 ]

Proper support for ZFS has been restored.

Generated at Sat Feb 10 03:13:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.