[LU-3121] lustre-initialization-1: Can't load module 'osd-zfs' Created: 08/Apr/13  Updated: 02/Jun/15  Resolved: 05/Aug/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Maloo Assignee: Chris Gearing (Inactive)
Resolution: Fixed Votes: 0
Labels: HB, zfs

Attachments: File client-24vm3.ks-post.log    
Issue Links:
Related
is related to LU-2969 Fail to mount MDS as zfs on external ... Closed
Severity: 3
Rank (Obsolete): 7578

 Description   

This issue was created by maloo for Li Wei <liwei@whamcloud.com>

This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/418e2830-9f9f-11e2-9f27-52540035b04c.

The sub-test lustre-initialization_1 failed with the following error:

Test system failed to start single suite, so abandoning all hope and giving up

Info required for matching: lustre-initialization-1 lustre-initialization_1

For some unknown reason, the MDS couldn't load osd-zfs.ko:

08:17:45:LNet: Accept all, port 7988
08:17:45:LustreError: 158-c: Can't load module 'osd-zfs'
08:17:45:LustreError: 2944:0:(genops.c:304:class_newdev()) OBD: unknown type: osd-zfs
08:17:45:LustreError: 2944:0:(obd_config.c:374:class_attach()) Cannot create device lustre-MDT0000-osd of type osd-zfs : -19
08:17:45:LustreError: 2944:0:(obd_mount.c:196:lustre_start_simple()) lustre-MDT0000-osd attach error -19
08:17:45:LustreError: 2944:0:(obd_mount_server.c:1664:server_fill_super()) Unable to start osd on lustre-mdt1/mdt1: -19
08:17:45:LustreError: 2944:0:(obd_mount.c:1264:lustre_fill_super()) Unable to mount  (-19)
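
The -19 in these messages is a negated Linux errno; 19 is ENODEV ("No such device"), which fits a device type whose module never registered. As a rough aid for reading such console logs, the sketch below pulls the reporting function and source location out of each LustreError line, so the failure chain is easier to follow. The helper name lustre_error_sites is hypothetical, and the pattern assumes only the line layout shown above.

```shell
# Hypothetical helper: print "function file:line" for each LustreError line on stdin.
# Assumes the layout "LustreError: PID:N:(file.c:LINE:function()) message".
lustre_error_sites() {
    sed -n 's/.*LustreError: [0-9]*:[0-9]*:(\([^:]*\):\([0-9]*\):\([A-Za-z0-9_]*\)()).*/\3 \1:\2/p'
}

# Example: two lines copied from the log above. The first has no call site
# and is skipped; the second prints "class_newdev genops.c:304".
printf '%s\n' \
    "08:17:45:LustreError: 158-c: Can't load module 'osd-zfs'" \
    "08:17:45:LustreError: 2944:0:(genops.c:304:class_newdev()) OBD: unknown type: osd-zfs" |
    lustre_error_sites
```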


 Comments   
Comment by Peter Jones [ 08/Apr/13 ]

Lai

Could you please comment?

Thanks

Peter

Comment by Li Wei (Inactive) [ 08/Apr/13 ]

I spent some time provisioning the build manually on Toro. After installing the lustre-osd-zfs RPM and the ZFS RPMs with "yum install", the MDT mounted successfully. This suggests something might be wrong with Autotest/Toro.

Comment by Jodi Levi (Inactive) [ 08/Apr/13 ]

Duplicate of LU-2969

Comment by Peter Jones [ 08/Apr/13 ]

This seems to hit only rarely, but we'd still like to understand why it happens.

Comment by Chris Gearing (Inactive) [ 08/Apr/13 ]

I will add extra debug to Autotest, which will require a restart. This issue must be intermittent because zfs testing does run.

https://maloo.whamcloud.com/test_sessions?utf8=%E2%9C%93&test_group=review-zfs&commit=Apply+Filter

Comment by Nathaniel Clark [ 09/Apr/13 ]

I think the https://maloo.whamcloud.com/test_sessions/3ef2158c-9f9f-11e2-9f27-52540035b04c failure is due to the patch being tested.

Comment by Li Wei (Inactive) [ 12/Apr/13 ]

Nathaniel, could you explain how the patch caused the failure, please? I went through the patch again, but did not figure out why it would cause such failures.

Comment by Nathaniel Clark [ 16/Apr/13 ]

Li, I am wildly incorrect. I must have been looking at a different patch. My comment from the 9th is wrong. Sorry for the confusion.

Comment by Li Wei (Inactive) [ 17/Apr/13 ]

No problem. Ironically, my http://review.whamcloud.com/5785 has been hitting this issue every time recently, although I can't see any fault in the patch itself. Probably there really is a problem in the patch...

Comment by Li Wei (Inactive) [ 17/Apr/13 ]

Ah, there is also a considerable number of failures when testing other patches.

Comment by Zhenyu Xu [ 17/Apr/13 ]

Another hit at https://maloo.whamcloud.com/test_sets/d170d43c-a767-11e2-b3cc-52540035b04c

Comment by Jian Yu [ 06/May/13 ]

Another instance: https://maloo.whamcloud.com/test_sets/2352a39e-b658-11e2-bf90-52540035b04c

Comment by Jian Yu [ 07/May/13 ]

This is blocking the patch review testing on zfs:
https://maloo.whamcloud.com/test_sets/f3082170-b678-11e2-a5af-52540035b04c

Comment by Jian Yu [ 08/May/13 ]

Another one: https://maloo.whamcloud.com/test_sets/cd712682-b72c-11e2-bd0f-52540035b04c

Comment by Nathaniel Clark [ 09/May/13 ]

The RPMs seem to install and run cleanly, so this must be an install issue of some sort. Possibly a depmod issue?

Comment by Nathaniel Clark [ 09/May/13 ]

I can reproduce these exact symptoms by installing all of Lustre except lustre-osd-zfs normally, then installing lustre-osd-zfs with --noscripts (which causes it to skip running depmod). lustre-osd-zfs may skip depmod if the kernel isn't installed yet; I am still trying to figure out the exact vector.
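
One way to check the depmod hypothesis on a failing node is to see whether the module is listed in the kernel's modules.dep index, which depmod writes. This is a minimal sketch, assuming the index is at /lib/modules/<kernel release>/modules.dep; the helper name module_listed is hypothetical. Because modprobe treats '-' and '_' in module names as equivalent, the helper normalizes both sides before comparing.

```shell
# Hypothetical helper: succeed if MODULE appears as an entry in the given modules.dep file.
# Usage: module_listed DEPFILE MODULE
module_listed() {
    depfile=$1
    # Normalize dashes to underscores, as modprobe does for module names.
    name=$(printf '%s' "$2" | tr '-' '_')
    # Each entry is "path/to/module.ko[.suffix]: deps..."; keep the path,
    # reduce it to the bare module name, and look for an exact match.
    cut -d: -f1 "$depfile" |
        sed -e 's|.*/||' -e 's|\.ko.*$||' |
        tr '-' '_' |
        grep -qx "$name"
}

# Possible use on an affected node (not run here):
#   module_listed "/lib/modules/$(uname -r)/modules.dep" osd-zfs ||
#       echo "osd-zfs is not indexed; depmod may not have run after the install"
```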

Comment by Nathaniel Clark [ 09/May/13 ]

If it's a module dependency ordering issue, it may be solved by http://review.whamcloud.com/6259 which fixes up lustre-osd-* / lustre-modules rpm dependencies.

Comment by Nathaniel Clark [ 14/May/13 ]

The above linked patch has landed for LU-3269.

Comment by Jian Yu [ 15/May/13 ]

Lustre Branch: master
Lustre Build: http://build.whamcloud.com/job/lustre-master/1495 (which contains the patch for LU-3269)
FSTYPE=zfs

The issue in this ticket still occurred:
https://maloo.whamcloud.com/test_sessions/754709ca-bd37-11e2-a548-52540035b04c

Comment by Jian Yu [ 18/May/13 ]

Lustre Tag: v2_4_0_RC1
Lustre Build: http://build.whamcloud.com/job/lustre-master/1501/
Distro/Arch: RHEL6.4/x86_64
FSTYPE=zfs

Hit the issue again:
https://maloo.whamcloud.com/test_sets/7e89ecee-bf5a-11e2-88e0-52540035b04c

Comment by Nathaniel Clark [ 24/May/13 ]

+ yum install -y kernel-2.6.32-358.6.1.el6_lustre.x86_64 lustre-ldiskfs lustre-modules lustre lustre-tests
Loaded plugins: fastestmirror, security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package kernel.x86_64 0:2.6.32-358.6.1.el6_lustre will be installed
---> Package lustre.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
--> Processing Dependency: lustre-osd for package: lustre-2.4.50-2.6.32_358.6.1.el6_lustre.x86_64.x86_64
--> Processing Dependency: libnetsnmpmibs.so.20()(64bit) for package: lustre-2.4.50-2.6.32_358.6.1.el6_lustre.x86_64.x86_64
--> Processing Dependency: libnetsnmphelpers.so.20()(64bit) for package: lustre-2.4.50-2.6.32_358.6.1.el6_lustre.x86_64.x86_64
--> Processing Dependency: libnetsnmpagent.so.20()(64bit) for package: lustre-2.4.50-2.6.32_358.6.1.el6_lustre.x86_64.x86_64
--> Processing Dependency: libnetsnmp.so.20()(64bit) for package: lustre-2.4.50-2.6.32_358.6.1.el6_lustre.x86_64.x86_64
---> Package lustre-ldiskfs.x86_64 0:4.1.0-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
---> Package lustre-modules.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
---> Package lustre-tests.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
--> Running transaction check
---> Package lustre-osd-ldiskfs.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
---> Package net-snmp-libs.x86_64 1:5.5-44.el6_4.1 will be installed
--> Processing Dependency: libsensors.so.4()(64bit) for package: 1:net-snmp-libs-5.5-44.el6_4.1.x86_64
--> Running transaction check
---> Package lm_sensors-libs.x86_64 0:3.1.1-17.el6 will be installed
--> Finished Dependency Resolution

The problem is that lustre requires lustre-osd, and yum picks lustre-osd-ldiskfs to satisfy it, so lustre-osd-zfs is never installed.

The yum line should explicitly install lustre-osd-ldiskfs and lustre-osd-zfs, or we could add a virtual package lustre-all that requires both osd packages.
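
Whichever fix is chosen, the provisioning step could verify that both backends were actually selected before any test starts. The sketch below scans a yum dependency-resolution transcript for a package name. The helper name yum_will_install is hypothetical, it assumes the "---> Package NAME.ARCH VERSION will be installed" layout shown above, and it treats the package name as a regular expression, which is adequate for names like these.

```shell
# Hypothetical helper: succeed if the yum transcript on stdin lists PACKAGE
# as "will be installed".
# Usage: yum_will_install PACKAGE < transcript
yum_will_install() {
    grep -Eq -- "^---> Package $1\.[^ ]+ .* will be installed"
}

# Example with two lines from the transcript above: ldiskfs is selected, zfs is not.
transcript='---> Package lustre-osd-ldiskfs.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed
---> Package lustre-modules.x86_64 0:2.4.50-2.6.32_358.6.1.el6_lustre.x86_64 will be installed'

for pkg in lustre-osd-ldiskfs lustre-osd-zfs; do
    if printf '%s\n' "$transcript" | yum_will_install "$pkg"; then
        echo "$pkg: selected"
    else
        echo "$pkg: MISSING"
    fi
done
```

With the transcript from this comment, the loop reports lustre-osd-ldiskfs as selected and lustre-osd-zfs as missing, which matches the diagnosis.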

Comment by Nathaniel Clark [ 24/May/13 ]

zfs and ancillary packages are installed after kickstart, but lustre-osd-zfs isn't one of them.

Comment by Nathaniel Clark [ 18/Jul/13 ]

I believe this is fixed now. There is still some bug where lustre-initialization-1 fails, but it does not appear to be failing on loading osd-zfs.
