[LU-330] SLES11 SP1 client RPMS built but lustre module fails to load
Created: 16/May/11  Updated: 10/Apr/12  Resolved: 05/Apr/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug
Priority: Blocker
Reporter: Peter Chiu
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None
Environment:

Client: SLES11 SP1


Severity: 3
Rank (Obsolete): 5524

 Description   

I used the method described below to build client rpms with the source kit lustre-1.8.5.tar.gz.

Only one error was reported during "make rpms", relating to lustre-iokit-1.2-root,
but the rpms were still built under /usr/src/packages/RPMS/x86_64.

The rpms lustre-modules, lustre and lustre-tests were then installed smoothly without any complaints.

However, a subsequent "modprobe lustre" returns a "Killed" message, and no lustre module is loaded.

dmesg also reveals "BUG: unable to handle kernel NULL pointer dereference at 0000000000000008"

A second "modprobe lustre" command will then hang, again with no module loaded.
Subsequently the client is not able to mount the lustre storage.

Please kindly advise how to resolve this problem.
Many thanks.

===========================================================

Client host cmip-proc8: cat /etc/issue:

Welcome to SUSE Linux Enterprise Server 11 SP1 (x86_64) - Kernel \r (\l).

cmip-proc8:~ # uname -a
Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux

Install kit from:
cd /usr/local/kits/lustre-1.8.5

ls -ls /usr/src/
4 drwxr-xr-x 3 root root 4096 2011-05-09 08:31 debug
0 lrwxrwxrwx 1 root root 19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
4 drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
4 drwxr-xr-x 3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
4 drwxr-xr-x 3 root root 4096 2011-03-20 15:54 linux-obj
4 drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
4 drwxr-xr-x 7 root root 4096 2011-03-20 14:58 packages

Install command:

./configure --with-linux=/usr/src/linux-2.6.32.29-0.3 --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen

make rpms
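
The two paths passed to configure above follow the layout visible in the /usr/src listing: the source tree is named after the kernel release without its flavor suffix, and the object tree adds the architecture and the flavor. A minimal sketch that derives both paths from a release string, so they can be compared against `uname -r` before building. The function names are hypothetical, and the sketch assumes the release always ends in "-<flavor>":

```shell
#!/bin/sh
# Hypothetical helpers (not part of Lustre or SLES).
# Assumption: the kernel release looks like <version>-<flavor>,
# e.g. "2.6.32.29-0.3-xen".

# Print the kernel source directory for a release string.
kernel_src_dir() {
    release=$1
    printf '/usr/src/linux-%s\n' "${release%-*}"   # drop the trailing "-<flavor>"
}

# Print the kernel object directory for a release string and architecture.
kernel_obj_dir() {
    release=$1
    arch=$2
    version=${release%-*}   # everything before the last "-"
    flavor=${release##*-}   # everything after the last "-"
    printf '/usr/src/linux-%s-obj/%s/%s\n' "$version" "$arch" "$flavor"
}

# Possible use (not executed here):
#   ./configure --with-linux="$(kernel_src_dir "$(uname -r)")" \
#               --with-linux-obj="$(kernel_obj_dir "$(uname -r)" "$(uname -m)")"
```

For the release shown by uname above, these should give the same two directories used in the configure command.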

One error recorded:
+ ./configure --prefix=/usr
configure: error: cannot find install-sh or install.sh in . ./.. ./../..
error: Bad exit status from /var/tmp/rpm-tmp.51316 (%build)

RPM build errors:
Bad exit status from /var/tmp/rpm-tmp.51316 (%build)
make[1]: *** [rpms] Error 1
make[1]: Leaving directory `/usr/local/kits/lustre-1.8.5/lustre-iokit'

By trial and error, I found that this error can be avoided by rsyncing /usr/local/kits/lustre-1.8.5/lustre-iokit to /usr/src/packages/BUILD/lustre-iokit-1.2
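
The configure error names three places it searched for install-sh or install.sh: the build directory and its two parents. A small sketch of the same lookup, which could be pointed at /usr/src/packages/BUILD/lustre-iokit-1.2 to see whether the helper script is present before running "make rpms". The function name is hypothetical and this only mimics the search order quoted in the error message; it is not configure's own code:

```shell
#!/bin/sh
# Hypothetical helper: look for install-sh or install.sh in a directory
# and its two parent directories, mirroring the ". ./.. ./../.." list
# printed in the configure error.
find_install_sh() {
    dir=$1
    for d in "$dir" "$dir/.." "$dir/../.."; do
        for f in install-sh install.sh; do
            if [ -f "$d/$f" ]; then
                printf '%s\n' "$d/$f"   # report the first match
                return 0
            fi
        done
    done
    return 1                            # nothing found in any of the three places
}

# Possible use (not executed here):
#   find_install_sh /usr/src/packages/BUILD/lustre-iokit-1.2 || echo "install-sh missing"
```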

Anyway, rpms are built under:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # ls /usr/src/packages/RPMS//x86_64/1.8.5
/usr/src/packages/RPMS//x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
/usr/src/packages/RPMS//x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm

No error when installing these rpms:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # rpm -qa | grep lustre
lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815
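
In the package names above, the kernel release appears with each "-" replaced by "_" (2.6.32.29_0.3_xen). A minimal sketch that uses this to check whether an installed package was built for a given kernel release. The function names are hypothetical, and the naming pattern is assumed from the listing above rather than from any packaging documentation:

```shell
#!/bin/sh
# Hypothetical helpers, assuming the naming seen in the rpm listing:
#   <name>-<lustre version>-<kernel release with "-" as "_">_<timestamp>

# Convert a kernel release such as "2.6.32.29-0.3-xen" to "2.6.32.29_0.3_xen".
kernel_tag() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Succeed if package name $1 carries the tag for kernel release $2.
built_for_kernel() {
    case $1 in
        *"-$(kernel_tag "$2")_"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use (not executed here):
#   built_for_kernel "$(rpm -q lustre-modules)" "$(uname -r)" || echo "kernel mismatch"
```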

Checking for the lustre module and trying to load it; none is loaded:

cmip-proc8:~ # lsmod | grep lustre
cmip-proc8:~ # modprobe lustre
Killed
cmip-proc8:~ # lsmod | grep lustre
cmip-proc8:~ # modprobe lustre &
[1] 3454
cmip-proc8:~ #
cmip-proc8:~ # ps auxw | grep lustre
root 3454 0.0 0.0 3940 624 pts/1 S 18:04 0:00 modprobe lustre
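
The "lsmod | grep lustre" check above matches any line containing the string. A small sketch of a stricter check that matches the module name column exactly, which also makes it easy to test for the dependencies (libcfs, lvfs, lnet) one at a time. The function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-style output on stdin and succeed only
# if the first column of some line equals the given module name.
module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Possible use (not executed here):
#   for m in libcfs lvfs lnet lustre; do
#       lsmod | module_listed "$m" && echo "$m loaded" || echo "$m NOT loaded"
#   done
```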

dmesg records this error after the first "modprobe lustre" command:

cmip-proc8:/usr/local/kits/lustre-1.8.5 # diff /tmp/d1 /tmp/d2
195a196,250
> [ 168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> [ 168.648006] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [ 168.648018] PGD 7fac4067 PUD 7ef4c067 PMD 0
> [ 168.648018] Oops: 0000 [#1] SMP
> [ 168.648026] last sysfs file: /sys/module/ip_tables/initstate
> [ 168.648028] CPU 0
> [ 168.648030] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
> [ 168.648063] Supported: Yes
> [ 168.648066] Pid: 3445, comm: modprobe Tainted: G N 2.6.32.29-0.3-xen #1
> [ 168.648069] RIP: e030:[<ffffffff8002c3d2>] [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [ 168.648074] RSP: e02b:ffff88007efa5e38 EFLAGS: 00010082
> [ 168.648077] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
> [ 168.648080] RDX: 0000000000000000 RSI: ffff88007efa5e88 RDI: 0000000000000000
> [ 168.648083] RBP: ffff88007efa5e58 R08: ffffffffa0252fb6 R09: 0000000000000000
> [ 168.648086] R10: 0000000000000001 R11: 0000000000000061 R12: 0000000000009700
> [ 168.648089] R13: 0000000000000000 R14: ffff88007efa5e88 R15: 000000000000000f
> [ 168.648095] FS: 00007f3f41030700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
> [ 168.648098] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 168.648101] CR2: 0000000000000008 CR3: 000000007ef7d000 CR4: 0000000000002660
> [ 168.648104] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 168.648107] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
> [ 168.648113] Stack:
> [ 168.648115] ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
> [ 168.648118] <0> ffff88007efa5eb8 ffffffff80038588 000000007ef8ef00 00000000a02579f8
> [ 168.648123] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
> [ 168.648129] Call Trace:
> [ 168.648138] [<ffffffff80038588>] try_to_wake_up+0x48/0x420
> [ 168.648143] [<ffffffff8005b2e8>] up+0x48/0x50
> [ 168.648153] [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
> [ 168.648167] [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
> [ 168.648178] [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
> [ 168.648184] [<ffffffff8006d154>] sys_init_module+0xe4/0x270
> [ 168.648189] [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
> [ 168.648194] [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
> [ 168.648196] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87
> [ 168.648230] RIP [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
> [ 168.648234] RSP <ffff88007efa5e38>
> [ 168.648236] CR2: 0000000000000008
> [ 168.648239] --[ end trace 57429513f7001015 ]--
cmip-proc8:/usr/local/kits/lustre-1.8.5 #
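
To make the trace above easier to compare between attempts, here is a minimal sketch that pulls out the faulting symbol from the "IP:" line and the call-trace frames that belong to a module. It reads dmesg text (or the diff output shown above) on stdin. The function names are hypothetical and the patterns are written against the line format in this particular oops, so other kernel versions may need adjustments:

```shell
#!/bin/sh
# Hypothetical helpers for summarising an oops like the one above.

# Print the symbol named on the first "IP:" line, e.g. "task_rq_lock".
oops_fault_symbol() {
    sed -n 's/.*] IP: \[<[0-9a-f]*>] \([A-Za-z0-9_.]*\)+0x.*/\1/p' | head -n 1
}

# Print "symbol [module]" for every frame that ends in a "[module]" tag.
oops_module_frames() {
    sed -n 's/.*\[<[0-9a-f]*>] \([A-Za-z0-9_.]*\)+0x[0-9a-f]*\/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)].*/\1 [\2]/p'
}

# Possible use (not executed here):
#   dmesg > /tmp/d1; modprobe lustre; dmesg > /tmp/d2
#   diff /tmp/d1 /tmp/d2 | oops_fault_symbol
#   diff /tmp/d1 /tmp/d2 | oops_module_frames
```

Applied to the trace above, the first function should report task_rq_lock and the second the two lnet frames, LNetInit and init_lnet.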

I have also tried Lustre-1.8.4, but got the same result.



 Comments   
Comment by Andreas Dilger [ 05/Apr/12 ]

The SLES11 SP1 RPMs are built for every build on Lustre 2.x.

Comment by Peter Chiu [ 10/Apr/12 ]

Thanks Andreas, for this email update.

I have checked on the web link concerned, but don't think I have found further details on the fix.

May I ask what has been changed to resolve the problem encountered?

Or what do I need to do at the client side to enable lustre modules to be loaded?

Many thanks again.

Regards,
Peter
