Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.1.0, Lustre 1.8.6
-
None
-
Lustre Branch: v1_8_6_RC2
Lustre Build: http://newbuild.whamcloud.com/job/lustre-b1_8/80/
e2fsprogs Build: http://newbuild.whamcloud.com/job/e2fsprogs-master/40/
Distro/Arch: RHEL6/x86_64(patchless client, in-kernel OFED, kernel version: 2.6.32-131.2.1.el6)
RHEL5/x86_64(server, OFED 1.5.3.1, kernel version: 2.6.18-238.12.1.el5_lustre)
-
3
-
4271
Description
After mounting and then unmounting a Lustre filesystem, running lustre_rmmod crashed the Lustre client node with the following oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
PGD 31ae08067 PUD 312eae067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 2
Modules linked in: llite_lloop(-)(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]
Modules linked in: llite_lloop(-)(U) lustre(U) mgc(U) lov(U) osc(U) mdc(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]
Pid: 4826, comm: rmmod Tainted: G ---------------- T 2.6.32-131.2.1.el6.x86_64 #1 X8DTT
RIP: 0010:[<ffffffff814dcf35>] [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
RSP: 0018:ffff880318cd9da8 EFLAGS: 00010092
RAX: 0000000000010000 RBX: ffff880328bda000 RCX: 000000000000b1a0
RDX: 0000000000000000 RSI: ffff88031ce09a90 RDI: 0000000000000000
RBP: ffff880318cd9da8 R08: 0000000000000001 R09: ffffffff817c3f86
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88031ce09800
R13: ffff880328bda000 R14: ffff88031ce0b560 R15: 0000000000000001
FS: 00007fb1de18d700(0000) GS:ffff880032e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000031ae78000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 4826, threadinfo ffff880318cd8000, task ffff88032123ca80)
Stack:
ffff880318cd9dd8 ffffffff8125689c ffff880328bda000 ffff880328bda328
<0> ffff880328bda328 ffff88031ce0b560 ffff880318cd9df8 ffffffff8124ba66
<0> ffffffff81a8a820 ffff880328bda360 ffff880318cd9e28 ffffffff81264a2d
Call Trace:
[<ffffffff8125689c>] blk_throtl_exit+0x3c/0xd0
[<ffffffff8124ba66>] blk_release_queue+0x26/0x80
[<ffffffff81264a2d>] kobject_release+0x8d/0x240
[<ffffffff812649a0>] ? kobject_release+0x0/0x240
[<ffffffff81265fd7>] kref_put+0x37/0x70
[<ffffffff812648a7>] kobject_put+0x27/0x60
[<ffffffff81247687>] blk_cleanup_queue+0x57/0x70
[<ffffffffa08070b1>] lloop_exit+0x61/0x300 [llite_lloop]
[<ffffffff81069012>] ? put_online_cpus+0x52/0x70
[<ffffffff810a8ef8>] ? module_refcount+0x58/0x70
[<ffffffff810a9a74>] sys_delete_module+0x194/0x260
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: c1 74 0e f3 90 0f b7 0f eb f5 83 3f 00 75 f4 eb df 48 89 d0 c9 c3 55 48 89 e5 0f 1f 44 00 00 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 eb f5
RIP [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
RSP <ffff880318cd9da8>
CR2: 0000000000000000
This failure can be reproduced easily by running llmount.sh followed by llmountcleanup.sh.
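As a reading aid for the trace in this description, the `Oops: 0002` field can be decoded as an x86 page-fault error code. The sketch below is illustrative only; the helper name `decode_oops_code` and the table `PF_BITS` are made up for this example and are not part of any Lustre or kernel tool.

```python
# Decode the hexadecimal error code printed on the "Oops:" line of an
# x86 page-fault oops. Only the low five bits are handled here.
# Each entry: (bit mask, meaning when set, meaning when clear or None).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_oops_code(field):
    """Return a list of human-readable descriptions for an Oops code."""
    code = int(field, 16)
    descriptions = []
    for mask, when_set, when_clear in PF_BITS:
        text = when_set if code & mask else when_clear
        if text is not None:
            descriptions.append(text)
    return descriptions


if __name__ == "__main__":
    # Value taken from the "Oops: 0002 [#1] SMP" line in the trace.
    print(decode_oops_code("0002"))
```

For `0002` this reports a kernel-mode write to a page that is not present. Together with `CR2: 0000000000000000`, `RDI: 0000000000000000`, and the faulting bytes `<f0> 0f c1 07` (a locked `xadd` through `%rdi`), that appears consistent with `_spin_lock_irq` being handed a NULL lock pointer from `blk_throtl_exit`.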
Issue Links
- Trackbacks
-
Lustre 1.8.6-wc1 release testing tracker Lustre 1.8.6wc1 RC1 Tag: v186RC1 RC1 was DOA due to a build failure related to tag name LU408
I hit this problem again when running sanity test_68a with the latest master build (RHEL6/x86_64, #190):
Lustre: DEBUG MARKER: == sanity test 68a: lloop driver - basic test ========================== 14:48:58 (1309297738)
Lustre: 8193:0:(lloop.c:711:lloop_ioctl()) Enter llop_ioctl
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
PGD 30e626067 PUD 30f053067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/block/lloop11/range
CPU 2
Modules linked in: llite_lloop(U) ext2 lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]
Modules linked in: llite_lloop(U) ext2 lustre(U) mgc(U) lov(U) osc(U) mdc(U) lmv(U) fid(U) fld(U) lquota(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lvfs(U) ksocklnd(U) lnet(U) libcfs(U) nfs lockd fscache(T) nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_addr ipv6 ib_sa dm_mirror dm_region_hash dm_log mlx4_ib ib_mad ib_core mlx4_en mlx4_core igb serio_raw ghes hed i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif ahci dm_mod [last unloaded: microcode]
Pid: 8201, comm: rmmod Tainted: G ---------------- T 2.6.32-131.2.1.el6.x86_64 #1 X8DTT
RIP: 0010:[<ffffffff814dcf35>] [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
RSP: 0018:ffff88030ec31da8 EFLAGS: 00010092
RAX: 0000000000010000 RBX: ffff880326822aa0 RCX: 000000000000720e
RDX: 0000000000000000 RSI: ffff88030e6e1e90 RDI: 0000000000000000
RBP: ffff88030ec31da8 R08: 000000000000000c R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88030e6e1c00
R13: ffff880326822aa0 R14: ffff8802fefa8740 R15: 0000000000000001
FS: 00007f58858a0700(0000) GS:ffff880032e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000030e8ae000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process rmmod (pid: 8201, threadinfo ffff88030ec30000, task ffff88030e886b00)
Stack:
ffff88030ec31dd8 ffffffff8125689c ffff880326822aa0 ffff880326822dc8
<0> ffff880326822dc8 ffff8802fefa8740 ffff88030ec31df8 ffffffff8124ba66
<0> ffffffff81a8a820 ffff880326822e00 ffff88030ec31e28 ffffffff81264a2d
Call Trace:
[<ffffffff8125689c>] blk_throtl_exit+0x3c/0xd0
[<ffffffff8124ba66>] blk_release_queue+0x26/0x80
[<ffffffff81264a2d>] kobject_release+0x8d/0x240
[<ffffffff812649a0>] ? kobject_release+0x0/0x240
[<ffffffff81265fd7>] kref_put+0x37/0x70
[<ffffffff812648a7>] kobject_put+0x27/0x60
[<ffffffff81247687>] blk_cleanup_queue+0x57/0x70
[<ffffffffa00410b1>] lloop_exit+0x61/0x2f0 [llite_lloop]
[<ffffffff81069012>] ? put_online_cpus+0x52/0x70
[<ffffffff810a8ef8>] ? module_refcount+0x58/0x70
[<ffffffff810a9a74>] sys_delete_module+0x194/0x260
[<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
Code: c1 74 0e f3 90 0f b7 0f eb f5 83 3f 00 75 f4 eb df 48 89 d0 c9 c3 55 48 89 e5 0f 1f 44 00 00 fa 66 0f 1f 44 00 00 b8 00 00 01 00 <f0> 0f c1 07 0f b7 d0 c1 e8 10 39 c2 74 0e f3 90 0f b7 17 eb f5
RIP [<ffffffff814dcf35>] _spin_lock_irq+0x15/0x40
RSP <ffff88030ec31da8>
CR2: 0000000000000000
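To compare this oops with the one in the description, it can help to reduce each call trace to its reliable frames. The following is a minimal sketch under stated assumptions: `parse_frames` and `FRAME_RE` are hypothetical names for this example, the sample text is copied from the trace above, and frames marked with `?` are skipped because the kernel flags them as unreliable.

```python
import re

# Matches lines such as:
#   [<ffffffff8125689c>] blk_throtl_exit+0x3c/0xd0
#   [<ffffffffa00410b1>] lloop_exit+0x61/0x2f0 [llite_lloop]
# Group 2 captures the optional "? " marker for unreliable frames.
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\][ \t]+(\?[ \t]+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(text):
    """Return (function, module) pairs for the reliable frames, innermost first."""
    frames = []
    for match in FRAME_RE.finditer(text):
        if match.group(2):
            continue  # skip frames the kernel marked with '?'
        frames.append((match.group(3), match.group(4) or "kernel"))
    return frames


SAMPLE = """\
[<ffffffff8125689c>] blk_throtl_exit+0x3c/0xd0
[<ffffffff8124ba66>] blk_release_queue+0x26/0x80
[<ffffffff81264a2d>] kobject_release+0x8d/0x240
[<ffffffff812649a0>] ? kobject_release+0x0/0x240
[<ffffffff81265fd7>] kref_put+0x37/0x70
[<ffffffff812648a7>] kobject_put+0x27/0x60
[<ffffffff81247687>] blk_cleanup_queue+0x57/0x70
[<ffffffffa00410b1>] lloop_exit+0x61/0x2f0 [llite_lloop]
"""

frames = parse_frames(SAMPLE)

if __name__ == "__main__":
    for function, module in frames:
        print(module, function)
```

Run over both traces, the function names come out identical: `lloop_exit` in `llite_lloop` calls `blk_cleanup_queue`, and the fault occurs under `blk_throtl_exit`.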
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-131.2.1.el6.x86_64 (mockbuild@x86-003.build.bos.redhat.com) (gcc version 4.4.5 20110214 (Red Hat 4.4.5-6) (GCC) ) #1 SMP Wed May 18 07:07:37 EDT 2011
Command line: ro root=UUID=e41f2282-ba65-4051-97ff-6b7f533b8a60 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us console=ttyS0,115200 irqpoll maxcpus=1 reset_devices cgroup_disable=memory memmap=exactmap memmap=640K@0K memmap=131436K@33408K elfcorehdr=164844K memmap=104K$920K memmap=8K$3136952K memmap=56K#3136960K memmap=328K#3137016K memmap=64K$3137344K memmap=8272K$3137456K memmap=262144K$3670016K memmap=4K$4175872K memmap=4096K$4190208K
KERNEL supported cpus:
Intel GenuineIntel
AMD AuthenticAMD
Centaur CentaurHauls
BIOS-provided physical RAM map: