Details
Type: Bug
Resolution: Duplicate
Priority: Critical
None
Lustre 1.8.8
None
Environment: Sun Fire X4540, RHEL5.4
1
6347
Description
We've just upgraded our Lustre servers from RHEL5.2 to RHEL5.4, and from Oracle Lustre 1.8.5 to Whamcloud Lustre 1.8.8. We did not upgrade OFED, so the servers are still running the old OFED 1.4.1.
The clients are running CentOS 5.8 with Lustre 1.8.8.
When we generate I/O to Lustre, the OSS servers panic and the console shows:
Kernel BUG at fs/bio.c:222
invalid opcode: 0000 [1] SMP
last sysfs file: /class/infiniband_mad/umad0/port
CPU 2
Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) jbd2(U) crc16(U) lustre(U) lov(U) mdc(U) lquota(U) osc(U) ko2iblnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) raid456(U) xor(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) deflate(U) zlib_deflate(U) ccm(U) serpent(U) blowfish(U) twofish(U) ecb(U) xcbc(U) crypto_hash(U) cbc(U) md5(U) sha256(U) sha512(U) des(U) aes_generic(U) testmgr_cipher(U) testmgr(U) crypto_blkcipher(U) aes_x86_64(U) ah6(U) ah4(U) esp6(U) xfrm6_esp(U) esp4(U) xfrm4_esp(U) aead(U) crypto_algapi(U) xfrm4_tunnel(U) tunnel4(U) xfrm4_mode_tunnel(U) xfrm4_mode_transport(U) xfrm6_mode_transport(U) xfrm6_mode_tunnel(U) ipcomp(U) ipcomp6(U) xfrm6_tunnel(U) tunnel6(U) af_key(U) autofs4(U) hidp(U) rfcomm(U) l2cap(U) bluetooth(U) lockd(U) sunrpc(U) ib_ipoib(U) ipoib_helper(U) ipv6(U) xfrm_nalgo(U) crypto_api(U) cpufreq_ondemand(U) powernow_k8(U) freq_table(U) mperf(U) rdma_ucm(U) rdma_cm(U) iw_cm(U) ib_addr(U) qlgc_vnic(U) ib_cm(U) ib_sa(U) ib_uverbs(U) ib_umad(U) iw_nes(U) iw_cxgb3(U) cxgb3(U) ib_ipath(U) mlx4_ib(U) mlx4_core(U) loop(U) dm_mirror(U) dm_multipath(U) scsi_dh(U) video(U) backlight(U) sbs(U) power_meter(U) i2c_ec(U) dell_wmi(U) wmi(U) button(U) battery(U) asus_acpi(U) acpi_memhotplug(U) ac(U) parport_pc(U) lp(U) parport(U) joydev(U) ib_mthca(U) tpm_tis(U) tpm(U) k10temp(U) sg(U) hwmon(U) forcedeth(U) amd64_edac_mod(U) ib_mad(U) tpm_bios(U) i2c_nforce2(U) edac_mc(U) i2c_core(U) 8021q(U) pcspkr(U) ib_core(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_log(U) dm_mod(U) dm_mem_cache(U) shpchp(U) mptsas(U) mptscsih(U) mptbase(U) scsi_transport_sas(U) sd_mod(U) scsi_mod(U) raid1(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
Pid: 7206, comm: md12_raid5 Tainted: G ---- 2.6.18-308.4.1.el5_lustre #1
RIP: 0010:[<ffffffff8002dd5b>] [<ffffffff8002dd5b>] bio_put+0xa/0x31
RSP: 0018:ffff81084e347d08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff810436ba6440 RCX: ffff81084e31a910
RDX: ffff810436ba6440 RSI: ffff81045bb32040 RDI: ffff810436ba6440
RBP: ffff81045bb32040 R08: 0000000000000000 R09: 000000000000003e
R10: ffff810854affa00 R11: 0000000000000280 R12: ffff81045bb32000
R13: ffff810854affa00 R14: 00000000ffffffff R15: 0000000000000000
FS: 00002b5d980be6e0(0000) GS:ffff81010f759240(0000) knlGS:00000000f7d798d0
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002b25b3265000 CR3: 0000000437930000 CR4: 00000000000006e0
Process md12_raid5 (pid: 7206, threadinfo ffff81084e346000, task ffff810461e450c0)
Stack: ffffffff80041d34 0000000000000000 ffffffff8892d2ab ffffffffffffffff
0000000000000000 ffff810475c1bbf0 0000000000000002 0000000000000003
0000000000000008 0000000000000000 000000000000000a 0000000000000000
Call Trace:
[<ffffffff80041d34>] end_bio_bh_io_sync+0x37/0x3b
[<ffffffff8892d2ab>] :raid456:handle_stripe+0xfd1/0x2549
[<ffffffff8022720c>] bitmap_daemon_work+0x329/0x33c
[<ffffffff8002dee8>] __wake_up+0x38/0x4f
[<ffffffff800a328f>] keventd_create_kthread+0x0/0xc4
[<ffffffff800a328f>] keventd_create_kthread+0x0/0xc4
[<ffffffff8892e97b>] :raid456:raid5d+0x158/0x18b
[<ffffffff8003ab5e>] prepare_to_wait+0x34/0x61
[<ffffffff80223492>] md_thread+0xf8/0x10e
[<ffffffff800a34a7>] autoremove_wake_function+0x0/0x2e
[<ffffffff8022339a>] md_thread+0x0/0x10e
[<ffffffff80032652>] kthread+0xfe/0x132
[<ffffffff8005dfb1>] child_rip+0xa/0x11
[<ffffffff800a328f>] keventd_create_kthread+0x0/0xc4
[<ffffffff80032554>] kthread+0x0/0x132
[<ffffffff8005dfa7>] child_rip+0x0/0x11
Code: 0f 0b 68 41 14 2c 80 c2 de 00 eb fe f0 ff 4f 50 0f 94 c0 84
RIP [<ffffffff8002dd5b>] bio_put+0xa/0x31
RSP <ffff81084e347d08>
<0>Kernel panic - not syncing: Fatal exception
Issue Links
Trackbacks
Lustre 1.8.x known issues tracker
While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA