Lustre / LU-975

kernel panic on OSS when using LVM mirror regionsize greater than 512k


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 1.8.7
    • Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
    • None
    • Environment: Lustre 1.8.5, OS RHEL 5.5
    • Severity: 3
    • 24,546
    • 6493

    Description

      Our customer is running Lustre 1.8.5 (from Oracle) on RHEL 5.5. The OST disks are mirrored with LVM. If the LVM regionsize is set to anything greater than the default of 512k, the OSSs randomly crash with:
      Sep 3 06:25:09 sklusp02a kernel: Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
      Sep 3 06:25:09 sklusp02a kernel: [<ffffffff8822b2fd>] :dm_mod:dispatch_io+0xb9/0x19b
      Sep 3 06:25:09 sklusp02a kernel: PGD 11e6408067 PUD 11e640d067 PMD 0
      Sep 3 06:25:09 sklusp02a kernel: Oops: 0000 [1] SMP
      Sep 3 06:25:09 sklusp02a kernel: last sysfs file: /devices/pci0000:00/0000:00:07.0/0000:06:00.1/host1/rport-1:0-1/target1:0:1/1:0:1:3/timeout
      Sep 3 06:25:09 sklusp02a kernel: CPU 1
      Sep 3 06:25:09 sklusp02a kernel: Modules linked in: obdfilter(U) fsfilt_ldiskfs(U) ost(U) mgc(U) ldiskfs(U) crc16(U) lustre(U) lov(U) mdc(U)
      lquota(U) osc(U) ksocklnd(U) ptlrpc(U) obdclass(U) lnet(U) lvfs(U) libcfs(U) dm_log_clustered(U) lock_dlm(U) gfs2(U) dlm(U) configfs(U)
      mptctl(U) mptbase(U) ipmi_watchdog(U) ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) i2c_dev(U) i2c_core(U) lockd(U) sunrpc(U) bonding(U) ipv6(U)
      xfrm_nalgo(U) crypto_api(U) dm_round_robin(U) dm_multipath(U) scsi_dh(U) parport_pc(U) lp(U) parport(U) sg(U) shpchp(U) hpilo(U) pcspkr(U)
      serio_raw(U) bnx2x(U) dm_raid45(U) dm_message(U) dm_region_hash(U) dm_mem_cache(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_log(U) dm_mod(U)
      usb_storage(U) qla2xxx(U) scsi_transport_fc(U) cciss(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U)
      Sep 3 06:25:09 sklusp02a kernel: Pid: 10065, comm: kmirrord Tainted: G 2.6.18-194.17.1.el5_lustre.1.8.5 #1
      Sep 3 06:25:09 sklusp02a kernel: RIP: 0010:[<ffffffff8822b2fd>] [<ffffffff8822b2fd>] :dm_mod:dispatch_io+0xb9/0x19b
      Sep 3 06:25:09 sklusp02a kernel: RSP: 0018:ffff8111e367fb60 EFLAGS: 00010206
      Sep 3 06:25:09 sklusp02a kernel: RAX: 00000000264af800 RBX: 0000000000000000 RCX: ffffffff8008cf93
      Sep 3 06:25:09 sklusp02a kernel: RDX: 0000000000000050 RSI: ffff8111d9ab00c0 RDI: 0000000000000001
      Sep 3 06:25:09 sklusp02a kernel: RBP: 0000000000000800 R08: 0000000000000000 R09: ffff8111edda3040
      Sep 3 06:25:09 sklusp02a kernel: R10: 0000000000000001 R11: ffffffff80044fcd R12: ffff8111e367fc40
      Sep 3 06:25:09 sklusp02a kernel: R13: ffff8111e367fdc0 R14: ffff811212d01e00 R15: 0000000000000000
      Sep 3 06:25:09 sklusp02a kernel: FS: 0000000000000000(0000) GS:ffff81121ffb09c0(0000) knlGS:0000000000000000
      Sep 3 06:25:09 sklusp02a kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      Sep 3 06:25:09 sklusp02a kernel: CR2: 0000000000000040 CR3: 00000011e6842000 CR4: 00000000000006e0
      Sep 3 06:25:09 sklusp02a kernel: Process kmirrord (pid: 10065, threadinfo ffff8111e367e000, task ffff8111edda3040)

      With Lustre 1.8.2 and RHEL 5.4 there was no issue using a regionsize of 4M. The customer uses the larger regionsize to speed up remirroring after a system crash.
      The customer logged this issue with Oracle Lustre support, and Oracle suggested upgrading to their 1.8.7 release. In the meantime the customer switched from Oracle to Whamcloud support.
      We currently plan to upgrade to Whamcloud 1.8.7. Our question is whether this issue with the LVM mirror regionsize is known to Whamcloud, and whether the upgrade to 1.8.7 will solve it.
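
      For reference, the mirror configuration in question can be set up roughly as follows. This is only a sketch: the volume group and LV names (vg_ost, ost0001) and the LV size are hypothetical placeholders, and the 4M value matches the regionsize the customer uses instead of the 512k default.

      # create a mirrored OST logical volume with a 4M mirror region size
      # (vg_ost/ost0001 and the 1T size are placeholder values)
      lvcreate -m 1 --regionsize 4M -L 1T -n ost0001 vg_ost
      # confirm the region size actually in use for the mirror
      lvs -o +regionsize vg_ost/ost0001

      Formatting and mounting the LV as an OST is otherwise unchanged; only the --regionsize value differs from the default setup.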

    Attachments

    Activity

    People

      Assignee: wc-triage (WC Triage)
      Reporter: hpsk (HP Slovakia team) (Inactive)
      Votes: 0
      Watchers: 1

    Dates

      Created:
      Updated:
      Resolved: