Details
Type: Bug
Resolution: Duplicate
Priority: Critical
Affects Version/s: Lustre 1.8.x (1.8.0 - 1.8.5)
Fix Version/s: None
Environment:
MDS, MGS, OSS : SLES 11, Lustre 1.8.4 (oracle), kernel 2.6.27.39-0.1_lustre.1.8.4-default, OFED 1.4.2
INTERCONNECT : Infiniband
Server nodes are 'coupled' in high-availability pairs with the help of Linux-HA (heartbeat-2.1.4-4.1).
The configuration is the same (apart from node names and UDP port) for all nodes:
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 90
warntime 30
initdead 180
udpport 10119
bcast eth0 ib0
auto_failback off
stonith_host jf92o05 external/ipmi jf92o06 jf92o06s ADMIN jadminsb lanplus
stonith_host jf92o06 external/ipmi jf92o05 jf92o05s ADMIN jadminsb lanplus
node jf92o05
node jf92o06
CLIENTS : SLES 11 SP1, Lustre 1.8.4 (oracle) patchless, kernel 2.6.32.23-0.3-default, OFED 1.4.2
3
21,804
6507
Description
Sorry if this is a duplicate, but I couldn't find a similar bug.
The failure is restricted to OSS nodes and occurs as follows:
1. One OSS node crashes. Heartbeat manages to take over its resources to the standby node smoothly.
There is no indication of any IB errors in opensm.log, and no errors in /var/log/messages or /var/log/warn. No resource (CPU, memory, network, disk) is exhausted (I can provide the collectl files if needed). One thing worth noting is that the 'ldiskfs_inode_cache' slab grows steadily to over 1 GB (numslabs, objects, size) until the node crashes. See the attached collectl excerpt for the slab output.
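For reference, the slab growth we saw in collectl can also be watched directly with a small poller over /proc/slabinfo. The snippet below is only an illustrative sketch (the cache name is the one observed above; interval and output format are arbitrary), not part of our actual monitoring:

#!/usr/bin/env python
# Illustrative sketch: periodically print the size of the ldiskfs_inode_cache
# slab from /proc/slabinfo (num_objs * objsize), roughly what collectl records.
# Reading /proc/slabinfo usually requires root; the interval is an arbitrary choice.
import time

CACHE = "ldiskfs_inode_cache"
INTERVAL = 60  # seconds

def slab_bytes(name):
    with open("/proc/slabinfo") as f:
        for line in f:
            fields = line.split()
            # slabinfo 2.x line: name active_objs num_objs objsize objperslab pagesperslab ...
            if fields and fields[0] == name:
                return int(fields[2]) * int(fields[3])
    return None

while True:
    size = slab_bytes(CACHE)
    if size is not None:
        print("%s %s: %.1f MB" % (time.strftime("%F %T"), CACHE, size / 1048576.0))
    time.sleep(INTERVAL)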
Anyway, we found the following message in the console log file (conman):
jf92o05 login: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
IP: [<ffffffffa09bbdbd>] ost_rw_prolong_locks+0x18d/0x460 [ost]
PGD 0
Oops: 0000 [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0
Modules linked in: obdfilter(N) fsfilt_ldiskfs(N) ost(N) mgc(N) ldiskfs(N) lustre(N) lov(N) mdc(N) lquota(N) osc(N) ko2iblnd(N) ptlrpc(N) obdclass(N) lnet(N) lvfs(N) libcfs(N) quota_v2(N) quot
a_tree(N) jbd2(N) crc16(N) edd(N) nfs(N) lockd(N) nfs_acl(N) sunrpc(N) rdma_ucm(N) ib_sdp(N) rdma_cm(N) iw_cm(N) ib_addr(N) ib_ipoib(N) ib_cm(N) ib_sa(N) ipv6(N) ib_uverbs(N) ib_umad(N) iw_nes
(N) libcrc32c(N) iw_cxgb3(N) cxgb3(N) ib_ipath(N) cpufreq_conservative(N) cpufreq_userspace(N) cpufreq_powersave(N) acpi_cpufreq(N) mlx4_ib(N) ib_mthca(N) ib_mad(N) ib_core(N) fuse(N) dm_crypt
(N) crypto_blkcipher(N) loop(N) dm_round_robin(N) dm_multipath(N) scsi_dh(N) sr_mod(N) cdrom(N) ide_pci_generic(N) jmicron(N) ide_core(N) ata_generic(N) snd_hda_intel(N) thermal(N) snd_pcm(N)
snd_timer(N) rtc_cmos(N) snd_page_alloc(N) ahci(N) processor(N) pata_jmicron(N) snd_hwdep(N) rtc_core(N) lpfc(N) libata(N) ses(N) thermal_sys(N) snd(N) rtc_lib(N) mlx4_core(N) pcspkr(N) i2c_i8
01(N) ohci1394(N) e1000e(N) serio_raw(N) enclosure(N) igb(N) soundcore(N) joydev(N) scsi_transport_fc(N) button(N) ieee1394(N) i2c_core(N) scsi_tgt(N) hwmon(N) dock(N) sg(N) linear(N) usbhid(N
) hid(N) ff_memless(N) uhci_hcd(N) ehci_hcd(N) sd_mod(N) crc_t10dif(N) usbcore(N) dm_snapshot(N) dm_mod(N) ext3(N) jbd(N) mbcache(N) aacraid(N) scsi_mod(N) [last unloaded: libcfs]
Supported: No
Pid: 24183, comm: ll_ost_io_71 Tainted: G 2.6.27.39-0.1_lustre.1.8.4-default #1
RIP: 0010:[<ffffffffa09bbdbd>] [<ffffffffa09bbdbd>] ost_rw_prolong_locks+0x18d/0x460 [ost]
RSP: 0018:ffff8805bbd3bd00 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8805bbd3bd40
RDX: ffffffffa09bb480 RSI: ffff8805bbd3bd80 RDI: 0000000000000258
RBP: ffff8801d97c41b0 R08: 0000000000000006 R09: 0000000000000000
R10: ffff8805d0548c00 R11: ffff8805d9b5eb80 R12: 0000000000000006
R13: ffff8801d97c40c8 R14: ffff8802ba95dc00 R15: ffff8805bbd3bd40
FS: 00007fefa37f96f0(0000) GS:ffffffff80a33080(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000000000c8 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ll_ost_io_71 (pid: 24183, threadinfo ffff8805bbd3a000, task ffff8805bbd38100)
Stack: ffffffff80a23680 0000000000000000 ffff88062a43e7c0 ffffffffa07fd790
ffff8805bbd3be40 ffffffff80498e16 0000000000000000 ffffffffffffffff
ffff880815a27e00 00000000138da000 00000000138dafff 0000000000000000
Call Trace:
[<ffffffffa09bc1bb>] ost_rw_hpreq_check+0x12b/0x2b0 [ost]
[<ffffffffa076c9c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
[<ffffffff8020cf49>] child_rip+0xa/0x11
2. Some time later, the node that took over the resources of the crashed node hangs, too.
The situation in the log files and in resource usage is the same (no resource is exhausted); the 'ldiskfs_inode_cache' slab again grows continuously before the server crashes (hangs), but this time the allocation is not very high (~200 MB).
The same message appears in this node's console log file, too:
-Separator ---- Sun Dec 11 20:10:01 CET 2011 ----
general protection fault: 0000 [1] SMP
last sysfs file: /sys/kernel/uevent_seqnum
CPU 0
Modules linked in: obdfilter(N) fsfilt_ldiskfs(N) ost(N) mgc(N) ldiskfs(N) lustre(N) lov(N) mdc(N) lquota(N) osc(N) ko2iblnd(N) ptlrpc(N) obdclass(N) lnet(N) lvfs(N) libcfs(N) quota_v2(N) quot
a_tree(N) jbd2(N) crc16(N) edd(N) nfs(N) lockd(N) nfs_acl(N) sunrpc(N) rdma_ucm(N) ib_sdp(N) rdma_cm(N) iw_cm(N) ib_addr(N) ib_ipoib(N) ib_cm(N) ib_sa(N) ipv6(N) ib_uverbs(N) ib_umad(N) iw_nes
(N) libcrc32c(N) iw_cxgb3(N) cxgb3(N) ib_ipath(N) cpufreq_conservative(N) cpufreq_userspace(N) cpufreq_powersave(N) acpi_cpufreq(N) mlx4_ib(N) ib_mthca(N) ib_mad(N) ib_core(N) fuse(N) dm_crypt
(N) crypto_blkcipher(N) loop(N) dm_round_robin(N) dm_multipath(N) scsi_dh(N) sr_mod(N) cdrom(N) ide_pci_generic(N) jmicron(N) ide_core(N) ata_generic(N) thermal(N) snd_hda_intel(N) snd_pcm(N)
processor(N) snd_timer(N) ahci(N) pata_jmicron(N) rtc_cmos(N) snd_page_alloc(N) ses(N) lpfc(N) thermal_sys(N) ohci1394(N) libata(N) rtc_core(N) snd_hwdep(N) scsi_transport_fc(N) mlx4_core(N) e
nclosure(N) hwmon(N) i2c_i801(N) dock(N) joydev(N) rtc_lib(N) button(N) pcspkr(N) ieee1394(N) snd(N) serio_raw(N) igb(N) scsi_tgt(N) e1000e(N) soundcore(N) i2c_core(N) sg(N) linear(N) usbhid(N
) hid(N) ff_memless(N) uhci_hcd(N) ehci_hcd(N) sd_mod(N) crc_t10dif(N) usbcore(N) dm_snapshot(N) dm_mod(N) ext3(N) jbd(N) mbcache(N) aacraid(N) scsi_mod(N) [last unloaded: libcfs]
Supported: No
Pid: 20502, comm: ll_ost_io_80 Tainted: G 2.6.27.39-0.1_lustre.1.8.4-default #1
RIP: 0010:[<ffffffffa075ce94>] [<ffffffffa075ce94>] lustre_msg_buf+0x4/0x90 [ptlrpc]
RSP: 0000:ffff8805cf82bdb0 EFLAGS: 00010282
RAX: 0000000000000008 RBX: ffff88026b76a808 RCX: aaaaaaaaaaaaaaab
RDX: 0000000000000018 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
RBP: 0000000000000001 R08: ffff8805f0dae900 R09: 0000000000000000
R10: 000000004ee5023d R11: ffff880c2d53edc0 R12: ffff88026b76a800
R13: 0000000000000001 R14: ffff88026b76a800 R15: ffff8803067bc608
FS: 00007f03bd6456f0(0000) GS:ffffffff80a33080(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000001ab9348 CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ll_ost_io_80 (pid: 20502, threadinfo ffff8805cf82a000, task ffff8805cf828800)
Stack: ffff88026b76a800 ffff88026b76a808 ffff8805f5c6c800 ffff88026b76a808
ffff8805f5c6c800 ffffffffa09b913b ffff88026b76a800 ffffffffa09bab0c
0000000000000000 ffff8803067bc540 ffff8805f5c6c800 ffff88026b76a800
Call Trace:
[<ffffffffa09b913b>] ost_rw_hpreq_check+0xab/0x2b0 [ost]
[<ffffffffa07699c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
[<ffffffff8020cf49>] child_rip+0xa/0x11
This time the system was broken. After booting the second node manually, the system is operational again.
The incident is 'restricted' to two server node pairs and has been happening periodically for the last 3 weeks, roughly every 7 days (every weekend, but that might be coincidence).
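To check that both nodes really hit the same code path, the oops signature lines from the conman console logs (the 'IP:'/'RIP:' lines quoted above) can be pulled out and compared. The following is only an illustrative sketch; the console log path is an assumption and would need to be adapted:

#!/usr/bin/env python
# Illustrative sketch: scan conman console logs for oops signature lines like
#   IP: [<ffffffffa09bbdbd>] ost_rw_prolong_locks+0x18d/0x460 [ost]
#   RIP: 0010:[<ffffffffa075ce94>] ... lustre_msg_buf+0x4/0x90 [ptlrpc]
# and print which log (node) hit which symbol. The log location below is an
# assumed layout, not the real path on our systems.
import glob
import re

SIG = re.compile(r"R?IP:.*\[<[0-9a-f]+>\]\s+(\S+\+0x[0-9a-f]+/0x[0-9a-f]+)\s+\[(\w+)\]")

for path in sorted(glob.glob("/var/log/conman/console.*")):  # assumed path
    with open(path, errors="replace") as f:
        for line in f:
            m = SIG.search(line)
            if m:
                print("%s: %s [%s]" % (path, m.group(1), m.group(2)))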
Attachments
Issue Links