  1. Lustre
  2. LU-912

OSS node(s) crash with Kernel oops

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Lustre 1.8.6
    • Lustre 1.8.x (1.8.0 - 1.8.5)
    • None
    • 3
    • 21,804
    • 6507

    Description

      Sorry if this is a duplicate, but I couldn't find a similar bug.

      Failure is restricted to OSS nodes and occurs as follows:

      1. One OSS node crashes. Heartbeat manages to fail the resources over to the standby node smoothly.

      There is no indication of any IB errors in opensm.log, and there are no errors in /var/log/messages or /var/log/warn. No resource (CPU, memory, network, disk) is exhausted (I can provide the collectl files if needed). One thing that stands out is that 'ldiskfs_inode_cache' grows steadily to over 1 GB until the node crashes (numslabs, objects, size). See the attached collectl excerpt for the slab output.

      Anyway, we found the following message in the console log file (conman):

      jf92o05 login: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c8
      IP: [<ffffffffa09bbdbd>] ost_rw_prolong_locks+0x18d/0x460 [ost]
      PGD 0
      Oops: 0000 [1] SMP
      last sysfs file: /sys/kernel/uevent_seqnum
      CPU 0
      Modules linked in: obdfilter(N) fsfilt_ldiskfs(N) ost(N) mgc(N) ldiskfs(N) lustre(N) lov(N) mdc(N) lquota(N) osc(N) ko2iblnd(N) ptlrpc(N) obdclass(N) lnet(N) lvfs(N) libcfs(N) quota_v2(N) quot
      a_tree(N) jbd2(N) crc16(N) edd(N) nfs(N) lockd(N) nfs_acl(N) sunrpc(N) rdma_ucm(N) ib_sdp(N) rdma_cm(N) iw_cm(N) ib_addr(N) ib_ipoib(N) ib_cm(N) ib_sa(N) ipv6(N) ib_uverbs(N) ib_umad(N) iw_nes
      (N) libcrc32c(N) iw_cxgb3(N) cxgb3(N) ib_ipath(N) cpufreq_conservative(N) cpufreq_userspace(N) cpufreq_powersave(N) acpi_cpufreq(N) mlx4_ib(N) ib_mthca(N) ib_mad(N) ib_core(N) fuse(N) dm_crypt
      (N) crypto_blkcipher(N) loop(N) dm_round_robin(N) dm_multipath(N) scsi_dh(N) sr_mod(N) cdrom(N) ide_pci_generic(N) jmicron(N) ide_core(N) ata_generic(N) snd_hda_intel(N) thermal(N) snd_pcm(N)
      snd_timer(N) rtc_cmos(N) snd_page_alloc(N) ahci(N) processor(N) pata_jmicron(N) snd_hwdep(N) rtc_core(N) lpfc(N) libata(N) ses(N) thermal_sys(N) snd(N) rtc_lib(N) mlx4_core(N) pcspkr(N) i2c_i8
      01(N) ohci1394(N) e1000e(N) serio_raw(N) enclosure(N) igb(N) soundcore(N) joydev(N) scsi_transport_fc(N) button(N) ieee1394(N) i2c_core(N) scsi_tgt(N) hwmon(N) dock(N) sg(N) linear(N) usbhid(N
      ) hid(N) ff_memless(N) uhci_hcd(N) ehci_hcd(N) sd_mod(N) crc_t10dif(N) usbcore(N) dm_snapshot(N) dm_mod(N) ext3(N) jbd(N) mbcache(N) aacraid(N) scsi_mod(N) [last unloaded: libcfs]
      Supported: No
      Pid: 24183, comm: ll_ost_io_71 Tainted: G 2.6.27.39-0.1_lustre.1.8.4-default #1
      RIP: 0010:[<ffffffffa09bbdbd>] [<ffffffffa09bbdbd>] ost_rw_prolong_locks+0x18d/0x460 [ost]
      RSP: 0018:ffff8805bbd3bd00 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff8805bbd3bd40
      RDX: ffffffffa09bb480 RSI: ffff8805bbd3bd80 RDI: 0000000000000258
      RBP: ffff8801d97c41b0 R08: 0000000000000006 R09: 0000000000000000
      R10: ffff8805d0548c00 R11: ffff8805d9b5eb80 R12: 0000000000000006
      R13: ffff8801d97c40c8 R14: ffff8802ba95dc00 R15: ffff8805bbd3bd40
      FS: 00007fefa37f96f0(0000) GS:ffffffff80a33080(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 00000000000000c8 CR3: 0000000000201000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ll_ost_io_71 (pid: 24183, threadinfo ffff8805bbd3a000, task ffff8805bbd38100)
      Stack: ffffffff80a23680 0000000000000000 ffff88062a43e7c0 ffffffffa07fd790
      ffff8805bbd3be40 ffffffff80498e16 0000000000000000 ffffffffffffffff
      ffff880815a27e00 00000000138da000 00000000138dafff 0000000000000000
      Call Trace:
      [<ffffffffa09bc1bb>] ost_rw_hpreq_check+0x12b/0x2b0 [ost]
      [<ffffffffa076c9c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
      [<ffffffff8020cf49>] child_rip+0xa/0x11

      2. Some time later, the node that took over the resources of the crashed node hangs, too.

      The log files and resource usage show the same picture (no resource is exhausted). The 'ldiskfs_inode_cache' slabs grow continuously before the server crashes (hangs), but the allocation is not very high (~200 MB).

      A similar message appears in this node's console log file, too:

      -Separator ---- Sun Dec 11 20:10:01 CET 2011 ----
      general protection fault: 0000 [1] SMP
      last sysfs file: /sys/kernel/uevent_seqnum
      CPU 0
      Modules linked in: obdfilter(N) fsfilt_ldiskfs(N) ost(N) mgc(N) ldiskfs(N) lustre(N) lov(N) mdc(N) lquota(N) osc(N) ko2iblnd(N) ptlrpc(N) obdclass(N) lnet(N) lvfs(N) libcfs(N) quota_v2(N) quot
      a_tree(N) jbd2(N) crc16(N) edd(N) nfs(N) lockd(N) nfs_acl(N) sunrpc(N) rdma_ucm(N) ib_sdp(N) rdma_cm(N) iw_cm(N) ib_addr(N) ib_ipoib(N) ib_cm(N) ib_sa(N) ipv6(N) ib_uverbs(N) ib_umad(N) iw_nes
      (N) libcrc32c(N) iw_cxgb3(N) cxgb3(N) ib_ipath(N) cpufreq_conservative(N) cpufreq_userspace(N) cpufreq_powersave(N) acpi_cpufreq(N) mlx4_ib(N) ib_mthca(N) ib_mad(N) ib_core(N) fuse(N) dm_crypt
      (N) crypto_blkcipher(N) loop(N) dm_round_robin(N) dm_multipath(N) scsi_dh(N) sr_mod(N) cdrom(N) ide_pci_generic(N) jmicron(N) ide_core(N) ata_generic(N) thermal(N) snd_hda_intel(N) snd_pcm(N)
      processor(N) snd_timer(N) ahci(N) pata_jmicron(N) rtc_cmos(N) snd_page_alloc(N) ses(N) lpfc(N) thermal_sys(N) ohci1394(N) libata(N) rtc_core(N) snd_hwdep(N) scsi_transport_fc(N) mlx4_core(N) e
      nclosure(N) hwmon(N) i2c_i801(N) dock(N) joydev(N) rtc_lib(N) button(N) pcspkr(N) ieee1394(N) snd(N) serio_raw(N) igb(N) scsi_tgt(N) e1000e(N) soundcore(N) i2c_core(N) sg(N) linear(N) usbhid(N
      ) hid(N) ff_memless(N) uhci_hcd(N) ehci_hcd(N) sd_mod(N) crc_t10dif(N) usbcore(N) dm_snapshot(N) dm_mod(N) ext3(N) jbd(N) mbcache(N) aacraid(N) scsi_mod(N) [last unloaded: libcfs]
      Supported: No
      Pid: 20502, comm: ll_ost_io_80 Tainted: G 2.6.27.39-0.1_lustre.1.8.4-default #1
      RIP: 0010:[<ffffffffa075ce94>] [<ffffffffa075ce94>] lustre_msg_buf+0x4/0x90 [ptlrpc]
      RSP: 0000:ffff8805cf82bdb0 EFLAGS: 00010282
      RAX: 0000000000000008 RBX: ffff88026b76a808 RCX: aaaaaaaaaaaaaaab
      RDX: 0000000000000018 RSI: 0000000000000002 RDI: 5a5a5a5a5a5a5a5a
      RBP: 0000000000000001 R08: ffff8805f0dae900 R09: 0000000000000000
      R10: 000000004ee5023d R11: ffff880c2d53edc0 R12: ffff88026b76a800
      R13: 0000000000000001 R14: ffff88026b76a800 R15: ffff8803067bc608
      FS: 00007f03bd6456f0(0000) GS:ffffffff80a33080(0000) knlGS:0000000000000000
      CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000001ab9348 CR3: 0000000000201000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ll_ost_io_80 (pid: 20502, threadinfo ffff8805cf82a000, task ffff8805cf828800)
      Stack: ffff88026b76a800 ffff88026b76a808 ffff8805f5c6c800 ffff88026b76a808
      ffff8805f5c6c800 ffffffffa09b913b ffff88026b76a800 ffffffffa09bab0c
      0000000000000000 ffff8803067bc540 ffff8805f5c6c800 ffff88026b76a800
      Call Trace:
      [<ffffffffa09b913b>] ost_rw_hpreq_check+0xab/0x2b0 [ost]
      [<ffffffffa07699c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
      [<ffffffff8020cf49>] child_rip+0xa/0x11
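To compare the two crashes, it can help to reduce each call trace to its function names. The sketch below parses frames of the form `[<addr>] symbol+0xoff/0xsize [module]` and checks whether both traces pass through the same functions. The helper names (`parse_frames`, `symbols`) are hypothetical. The trace text is copied from the two console excerpts above.

```python
import re

# One frame: [<address>] symbol+0xoffset/0xsize, optionally followed by [module]
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\][ \t]+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
    r"(?:[ \t]+\[(\w+)\])?"
)


def parse_frames(trace_text):
    """Return a list of (symbol, offset, size, module) tuples."""
    frames = []
    for match in FRAME_RE.finditer(trace_text):
        _addr, symbol, offset, size, module = match.groups()
        frames.append((symbol, int(offset, 16), int(size, 16), module))
    return frames


def symbols(frames):
    """Keep only the function names, in call-trace order."""
    return [frame[0] for frame in frames]


TRACE_1 = """[<ffffffffa09bc1bb>] ost_rw_hpreq_check+0x12b/0x2b0 [ost]
[<ffffffffa076c9c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
[<ffffffff8020cf49>] child_rip+0xa/0x11
"""

TRACE_2 = """[<ffffffffa09b913b>] ost_rw_hpreq_check+0xab/0x2b0 [ost]
[<ffffffffa07699c3>] ptlrpc_main+0xef3/0x15f0 [ptlrpc]
[<ffffffff8020cf49>] child_rip+0xa/0x11
"""

frames_1 = parse_frames(TRACE_1)
frames_2 = parse_frames(TRACE_2)
print(symbols(frames_1) == symbols(frames_2))
```

Both traces go through `ost_rw_hpreq_check` from `ptlrpc_main`, although at different offsets within `ost_rw_hpreq_check` and with different faulting functions (`ost_rw_prolong_locks` versus `lustre_msg_buf`).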

      This time the whole system was down. After the second node was booted manually, the system was operational again.

      The incident is restricted to two server node pairs and has recurred for the past 3 weeks, roughly every 7 days (every weekend, but that might be coincidence).

            People

              wc-triage WC Triage
              heckes Frank Heckes (Inactive)