Lustre / LU-7432

oom-killer started on MDSes

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • Lustre 2.8.0
    • None
    • lola
      build: tip of master(df6cf859bbb29392064e6ddb701f3357e01b3a13) + patches
    • 3

    Description

      The error occurred during soak testing of build '20151113' (see https://wiki.hpdd.intel.com/pages/viewpage.action?title=Soak+Testing+on+Lola&spaceKey=Releases#SoakTestingonLola-20151113) and had already occurred earlier when testing build '20151109'.
      DNE is enabled. OSTs were formatted using zfs, MDTs using ldiskfs. The MDS nodes are set up in an HA active-active failover configuration.

      The oom-killer was invoked on the following nodes at these times:

      Date             Node     Build ID  Soak event
      Nov 9 18:10:01   lola-9   20151109  no fault; only job execution
      Nov 13 14:30:02  lola-10  20151113  during stopping of soak
      Nov 14 05:35:01  lola-11  20151113  no fault; only job execution
      Nov 14 05:45:01  lola-9   20151113  no fault; only job execution

      All events happened at times when no fault was being injected.

      Attached files: console and syslog output of the affected nodes.
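
To locate the events in the attached logs, a small script along the following lines may help. It is only a sketch: it assumes the kernel logged the usual "invoked oom-killer" message, and the helper names `open_log` and `find_oom_events` are made up for illustration.

```python
import bz2
import gzip


def open_log(path):
    """Open a plain, .bz2, or .gz log file as text."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", errors="replace")
    if path.endswith(".gz"):
        return gzip.open(path, "rt", errors="replace")
    return open(path, "r", errors="replace")


def find_oom_events(lines):
    """Return every line reporting an oom-killer invocation.

    Assumption: the kernel message contains the text "invoked oom-killer".
    """
    return [line.rstrip("\n") for line in lines if "invoked oom-killer" in line]
```

For example, `find_oom_events(open_log("messages-lola-9.log.bz2"))` would list the matching syslog lines for lola-9, which can then be compared with the times given in the table.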

      Unfortunately, collectl wasn't running, so no performance counters were gathered.
      The tool has now been enabled on all soak nodes so that memory statistics, especially slab stats, can be collected during one of the next sessions.
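
Until collectl data is available, slab usage can also be sampled directly from /proc/slabinfo. The following is a minimal sketch, not part of the soak framework: the function name `top_slab_caches` is hypothetical, it assumes the slabinfo version 2.1 column layout (name, active_objs, num_objs, objsize, ...), and it estimates the size of each cache as num_objs * objsize.

```python
def top_slab_caches(slabinfo_text, limit=5):
    """Return (cache name, approximate bytes) pairs, largest first.

    Assumes the slabinfo 2.1 layout: name, active_objs, num_objs, objsize, ...
    """
    usage = []
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        name = fields[0]
        num_objs = int(fields[2])
        obj_size = int(fields[3])
        usage.append((name, num_objs * obj_size))
    # Largest estimated footprint first.
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:limit]
```

Passing the contents of /proc/slabinfo (reading it usually requires root) to `top_slab_caches` at regular intervals on an MDS would show which caches grow before the oom-killer is invoked.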

      Attachments

        1. messages-lola-9.log.bz2
          659 kB
        2. messages-lola-11.log.bz2
          805 kB
        3. messages-lola-10.log.bz2
          790 kB
        4. console-lola-9.log.gz
          880 kB
        5. console-lola-11.log.gz
          619 kB
        6. console-lola-10.log.gz
          405 kB

            People

              wc-triage WC Triage
              heckes Frank Heckes (Inactive)