  1. Lustre
  2. LU-7517

oom killer active after failback of MDS resources

Details

    • Bug
    • Resolution: Won't Fix
    • Blocker
    • None
    • None
    • lola:
      build: tip of master + #31 of change 16383
    • 3

    Description

      The error below occurred during soak testing of change 16838, patch set #31 (no Wiki entry for the build exists yet), on cluster lola. DNE is enabled and the MDSes are configured in an active-active HA failover configuration.

      Primary resources of MDT lola-11 were failed back on Dec 3 at 20:18.
      Slab allocation then increased continuously, reaching ~31 GB by the time of the crash.
      MDS node lola-11 crashed with oom-killer on Dec 4 at 00:21 (local time); see also LU-7432.
      ptlrpc_cache appears to be the biggest consumer.
      Attached are lola-11's messages, console log, vmcore-dmesg file, and collectl (version V4.0.2-1) files for the time interval specified above. Also
      attached are files containing extracted counters for memory, slab totals, and per-slab allocation.
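
      For anyone who wants to cross-check which slab cache dominates on a live node, the following is a minimal sketch and not the method used to produce the attached counters. It assumes the standard /proc/slabinfo version 2.1 column layout and estimates per-cache usage as num_objs * objsize, which ignores per-slab overhead. The function name rank_slabs and the SAMPLE text are hypothetical; the SAMPLE numbers are invented for illustration and are not taken from lola-11.

```python
# Sketch: rank slab caches by approximate memory use.
# Assumes /proc/slabinfo version 2.1 layout:
#   name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : ...

# Synthetic example input (invented values, NOT data from the attachments).
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ptlrpc_cache 900 1000 1024 4 1 : tunables 54 27 8 : slabdata 250 250 0
dentry 400 500 192 21 1 : tunables 120 60 8 : slabdata 24 24 0
"""


def rank_slabs(slabinfo_text, top=5):
    """Return up to `top` (cache_name, approx_bytes) pairs, largest first."""
    usage = []
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version banner, and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name = fields[0]
        num_objs = int(fields[2])  # total objects allocated in this cache
        objsize = int(fields[3])   # size of one object in bytes
        usage.append((name, num_objs * objsize))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:top]


for cache, approx_bytes in rank_slabs(SAMPLE):
    print(f"{cache:20s} {approx_bytes / 1024:10.1f} KiB")
```

      On a real node the input would be the contents of /proc/slabinfo (reading it normally requires root) in place of SAMPLE.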

      Attachments

        1. console.log.bz2
          190 kB
        2. console-lola-10.log-20151213.gz
          567 kB
        3. console-lola-11.log.bz2
          120 kB
        4. console-lola-9.log-20151213.gz
          913 kB
        5. lola-10-memory-counter-20151213.dat.bz2
          21 kB
        6. lola-10-one-file-per-slab.tar.bz2
          506 kB
        7. lola-10-slab-detail-counter-20151213.dat.bz2
          721 kB
        8. lola-10-slab-global-counter-20151213.dat.bz2
          25 kB
        9. lola-11-memory-counter-20151213.dat.bz2
          60 kB
        10. lola-11-one-file-per-slab.tar.bz2
          1.21 MB
        11. lola-11-slab-detail-counter-20151213.dat.bz2
          1.97 MB
        12. lola-11-slab-global-counter-20151213.dat.bz2
          69 kB
        13. lola-9-memory-counter-20151213.dat.bz2
          38 kB
        14. lola-9-one-file-per-slab.tar.bz2
          813 kB
        15. lola-9-slab-details-counter-20151213.dat.bz2
          1.20 MB
        16. lola-9-slab-global-counter-20151213.dat.bz2
          44 kB
        17. memory-counter-lola-11.dat.bz2
          25 kB
        18. messages-lola-10.log-20151213.bz2
          774 kB
        19. messages-lola-11.log.bz2
          175 kB
        20. messages-lola-11.log.bz2
          302 kB
        21. messages-lola-9.log-20151213.bz2
          490 kB
        22. slab-details-lola-11.dat.bz2
          873 kB
        23. slab-details-one-file-per-slab.tar.bz2
          617 kB
        24. slab-total-lola-11.dat.bz2
          28 kB
        25. vmcore-dmesg.txt.bz2
          28 kB

          People

            wc-triage WC Triage
            heckes Frank Heckes (Inactive)