Lustre / LU-7517

oom killer active after failback of MDS resources

Details

    • Bug
    • Resolution: Won't Fix
    • Blocker
    • None
    • None
    • lola:
      build: tip of master + #31 of change 16383
    • 3
    • 9223372036854775807

    Description

      The error below occurred during soak testing of change 16838, patch set #31 (no wiki entry for the build exists yet), on cluster lola. DNE is enabled and the MDSes are configured in an active-active HA failover configuration.

      Primary resources of MDT lola-11 were failed back on Dec 3 at 20:18.
      Slab allocation then increased continuously, reaching ~31 GB by the time of the crash.
      MDS node lola-11 crashed with the oom-killer active on Dec 4 at 00:21 (local time); see also LU-7432.
      ptlrpc_cache appears to be the biggest consumer.
      Attached are lola-11's messages, console log, vmcore-dmesg file, and collectl (version V4.0.2-1) files for the time interval specified above.
      Also attached are files containing the extracted counters for memory, slab totals, and per-slab allocation.
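
      For anyone wanting to repeat the per-slab ranking on a live node, the following is a minimal sketch that orders slab caches by approximate size. It assumes input in the standard /proc/slabinfo version 2.1 layout rather than the collectl files attached here, and it estimates size as num_objs * objsize, which ignores per-slab overhead. The function name rank_slabs and the sample numbers are hypothetical and are not taken from the lola-11 data.

```python
def rank_slabs(slabinfo_text, top=5):
    """Return the `top` slab caches as (name, approx_bytes), largest first."""
    usage = []
    for line in slabinfo_text.splitlines():
        # Skip blank lines, the version line, and the column-header comment.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        name = fields[0]
        # Columns: name, active_objs, num_objs, objsize, ...
        num_objs, objsize = int(fields[2]), int(fields[3])
        usage.append((name, num_objs * objsize))
    usage.sort(key=lambda item: item[1], reverse=True)
    return usage[:top]


# Made-up sample in /proc/slabinfo 2.1 layout; the numbers are illustrative only.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ptlrpc_cache 1000 1200 1024 4 1 : tunables 54 27 8 : slabdata 300 300 0
dentry 500 600 192 20 1 : tunables 120 60 8 : slabdata 30 30 0
kmalloc-64 100 128 64 59 1 : tunables 120 60 8 : slabdata 2 3 0
"""

if __name__ == "__main__":
    for name, size in rank_slabs(SAMPLE):
        print(f"{name:20s} {size / 2**20:8.2f} MiB")
```

      On a real node the text would come from reading /proc/slabinfo (root is normally required); sampling it periodically after failback would show whether a single cache keeps growing.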

      Attachments

        1. console.log.bz2
          190 kB
          Frank Heckes
        2. console-lola-10.log-20151213.gz
          567 kB
          Frank Heckes
        3. console-lola-11.log.bz2
          120 kB
          Frank Heckes
        4. console-lola-9.log-20151213.gz
          913 kB
          Frank Heckes
        5. lola-10-memory-counter-20151213.dat.bz2
          21 kB
          Frank Heckes
        6. lola-10-one-file-per-slab.tar.bz2
          506 kB
          Frank Heckes
        7. lola-10-slab-detail-counter-20151213.dat.bz2
          721 kB
          Frank Heckes
        8. lola-10-slab-global-counter-20151213.dat.bz2
          25 kB
          Frank Heckes
        9. lola-11-memory-counter-20151213.dat.bz2
          60 kB
          Frank Heckes
        10. lola-11-one-file-per-slab.tar.bz2
          1.21 MB
          Frank Heckes
        11. lola-11-slab-detail-counter-20151213.dat.bz2
          1.97 MB
          Frank Heckes
        12. lola-11-slab-global-counter-20151213.dat.bz2
          69 kB
          Frank Heckes
        13. lola-9-memory-counter-20151213.dat.bz2
          38 kB
          Frank Heckes
        14. lola-9-one-file-per-slab.tar.bz2
          813 kB
          Frank Heckes
        15. lola-9-slab-details-counter-20151213.dat.bz2
          1.20 MB
          Frank Heckes
        16. lola-9-slab-global-counter-20151213.dat.bz2
          44 kB
          Frank Heckes
        17. memory-counter-lola-11.dat.bz2
          25 kB
          Frank Heckes
        18. messages-lola-10.log-20151213.bz2
          774 kB
          Frank Heckes
        19. messages-lola-11.log.bz2
          175 kB
          Frank Heckes
        20. messages-lola-11.log.bz2
          302 kB
          Frank Heckes
        21. messages-lola-9.log-20151213.bz2
          490 kB
          Frank Heckes
        22. slab-details-lola-11.dat.bz2
          873 kB
          Frank Heckes
        23. slab-details-one-file-per-slab.tar.bz2
          617 kB
          Frank Heckes
        24. slab-total-lola-11.dat.bz2
          28 kB
          Frank Heckes
        25. vmcore-dmesg.txt.bz2
          28 kB
          Frank Heckes

          People

            wc-triage WC Triage
            heckes Frank Heckes (Inactive)