  Lustre / LU-13212

Lustre client hangs machine under memory pressure

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: Lustre 2.10.3
    • Labels: None
    • Severity: 3

    Description

      Hello,

      When a userspace process allocates memory aggressively, the OOM killer sometimes fails to kick in because Lustre is still trying to free its memory.
      I am not sure whether it deadlocked or there are simply too many locks for it to free, but the machine had been in this state for more than 12 hours before it was manually crashed.
      This is CentOS 7.4 with kernel 3.10.0-693.5.2.el7.x86_64.
      The machine still responds to pings while it is in this state.

      Here is one of the kernel task stacks:

      [223483.032862]  [<ffffffff81196b27>] ? putback_inactive_pages+0x117/0x2d0
      [223483.050260]  [<ffffffff81196f0a>] ? shrink_inactive_list+0x22a/0x5d0
      [223483.062319]  [<ffffffff811979a5>] shrink_lruvec+0x385/0x730
      [223483.073571]  [<ffffffffc085ee07>] ? ldlm_cli_pool_shrink+0x67/0x100 [ptlrpc]
      [223483.086214]  [<ffffffff81197dc6>] shrink_zone+0x76/0x1a0
      [223483.096773]  [<ffffffff811982d0>] do_try_to_free_pages+0xf0/0x4e0
      [223483.108086]  [<ffffffff811987bc>] try_to_free_pages+0xfc/0x180
      [223483.119023]  [<ffffffff8169fbcb>] __alloc_pages_slowpath+0x457/0x724
      [223483.130417]  [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
      [223483.141673]  [<ffffffff811d081a>] alloc_page_interleave+0x3a/0xa0
      [223483.152526]  [<ffffffff811d4133>] alloc_pages_vma+0x143/0x200
      [223483.162848]  [<ffffffff811c37a0>] ? end_swap_bio_write+0x80/0x80
      [223483.173345]  [<ffffffff811c44ad>] read_swap_cache_async+0xed/0x160
      [223483.183938]  [<ffffffff811c45c8>] swapin_readahead+0xa8/0x110
      [223483.193933]  [<ffffffff811b22cb>] handle_mm_fault+0xadb/0xfa0
      [223483.203823]  [<ffffffff816b00b4>] __do_page_fault+0x154/0x450
      [223483.213621]  [<ffffffff816b03e5>] do_page_fault+0x35/0x90
      [223483.222983]  [<ffffffff816ac608>] page_fault+0x28/0x30
      

      Please let me know if you need more information.
      Regards.
      Jacek Tomaka

            People

              Assignee: adilger Andreas Dilger
              Reporter: Tomaka Jacek Tomaka (Inactive)
              Votes: 1
              Watchers: 5
