Lustre / LU-13212

Lustre client hangs machine under memory pressure


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.15.0
    • Affects Version: Lustre 2.10.3

      Hello,

      When a userspace process goes crazy with memory allocation, the OOM killer sometimes does not manage to kick in because Lustre is still trying to free its memory.
      I am not sure whether it deadlocked or there are just too many locks that it is trying to free, but the machine had been in this state for more than 12 hours before it was manually crashed.
      This is CentOS 7.4 with kernel 3.10.0-693.5.2.el7.x86_64.
      The machine still responds to pings while it is in this state.

      Here is the stack of one of the kernel tasks:

      [223483.032862]  [<ffffffff81196b27>] ? putback_inactive_pages+0x117/0x2d0
      [223483.050260]  [<ffffffff81196f0a>] ? shrink_inactive_list+0x22a/0x5d0
      [223483.062319]  [<ffffffff811979a5>] shrink_lruvec+0x385/0x730
      [223483.073571]  [<ffffffffc085ee07>] ? ldlm_cli_pool_shrink+0x67/0x100 [ptlrpc]
      [223483.086214]  [<ffffffff81197dc6>] shrink_zone+0x76/0x1a0
      [223483.096773]  [<ffffffff811982d0>] do_try_to_free_pages+0xf0/0x4e0
      [223483.108086]  [<ffffffff811987bc>] try_to_free_pages+0xfc/0x180
      [223483.119023]  [<ffffffff8169fbcb>] __alloc_pages_slowpath+0x457/0x724
      [223483.130417]  [<ffffffff8118cdb5>] __alloc_pages_nodemask+0x405/0x420
      [223483.141673]  [<ffffffff811d081a>] alloc_page_interleave+0x3a/0xa0
      [223483.152526]  [<ffffffff811d4133>] alloc_pages_vma+0x143/0x200
      [223483.162848]  [<ffffffff811c37a0>] ? end_swap_bio_write+0x80/0x80
      [223483.173345]  [<ffffffff811c44ad>] read_swap_cache_async+0xed/0x160
      [223483.183938]  [<ffffffff811c45c8>] swapin_readahead+0xa8/0x110
      [223483.193933]  [<ffffffff811b22cb>] handle_mm_fault+0xadb/0xfa0
      [223483.203823]  [<ffffffff816b00b4>] __do_page_fault+0x154/0x450
      [223483.213621]  [<ffffffff816b03e5>] do_page_fault+0x35/0x90
      [223483.222983]  [<ffffffff816ac608>] page_fault+0x28/0x30
      

      Please let me know if you need more information.
      Regards.
      Jacek Tomaka

            Assignee: Andreas Dilger
            Reporter: Jacek Tomaka (Inactive)
            Votes: 1
            Watchers: 6
