Lustre / LU-17924

obdfilter-survey test_1a: OSS hit OOM

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.16.0, Lustre 2.15.5, Lustre 2.15.6
    • Severity: 3

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/7738180d-8e27-44a0-b3fd-f3ee5a5d3c83

      test_1a failed with the following error:

      trevis-57vm3 crashed during obdfilter-survey test_1a
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-b2_15/87 - 4.18.0-477.10.1.el8_8.x86_64
      servers: https://build.whamcloud.com/job/lustre-b2_15/87 - 4.18.0-477.27.1.el8_lustre.x86_64

      This is for ZFS; not sure if it is a duplicate of LU-12830.

      [ 7965.202924] obd_memory max: 127269938, obd_memory current: 120046762
      [ 7965.204283] NetworkManager invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
      [ 7965.206139] CPU: 1 PID: 599 Comm: NetworkManager Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-477.27.1.el8_lustre.x86_64 #1
      [ 7965.208386] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [ 7965.209429] Call Trace:
      [ 7965.209980]  dump_stack+0x41/0x60
      [ 7965.210675]  dump_header+0x4a/0x1df
      [ 7965.211385]  out_of_memory.cold.36+0xa/0x7e
      [ 7965.212177]  __alloc_pages_slowpath+0xbe7/0xcd0
      [ 7965.213076]  ? blk_mq_sched_insert_requests+0x6c/0xf0
      [ 7965.214024]  __alloc_pages_nodemask+0x2e2/0x330
      [ 7965.214868]  alloc_pages_vma+0x74/0x1d0
      [ 7965.215622]  __read_swap_cache_async+0xf4/0x2b0
      [ 7965.216464]  swap_cluster_readahead+0x178/0x2f0
      [ 7965.217314]  ? __mod_lruvec_page_state+0x5e/0x80
      [ 7965.218185]  swapin_readahead+0x5c/0x501
      [ 7965.218933]  ? pagecache_get_page+0x30/0x310
      [ 7965.219738]  do_swap_page+0x45b/0x710
      [ 7965.220449]  ? pmd_devmap_trans_unstable+0x2e/0x40
      [ 7965.221339]  ? handle_pte_fault+0x5d/0x880
      [ 7965.222112]  __handle_mm_fault+0x453/0x6c0
      [ 7965.222884]  handle_mm_fault+0xca/0x2a0
      [ 7965.223604]  __do_page_fault+0x1f0/0x460
      [ 7965.224376]  do_page_fault+0x37/0x130
      [ 7965.225084]  ? page_fault+0x8/0x30
      [ 7965.225746]  page_fault+0x1e/0x30
      [ 7965.226391] RIP: 0033:0x7f5119b43e51
      [ 7965.227100] Code: Unable to access opcode bytes at RIP 0x7f5119b43e27.
      [ 7965.228265] RSP: 002b:00007ffd97e59d58 EFLAGS: 00010202
      [ 7965.229232] RAX: 000055a92eae7e70 RBX: 000055a92eb012b0 RCX: 0000000000000000
      [ 7965.230520] RDX: 000055a92eb012b0 RSI: 0000000000000050 RDI: 000055a92eb012b0
      [ 7965.231807] RBP: 00007ffd97e59de0 R08: 0000000000000000 R09: 000055a92ea4ca60
      [ 7965.233123] R10: 0000000000000033 R11: 0000000000000000 R12: 00007f5119909280
      [ 7965.234515] R13: 0000000000000000 R14: 00007f51196852f0 R15: 000055a92eb2b110
      [ 7965.235824] Mem-Info:
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      obdfilter-survey test_1a - trevis-57vm3 crashed during obdfilter-survey test_1a

Attachments

Issue Links

Activity

[LU-17924] obdfilter-survey test_1a: OSS hit OOM

            adilger Andreas Dilger added a comment - simmonsja that is mostly a ZFS question. Possibly there is some L2ARC tunable parameter that could be used? Possibly it could be set persistently in the zpool at format time (based on the node RAM) instead of being set each time on mount, but I think both are possible to add to test-framework.sh.
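
            To illustrate the mount-time option, here is a minimal sketch of a test-framework.sh-style helper that caps the ZFS ARC based on node RAM. zfs_arc_max is a standard ZFS-on-Linux module parameter; the helper name and the 25%-of-RAM policy are assumptions for illustration, not anything decided in this ticket:

                # Sketch only: cap the ZFS ARC at a fraction of node RAM.
                # zfs_arc_max is a standard ZFS-on-Linux module parameter;
                # the function name and the 25% policy are assumptions.
                limit_zfs_arc() {
                        local ram_kb arc_max
                        ram_kb=$(awk '/MemTotal/ { print $2 }' /proc/meminfo)
                        arc_max=$((ram_kb * 1024 / 4))  # bytes; assumed 25% of RAM

                        if [ -w /sys/module/zfs/parameters/zfs_arc_max ]; then
                                # zfs module already loaded: apply immediately
                                echo "$arc_max" > /sys/module/zfs/parameters/zfs_arc_max
                        else
                                # apply on the next module load
                                echo "options zfs zfs_arc_max=$arc_max" > /etc/modprobe.d/zfs-arc.conf
                        fi
                }

            The format-time variant mentioned above would persist the choice with the pool rather than with the node; the sketch only covers the per-mount case.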

            simmonsja James A Simmons added a comment - Is there any way to limit the amount of memory ZFS is allocating, or do we need to update the VM image?
            yujian Jian Yu added a comment - Lustre 2.15.6 RC1: https://testing.whamcloud.com/test_sets/0086cbe0-1fc1-4d01-be21-c5b308be676f
            yujian Jian Yu added a comment - Also failed on Lustre b2_15 branch: https://testing.whamcloud.com/test_sets/b681679f-74c7-45c8-943e-1cbf23d226c2
            yujian Jian Yu added a comment - Lustre 2.16.0 RC5: https://testing.whamcloud.com/test_sets/cd12829e-9139-4467-bdd4-3bf516b1f847
            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/d03c45b1-3b78-47c1-823b-a0202d38f7ca

            adilger Andreas Dilger added a comment - Most of these failures are for ZFS OSTs. There are a few similar OOM failures with ldiskfs servers, but they are all running 2.14.0 on the servers, so I think this is a ZFS memory management issue.
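
            For triage, a minimal sketch (assuming the standard ZFS-on-Linux kstat interface on the OSS) of comparing the ARC's current footprint against its ceiling around the OOM window:

                # Sketch only: print the ARC's current size and ceiling on the OSS.
                # /proc/spl/kstat/zfs/arcstats and its "size"/"c_max" fields are
                # standard ZFS-on-Linux kstats; values are bytes in column 3.
                awk '$1 == "size" || $1 == "c_max" {
                        printf "%-6s %d MiB\n", $1, $3 / 1048576
                }' /proc/spl/kstat/zfs/arcstats

            If "size" tracks close to "c_max" and "c_max" is a large fraction of the VM's RAM, that would support the ARC being the memory consumer here.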
            sarah Sarah Liu added a comment - another OOM on a ZFS OSS: https://testing.whamcloud.com/test_sets/6b69954d-7cfc-4e32-aa04-fd9b63a5ca3f

People

    Assignee: wc-triage WC Triage
    Reporter: maloo Maloo
    Votes: 0
    Watchers: 6

Dates

    Created:
    Updated: