Lustre / LU-12864

sanity-benchmark test_iozone crashes with OOM on clients

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.6, Lustre 2.12.8, Lustre 2.16.0
    • Labels: None
    • Severity: 3

    Description

      sanity-benchmark test_iozone crashes with OOM. This crash has been seen a total of eight times in ARM and x86_64 client testing. The first occurrences were on 30 July 2019 for Lustre 2.12.2.101 and 9 August 2019 for Lustre 2.12.56.87.

      Looking at the kernel crash for https://testing.whamcloud.com/test_sets/93a9c704-eb70-11e9-b62b-52540065bddc, we see ext4_filemap_fault in the call stack, which seems to set it apart from other OOM crashes we've seen.

      [23529.881894] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 22:48:27 (1570574907)
      [23532.537981] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1785584kB available, using 3074176kB file size
      [23533.130811] Lustre: DEBUG MARKER: min OST has 1785584kB available, using 3074176kB file size
      [23702.824787] in:imjournal invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null),  order=0, oom_score_adj=0
      [23702.841803] in:imjournal cpuset=/ mems_allowed=0
      [23702.844436] CPU: 0 PID: 937 Comm: in:imjournal Kdump: loaded Tainted: G           OE  ------------   4.14.0-115.2.2.el7a.aarch64 #1
      [23702.851192] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [23702.855331] Call trace:
      [23702.856924] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
      [23702.860189] [<ffff00000808a074>] show_stack+0x24/0x2c
      [23702.863232] [<ffff000008855c28>] dump_stack+0x84/0xa8
      [23702.866282] [<ffff000008211fc0>] dump_header+0x94/0x1ec
      [23702.869476] [<ffff000008211e4c>] out_of_memory+0x430/0x484
      [23702.872747] [<ffff0000082179c4>] __alloc_pages_nodemask+0xa78/0xec0
      [23702.876522] [<ffff00000827a89c>] alloc_pages_current+0x8c/0xd8
      [23702.880039] [<ffff000008209eb8>] __page_cache_alloc+0x9c/0xd8
      [23702.883499] [<ffff00000820dc40>] filemap_fault+0x340/0x550
      [23702.887580] [<ffff000001405608>] ext4_filemap_fault+0x38/0x54 [ext4]
      [23702.891420] [<ffff00000824b364>] __do_fault+0x30/0xf4
      [23702.894459] [<ffff000008250130>] do_fault+0x3ec/0x4b8
      [23702.897517] [<ffff00000825178c>] __handle_mm_fault+0x3f4/0x560
      [23702.900998] [<ffff0000082519d8>] handle_mm_fault+0xe0/0x178
      [23702.904324] [<ffff000008872dc4>] do_page_fault+0x1c4/0x3cc
      [23702.907608] [<ffff00000887301c>] do_translation_fault+0x50/0x5c
      [23702.911152] [<ffff0000080813e8>] do_mem_abort+0x64/0xe4
      [23702.914390] [<ffff000008081568>] do_el0_ia_bp_hardening+0x94/0xb4
      [23702.918206] Exception stack(0xffff00000be2fec0 to 0xffff00000be30000)
      [23702.922205] fec0: 0000000000000000 0000000000000000 0000000000000000 0000ffff9768e6a0
      [23702.927072] fee0: 0000000000000002 0000000000000000 00000000ffffffbb 0000000000000000
      [23702.931975] ff00: 0000000000000049 003b9aca00000000 0000000000005c93 0000000028da3176
      [23702.936883] ff20: 0000000000000018 000000005d9d12e6 001d34ce80000000 0000a26c46000000
      [23702.941748] ff40: 0000ffff987ffae0 0000ffff98974ef0 0000000000000012 0000ffff987ff000
      [23702.946622] ff60: 00000000000dbba0 0000ffff987ff000 0000ffff900be4d0 0000ffff98830000
      [23702.951483] ff80: 000000000000b712 0000ffff900acef0 0000ffff9768e8a0 0000ffff98830000
      [23702.956372] ffa0: 0000000000000000 0000ffff9768e700 0000ffff987ca4e0 0000ffff9768e700
      [23702.961255] ffc0: 0000ffff987ca4e0 0000000080000000 0000ffff9768e720 00000000ffffffff
      [23702.966136] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [23702.971019] [<ffff0000080832a4>] el0_ia+0x1c/0x20
      [23702.974052] Mem-Info:
      [23702.975375] active_anon:0 inactive_anon:0 isolated_anon:0
       active_file:3693 inactive_file:15860 isolated_file:64
       unevictable:0 dirty:256 writeback:3416 unstable:0
       slab_reclaimable:351 slab_unreclaimable:1716
       mapped:4 shmem:0 pagetables:145 bounce:0
       free:1170 free_pcp:4 free_cma:0
      [23702.994215] Node 0 active_anon:0kB inactive_anon:0kB active_file:236352kB inactive_file:1014336kB unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:256kB dirty:16384kB writeback:218624kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
      [23703.010485] Node 0 DMA free:74880kB min:75328kB low:94144kB high:112960kB active_anon:0kB inactive_anon:0kB active_file:236352kB inactive_file:1012992kB unevictable:0kB writepending:235008kB present:2097152kB managed:1537088kB mlocked:0kB kernel_stack:10624kB pagetables:9280kB bounce:0kB free_pcp:256kB local_pcp:128kB free_cma:0kB
      [23703.028015] lowmem_reserve[]: 0 0 0
      [23703.030000] Node 0 DMA: 92*64kB (U) 5*128kB (U) 1*256kB (U) 3*512kB (U) 1*1024kB (U) 0*2048kB 0*4096kB 2*8192kB (U) 1*16384kB (U) 1*32768kB (U) 0*65536kB 0*131072kB 0*262144kB 0*524288kB = 74880kB
      [23703.040433] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [23703.045443] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
      [23703.050373] 3700 total pagecache pages
      [23703.052649] 0 pages in swap cache
      [23703.054534] Swap cache stats: add 4640, delete 4640, find 319/523
      [23703.057997] Free swap  = 1826560kB
      [23703.059928] Total swap = 2098112kB
      [23703.061952] 32768 pages RAM
      [23703.063537] 0 pages HighMem/MovableOnly
      [23703.065741] 8751 pages reserved
      [23703.067542] 0 pages hwpoisoned
      [23703.069277] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
      [23703.074369] [  417]     0   417      237        0       3       2       39             0 systemd-journal
      [23703.079784] [  439]     0   439     1282        0       4       2       43             0 lvmetad
      [23703.085004] [  453]     0   453      243        2       4       2       41         -1000 systemd-udevd
      [23703.090284] [  541]     0   541      267        0       4       2       48         -1000 auditd
      [23703.095514] [  588]    81   588      160        2       3       2       59          -900 dbus-daemon
      [23703.100750] [  589]    32   589      185        0       4       2       75             0 rpcbind
      [23703.105767] [  592]     0   592     2458        0       4       2       55             0 gssproxy
      [23703.110859] [  600]     0   600     6767        0       5       2      172             0 NetworkManager
      [23703.116188] [  601]   999   601     8492        0       6       3      145             0 polkitd
      [23703.121264] [  602]     0   602       88        0       3       2       31             0 systemd-logind
      [23703.126603] [  603]     0   603       92        2       3       2       24             0 irqbalance
      [23703.131843] [  610]    38   610      160        2       4       2       47             0 ntpd
      [23703.136684] [  683]     0   683      359        2       3       2      113             0 dhclient
      [23703.141833] [  917]     0   917      323        2       4       2      104         -1000 sshd
      [23703.146688] [  921]     0   921     6888        1       5       2      318             0 tuned
      [23703.151673] [  923]     0   923       91        2       3       2       29             0 xinetd
      [23703.156625] [  924]     0   924     3762        1       4       2       99             0 rsyslogd
      [23703.161769] [  930]   997   930     3200        0       3       2       51             0 munged
      [23703.166723] [  938]    29   938      130        2       3       2       51             0 rpc.statd
      [23703.171907] [  980]     0   980     9711        0       5       2      156             0 automount
      [23703.177064] [  986]     0   986     1756        0       4       2       39             0 crond
      [23703.182076] [  988]     0   988       78        0       4       2       29             0 atd
      [23703.186875] [ 1003]     0  1003     1718        2       3       2       10             0 agetty
      [23703.191913] [ 1005]     0  1005     1718        2       3       2       10             0 agetty
      [23703.196881] [ 1521]     0  1521      344        0       4       2       85             0 master
      [23703.201935] [ 1565]    89  1565      347        2       4       2       81             0 qmgr
      [23703.206802] [ 9394]     0  9394      392        0       4       2      149             0 sshd
      [23703.211760] [ 9396]     0  9396     1739        0       3       2       14             0 run_test.sh
      [23703.216936] [ 9702]     0  9702     1788        2       3       2       63             0 bash
      [23703.221846] [22149]    89 22149      346        0       4       2       81             0 pickup
      [23703.226825] [26394]     0 26394     1788        1       3       2       63             0 bash
      [23703.231749] [26395]     0 26395     1715        1       4       2        8             0 tee
      [23703.236525] [26592]     0 26592     1785        2       3       2       61             0 bash
      [23703.241464] [31996]     0 31996     1788        1       3       2       63             0 bash
      [23703.246327] [31997]     0 31997     1715        1       4       2        9             0 tee
      [23703.251181] [32459]   500 32459      685        0       4       2      284             0 iozone
      [23703.256109] [32460]     0 32460     1715        1       4       2        8             0 tee
      [23703.260934] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      
      [23703.266467] CPU: 0 PID: 937 Comm: in:imjournal Kdump: loaded Tainted: G           OE  ------------   4.14.0-115.2.2.el7a.aarch64 #1
      [23703.273347] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
      [23703.277474] Call trace:
      [23703.278946] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
      [23703.282220] [<ffff00000808a074>] show_stack+0x24/0x2c
      [23703.285224] [<ffff000008855c28>] dump_stack+0x84/0xa8
      [23703.288434] [<ffff0000080d4e5c>] panic+0x138/0x2a0
      [23703.291316] [<ffff000008211e70>] out_of_memory+0x454/0x484
      [23703.294619] [<ffff0000082179c4>] __alloc_pages_nodemask+0xa78/0xec0
      [23703.298371] [<ffff00000827a89c>] alloc_pages_current+0x8c/0xd8
      [23703.301874] [<ffff000008209eb8>] __page_cache_alloc+0x9c/0xd8
      [23703.305324] [<ffff00000820dc40>] filemap_fault+0x340/0x550
      [23703.308897] [<ffff000001405608>] ext4_filemap_fault+0x38/0x54 [ext4]
      [23703.312710] [<ffff00000824b364>] __do_fault+0x30/0xf4
      [23703.315715] [<ffff000008250130>] do_fault+0x3ec/0x4b8
      [23703.318783] [<ffff00000825178c>] __handle_mm_fault+0x3f4/0x560
      [23703.322271] [<ffff0000082519d8>] handle_mm_fault+0xe0/0x178
      [23703.325625] [<ffff000008872dc4>] do_page_fault+0x1c4/0x3cc
      [23703.328906] [<ffff00000887301c>] do_translation_fault+0x50/0x5c
      [23703.332421] [<ffff0000080813e8>] do_mem_abort+0x64/0xe4
      [23703.335530] [<ffff000008081568>] do_el0_ia_bp_hardening+0x94/0xb4
      [23703.339191] Exception stack(0xffff00000be2fec0 to 0xffff00000be30000)
      [23703.343081] fec0: 0000000000000000 0000000000000000 0000000000000000 0000ffff9768e6a0
      [23703.347762] fee0: 0000000000000002 0000000000000000 00000000ffffffbb 0000000000000000
      [23703.352444] ff00: 0000000000000049 003b9aca00000000 0000000000005c93 0000000028da3176
      [23703.357175] ff20: 0000000000000018 000000005d9d12e6 001d34ce80000000 0000a26c46000000
      [23703.361853] ff40: 0000ffff987ffae0 0000ffff98974ef0 0000000000000012 0000ffff987ff000
      [23703.366551] ff60: 00000000000dbba0 0000ffff987ff000 0000ffff900be4d0 0000ffff98830000
      [23703.371220] ff80: 000000000000b712 0000ffff900acef0 0000ffff9768e8a0 0000ffff98830000
      [23703.375942] ffa0: 0000000000000000 0000ffff9768e700 0000ffff987ca4e0 0000ffff9768e700
      [23703.380639] ffc0: 0000ffff987ca4e0 0000000080000000 0000ffff9768e720 00000000ffffffff
      [23703.385333] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      [23703.390039] [<ffff0000080832a4>] el0_ia+0x1c/0x20
      [23703.392919] SMP: stopping secondary CPUs
      [23703.398529] Starting crashdump kernel...
      [23703.400804] Bye!
      
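      One detail worth keeping in mind when reading the Mem-Info dump above: this aarch64 kernel uses 64 KiB pages, so the raw page counts correspond to far more memory than they would on x86_64. A quick sketch converting the counts from the trace (the 64 KiB page size is an inference from "32768 pages RAM" on a 2 GiB guest and the per-node kB figures in the same log, not something the log states directly):

```python
# Convert Mem-Info page counts from the aarch64 trace above into kB,
# assuming 64 KiB pages (consistent with "32768 pages RAM" on a
# 2 GiB guest and with the per-node kB lines in the same log).
PAGE_KB = 64

counts = {
    "active_file": 3693,     # from "active_file:3693" above
    "inactive_file": 15860,  # from "inactive_file:15860"
    "writeback": 3416,       # from "writeback:3416"
    "free": 1170,            # from "free:1170"
}

total_ram_kb = 32768 * PAGE_KB  # "32768 pages RAM"
for name, pages in counts.items():
    kb = pages * PAGE_KB
    print(f"{name:14s} {kb:9d} kB  ({100 * kb // total_ram_kb}% of RAM)")
```

      The active_file count of 3693 pages works out to 236352 kB, exactly matching the per-node "active_file:236352kB" line, which confirms the page size; nearly half of RAM is inactive_file page cache that the kernel nonetheless reports as all_unreclaimable (dirty/writeback pages cannot be dropped).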

      Logs for other crashes are at
      https://testing.whamcloud.com/test_sets/a2546caa-d315-11e9-9fc9-52540065bddc
      https://testing.whamcloud.com/test_sets/6bc5196c-bb4d-11e9-a25b-52540065bddc
      https://testing.whamcloud.com/test_sets/d4d9a03c-c046-11e9-97d5-52540065bddc
      https://testing.whamcloud.com/test_sets/eecba4da-e577-11e9-a197-52540065bddc

Attachments

Issue Links

Activity

            yujian Jian Yu added a comment - Lustre 2.16.0 RC5: https://testing.whamcloud.com/test_sets/ad5b47e4-70e4-4863-b12b-87f12de3136d
            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/38d0b472-8ab4-4c29-98bb-efa8eb8c3e10
            sgiraddi Shashidhar Giraddi added a comment - crashed on RHEL 8.8 as well https://testing.whamcloud.com/test_sets/23dfc7d5-d6c9-4eee-be1a-c410fde2165b
            mdiep Minh Diep added a comment - crashed on el8.8 as well https://testing.whamcloud.com/test_sets/d2166d8d-e478-4b3f-8a29-14fb42ba0a90

            adilger Andreas Dilger added a comment - Client was running 2.15.51.45 against 2.12.9 servers and crashed with the following allocation stats, so mostly in inactive_file:
            https://testing.whamcloud.com/test_sets/150d06df-aeae-436a-ba86-fc5082d70706

            [ 8984.082842] Mem-Info:
            [ 8984.083352] active_anon:0 inactive_anon:0 isolated_anon:0
            [ 8984.083352]  active_file:108414 inactive_file:517320 isolated_file:576
            [ 8984.083352]  unevictable:0 dirty:73728 writeback:65408
            [ 8984.083352]  slab_reclaimable:5600 slab_unreclaimable:22668
            [ 8984.088978] Node 0 active_anon:0kB inactive_anon:0kB active_file:448656kB inactive_file:2054532kB unevictable:0kB isolated(anon):0kB isolated(file):2048kB mapped:1036kB dirty:294912kB writeback:261632kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:2768kB pagetables:8488kB all_unreclaimable? yes

            adilger Andreas Dilger added a comment - This crash is only happening with EL7.9 kernels (tested only on b2_12 with master in interop mode), and not with EL8.5, SLES15, etc. kernels, so it is unlikely to be a prime concern going forward and could eventually be closed.

            adilger Andreas Dilger added a comment - It looks like the memory is at least accounted for in this client-side OOM. The vast majority of memory usage is in pagecache:

            active_anon:37356kB inactive_anon:37596kB active_file:47132kB inactive_file:2391964kB

            and the client is trying to write a 5502664kB file on a client with 3145312kB of RAM. That shouldn't cause the client to run out of memory, since it should have llite.*.max_cached_mb=1535 (1/2 of RAM). However, osc.*.max_dirty_mb may be as large as 1/8 of RAM and there are 7 OSTs, so dirty data could consume up to 7/8 of RAM (per client_adjust_max_dirty()) if this is not also limited by max_cached_mb?
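            The worst-case arithmetic in that comment can be spelled out. A sketch using the numbers quoted in this ticket (the 1/8-of-RAM per-OSC cap and the 1/2-of-RAM max_cached_mb default are taken from the comment itself; this is illustrative arithmetic, not the actual client_adjust_max_dirty() code):

```python
# Worst-case dirty-page accounting sketch for the OOM client described
# above: 3145312 kB of RAM and 7 OSTs. Illustrative only; the real
# limits are computed by Lustre's client_adjust_max_dirty().
ram_kb = 3145312
num_osts = 7

max_cached_mb = ram_kb // 2 // 1024     # llite.*.max_cached_mb, ~1/2 of RAM
per_osc_dirty_mb = ram_kb // 8 // 1024  # osc.*.max_dirty_mb cap, ~1/8 of RAM
aggregate_dirty_mb = per_osc_dirty_mb * num_osts

print(f"max_cached_mb       = {max_cached_mb} MB")
print(f"per-OSC max_dirty   = {per_osc_dirty_mb} MB")
print(f"aggregate dirty cap = {aggregate_dirty_mb} MB "
      f"({aggregate_dirty_mb * 1024 * 100 // ram_kb}% of RAM)")
```

            So while max_cached_mb alone would cap the page cache at roughly half of RAM (1535 MB here), the per-OSC dirty limits summed across 7 OSTs allow roughly 7/8 of RAM in dirty data, which matches the traces above showing nearly all of RAM in dirty/writeback inactive_file pages.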
            sarah Sarah Liu added a comment - Hit similar one https://testing.whamcloud.com/test_sets/390f3fdd-df3e-4938-9dd3-ed327f57076f
            yujian Jian Yu added a comment - More failures on master branch in zfs test sessions:
            https://testing.whamcloud.com/test_sets/df51b436-412f-11ea-9630-52540065bddc
            https://testing.whamcloud.com/test_sets/a8e429aa-4130-11ea-9847-52540065bddc
            https://testing.whamcloud.com/test_sets/8359ad34-415a-11ea-9847-52540065bddc
            https://testing.whamcloud.com/test_sets/ae637af0-4164-11ea-9b1e-52540065bddc

            Here's another sanity-benchmark test_iozone OOM crash with a different trace: https://testing.whamcloud.com/test_sets/cecc9558-081e-11ea-bbc3-52540065bddc. Possibly a different root cause?

            [24986.500955] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 10:05:17 (1573725917)
            [24989.359411] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1782612kB available, using 3438440kB file size
            [24989.610198] Lustre: DEBUG MARKER: min OST has 1782612kB available, using 3438440kB file size
            [25048.708540] LNetError: 18994:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.9.5.216@tcp added to recovery queue. Health = 900
            [25049.117039] LNetError: 18996:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.214@tcp added to recovery queue. Health = 900
            [25049.119238] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae7ad2bc00
            [25049.127483] Lustre: 19001:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1573725975/real 1573725975]  req@ffff95ae001c9f80 x1650168582431232/t0(0) o4->lustre-OST0005-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726020 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25049.132801] Lustre: lustre-OST0005-osc-ffff95ae65a62000: Connection to lustre-OST0005 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25049.265069] Lustre: lustre-OST0001-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25086.049686] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573725972/real 1573725973]  req@ffff95ae2087db00 x1650168582429888/t0(0) o4->lustre-OST0007-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726017 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25086.054856] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
            [25086.056499] Lustre: lustre-OST0007-osc-ffff95ae65a62000: Connection to lustre-OST0007 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25086.059313] Lustre: Skipped 5 previous similar messages
            [25086.062664] Lustre: lustre-OST0007-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25086.064514] Lustre: Skipped 5 previous similar messages
            [25189.704358] LNetError: 18994:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.9.5.216@tcp added to recovery queue. Health = 900
            [25190.148392] LNetError: 18996:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.214@tcp added to recovery queue. Health = 900
            [25190.150543] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae20af0c00
            [25190.152380] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.154242] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.156036] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.158312] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1573726114/real 1573726115]  req@ffff95ae4d98cd80 x1650168582704448/t0(0) o4->lustre-OST0000-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726160 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25190.163356] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
            [25190.165017] Lustre: lustre-OST0000-osc-ffff95ae65a62000: Connection to lustre-OST0000 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25190.167837] Lustre: Skipped 1 previous similar message
            [25190.202183] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.205003] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.206859] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.208647] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.210440] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.232707] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.234622] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.236420] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.238549] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.240339] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.242117] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.287268] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.289123] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.290920] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.292837] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.294970] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.296757] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.331303] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.333147] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.334915] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.336707] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.338642] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.341149] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.388268] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.390387] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.484817] Lustre: lustre-OST0000-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25190.486621] Lustre: Skipped 1 previous similar message
            [25258.668206] tuned invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
            [25258.669787] tuned cpuset=/ mems_allowed=0
            [25258.670483] CPU: 1 PID: 1369 Comm: tuned Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7.x86_64 #1
            [25258.672347] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [25258.673340] Call Trace:
            [25258.673874]  [<ffffffff8f7792c2>] dump_stack+0x19/0x1b
            [25258.674763]  [<ffffffff8f773c64>] dump_header+0x90/0x229
            [25258.675696]  [<ffffffff8f787b6f>] ? notifier_call_chain+0x4f/0x70
            [25258.676775]  [<ffffffff8f0cb498>] ? __blocking_notifier_call_chain+0x58/0x70
            [25258.677999]  [<ffffffff8f1bff2e>] check_panic_on_oom+0x2e/0x60
            [25258.679002]  [<ffffffff8f1c034b>] out_of_memory+0x23b/0x4f0
            [25258.679967]  [<ffffffff8f77477c>] __alloc_pages_slowpath+0x5d6/0x724
            [25258.681053]  [<ffffffff8f1c6b84>] __alloc_pages_nodemask+0x404/0x420
            [25258.682149]  [<ffffffff8f218105>] alloc_pages_vma+0xb5/0x200
            [25258.683122]  [<ffffffff8f205ac5>] __read_swap_cache_async+0x115/0x190
            [25258.684227]  [<ffffffff8f205b66>] read_swap_cache_async+0x26/0x60
            [25258.685285]  [<ffffffff8f205d4b>] swapin_readahead+0x1ab/0x210
            [25258.686306]  [<ffffffff8f386722>] ? radix_tree_lookup_slot+0x22/0x50
            [25258.687391]  [<ffffffff8f1bb4ee>] ? __find_get_page+0x1e/0xa0
            [25258.688406]  [<ffffffff8f1efe36>] handle_pte_fault+0xd66/0xe20
            [25258.689423]  [<ffffffff8f25f4dd>] ? core_sys_select+0x26d/0x340
            [25258.690435]  [<ffffffff8f1f200d>] handle_mm_fault+0x39d/0x9b0
            [25258.691422]  [<ffffffff8f787653>] __do_page_fault+0x213/0x500
            [25258.692425]  [<ffffffff8f25e7f3>] ? poll_select_copy_remaining+0x113/0x180
            [25258.693589]  [<ffffffff8f787a26>] trace_do_page_fault+0x56/0x150
            [25258.694607]  [<ffffffff8f786fa2>] do_async_page_fault+0x22/0xf0
            [25258.695641]  [<ffffffff8f7837a8>] async_page_fault+0x28/0x30
            [25258.696613] Mem-Info:
            [25258.697032] active_anon:17275 inactive_anon:17292 isolated_anon:0
             active_file:9706 inactive_file:333050 isolated_file:32
             unevictable:0 dirty:3641 writeback:42567 unstable:0
             slab_reclaimable:4776 slab_unreclaimable:13653
             mapped:6755 shmem:21419 pagetables:1312 bounce:0
             free:12854 free_pcp:46 free_cma:0
            [25258.702574] Node 0 DMA free:7048kB min:416kB low:520kB high:624kB active_anon:0kB inactive_anon:12kB active_file:204kB inactive_file:6872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:144kB writeback:560kB mapped:172kB shmem:12kB slab_reclaimable:52kB slab_unreclaimable:348kB kernel_stack:32kB pagetables:56kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10917 all_unreclaimable? yes
            [25258.709666] lowmem_reserve[]: 0 1660 1660 1660
            [25258.710606] Node 0 DMA32 free:44120kB min:44636kB low:55792kB high:66952kB active_anon:69100kB inactive_anon:69156kB active_file:38620kB inactive_file:1325244kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:2080744kB managed:1703312kB mlocked:0kB dirty:14420kB writeback:169708kB mapped:26848kB shmem:85664kB slab_reclaimable:19052kB slab_unreclaimable:54516kB kernel_stack:2752kB pagetables:5192kB unstable:0kB bounce:0kB free_pcp:204kB local_pcp:128kB free_cma:0kB writeback_tmp:0kB pages_scanned:2791068 all_unreclaimable? yes
            [25258.718385] lowmem_reserve[]: 0 0 0 0
            [25258.719197] Node 0 DMA: 12*4kB (UE) 19*8kB (UE) 14*16kB (UEM) 7*32kB (UE) 8*64kB (UEM) 2*128kB (UE) 10*256kB (UEM) 4*512kB (UEM) 1*1024kB (M) 0*2048kB 0*4096kB = 7048kB
            [25258.722455] Node 0 DMA32: 73*4kB (UEM) 1577*8kB (UEM) 955*16kB (UEM) 352*32kB (UEM) 71*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43996kB
            [25258.725528] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
            [25258.727002] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
            [25258.728414] 147236 total pagecache pages
            [25258.729119] 146 pages in swap cache
            [25258.729735] Swap cache stats: add 8528, delete 8382, find 310/572
            [25258.730769] Free swap  = 3638780kB
            [25258.731356] Total swap = 3671036kB
            [25258.731949] 524184 pages RAM
            [25258.732448] 0 pages HighMem/MovableOnly
            [25258.733109] 94379 pages reserved
            [25258.733688] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
            [25258.735073] [  382]     0   382     9866     1342      24       77             0 systemd-journal
            [25258.736554] [  402]     0   402    29160      239      28       78             0 lvmetad
            [25258.737928] [  409]     0   409    11238      452      23       60         -1000 systemd-udevd
            [25258.739369] [  534]     0   534    13882      157      27      110         -1000 auditd
            [25258.740717] [  559]     0   559     5384      267      16       40             0 irqbalance
            [25258.742125] [  560]    81   560    14563      456      34      165          -900 dbus-daemon
            [25258.743538] [  561]     0   561     6595      376      18       66             0 systemd-logind
            [25258.745007] [  563]     0   563   136973     1685      87      502             0 NetworkManager
            [25258.746479] [  564]   999   564   153061     1172      63     2396             0 polkitd
            [25258.747848] [  568]     0   568    48777      333      37      130             0 gssproxy
            [25258.749245] [  575]    32   575    17319      236      38      141             0 rpcbind
            [25258.750613] [  578]    38   578    11824      455      28      152             0 ntpd
            [25258.751965] [  606]     0   606    25724      871      53      508             0 dhclient
            [25258.753363] [  952]     0   952   143550     3953     101      895             0 tuned
            [25258.754701] [  953]     0   953    74587     2165      72      141             0 rsyslogd
            [25258.756069] [  959]     0   959    28230      852      58      242         -1000 sshd
            [25258.757380] [  964]   997   964    56472      344      22      124             0 munged
            [25258.758737] [  966]    29   966    10609      223      26      209             0 rpc.statd
            [25258.760131] [  968]     0   968     6792      195      17       63             0 xinetd
            [25258.761481] [ 1024]     0  1024   172948     1594      81        0             0 automount
            [25258.762871] [ 1046]     0  1046     6477      235      19        0             0 atd
            [25258.764177] [ 1054]     0  1054    31573      396      19        0             0 crond
            [25258.765505] [ 1063]     0  1063    27527      215      12        0             0 agetty
            [25258.766852] [ 1065]     0  1065    27527      212      10        0             0 agetty
            [25258.768190] [ 1290]     0  1290    22425      532      43        0             0 master
            [25258.769539] [ 1310]    89  1310    22468      996      43        0             0 qmgr
            [25258.770874] [ 9706]     0  9706    39196     1381      80        0             0 sshd
            [25258.772191] [ 9708]     0  9708    28300      374      12        0             0 run_test.sh
            [25258.773620] [10014]     0 10014    29609     1225      17        2             0 bash
            [25258.774952] [18916]     0 18916     4844      164      14        1             0 rpc.idmapd
            [25258.776348] [18922]     0 18922    10942      703      27        0             0 rpc.mountd
            [25258.777819] [26115]     0 26115    29609     1011      14        0             0 bash
            [25258.779136] [26116]     0 26116    26991      165       8        0             0 tee
            [25258.780449] [26313]     0 26313    29580     1198      13        0             0 bash
            [25258.781773] [31794]    89 31794    22451      990      48        0             0 pickup
            [25258.783156] [32386]     0 32386    29613     1020      13        0             0 bash
            [25258.784483] [32387]     0 32387    26991      167       9        0             0 tee
            [25258.785821] [  377]   500   377    12351     4815      22      543             0 iozone
            [25258.787194] [  378]     0   378    26991      166       8        0             0 tee
            [25258.788484] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
            
            [25258.790111] CPU: 1 PID: 1369 Comm: tuned Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7.x86_64 #1
            [25258.791945] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [25258.792900] Call Trace:
            [25258.793333]  [<ffffffff8f7792c2>] dump_stack+0x19/0x1b
            [25258.794195]  [<ffffffff8f772941>] panic+0xe8/0x21f
            [25258.795006]  [<ffffffff8f1bff55>] check_panic_on_oom+0x55/0x60
            [25258.796021]  [<ffffffff8f1c034b>] out_of_memory+0x23b/0x4f0
            [25258.796957]  [<ffffffff8f77477c>] __alloc_pages_slowpath+0x5d6/0x724
            [25258.798015]  [<ffffffff8f1c6b84>] __alloc_pages_nodemask+0x404/0x420
            [25258.799073]  [<ffffffff8f218105>] alloc_pages_vma+0xb5/0x200
            [25258.800020]  [<ffffffff8f205ac5>] __read_swap_cache_async+0x115/0x190
            [25258.801088]  [<ffffffff8f205b66>] read_swap_cache_async+0x26/0x60
            [25258.802102]  [<ffffffff8f205d4b>] swapin_readahead+0x1ab/0x210
            [25258.803079]  [<ffffffff8f386722>] ? radix_tree_lookup_slot+0x22/0x50
            [25258.804137]  [<ffffffff8f1bb4ee>] ? __find_get_page+0x1e/0xa0
            [25258.805101]  [<ffffffff8f1efe36>] handle_pte_fault+0xd66/0xe20
            [25258.806080]  [<ffffffff8f25f4dd>] ? core_sys_select+0x26d/0x340
            [25258.807063]  [<ffffffff8f1f200d>] handle_mm_fault+0x39d/0x9b0
            [25258.808022]  [<ffffffff8f787653>] __do_page_fault+0x213/0x500
            [25258.808979]  [<ffffffff8f25e7f3>] ? poll_select_copy_remaining+0x113/0x180
            [25258.810122]  [<ffffffff8f787a26>] trace_do_page_fault+0x56/0x150
            [25258.811124]  [<ffffffff8f786fa2>] do_async_page_fault+0x22/0xf0
            [25258.812118]  [<ffffffff8f7837a8>] async_page_fault+0x28/0x30
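The check_panic_on_oom frames in the traces above mean the panic itself is policy, not a second bug: these test nodes run with the vm.panic_on_oom sysctl enabled in system-wide mode, so instead of killing iozone the kernel panics (and kdump captures the crash). A minimal sketch for inspecting that knob on a client (illustrative only, not from the ticket):

```python
# Read the sysctl behind "system-wide panic_on_oom is enabled" in the panic
# message: 0 = OOM killer picks a victim task, 1 = panic unless the OOM is
# constrained to a cpuset/mempolicy, 2 = always panic the node ("system-wide").
with open("/proc/sys/vm/panic_on_oom") as f:
    mode = int(f.read())
print(mode)
```

Setting it back to 0 (`sysctl -w vm.panic_on_oom=0`) would let the OOM killer pick a victim instead, at the cost of losing the crash dumps these reports rely on.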
            
jamesanunez James Nunez (Inactive) added a comment - Here's another sanity-benchmark test_iozone OOM crash with a different trace: https://testing.whamcloud.com/test_sets/cecc9558-081e-11ea-bbc3-52540065bddc . Possibly a different root cause?

            [24986.500955] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 10:05:17 (1573725917)
            [24989.359411] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1782612kB available, using 3438440kB file size
            [24989.610198] Lustre: DEBUG MARKER: min OST has 1782612kB available, using 3438440kB file size
            [25048.708540] LNetError: 18994:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.9.5.216@tcp added to recovery queue. Health = 900
            [25049.117039] LNetError: 18996:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.214@tcp added to recovery queue. Health = 900
            [25049.119238] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae7ad2bc00
            [25049.127483] Lustre: 19001:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1573725975/real 1573725975] req@ffff95ae001c9f80 x1650168582431232/t0(0) o4->lustre-OST0005-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726020 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25049.132801] Lustre: lustre-OST0005-osc-ffff95ae65a62000: Connection to lustre-OST0005 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25049.265069] Lustre: lustre-OST0001-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25086.049686] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1573725972/real 1573725973] req@ffff95ae2087db00 x1650168582429888/t0(0) o4->lustre-OST0007-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726017 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25086.054856] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) Skipped 7 previous similar messages
            [25086.056499] Lustre: lustre-OST0007-osc-ffff95ae65a62000: Connection to lustre-OST0007 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25086.059313] Lustre: Skipped 5 previous similar messages
            [25086.062664] Lustre: lustre-OST0007-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25086.064514] Lustre: Skipped 5 previous similar messages
            [25189.704358] LNetError: 18994:0:(peer.c:3724:lnet_peer_ni_add_to_recoveryq_locked()) lpni 10.9.5.216@tcp added to recovery queue. Health = 900
            [25190.148392] LNetError: 18996:0:(lib-msg.c:481:lnet_handle_local_failure()) ni 10.9.5.214@tcp added to recovery queue. Health = 900
            [25190.150543] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae20af0c00
            [25190.152380] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.154242] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.156036] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.158312] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) @@@ Request sent has failed due to network error: [sent 1573726114/real 1573726115] req@ffff95ae4d98cd80 x1650168582704448/t0(0) o4->lustre-OST0000-osc-ffff95ae65a62000@10.9.5.216@tcp:6/4 lens 488/448 e 0 to 1 dl 1573726160 ref 2 fl Rpc:eXQr/0/ffffffff rc 0/-1 job:'iozone.500'
            [25190.163356] Lustre: 19004:0:(client.c:2219:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
            [25190.165017] Lustre: lustre-OST0000-osc-ffff95ae65a62000: Connection to lustre-OST0000 (at 10.9.5.216@tcp) was lost; in progress operations using this service will wait for recovery to complete
            [25190.167837] Lustre: Skipped 1 previous similar message
            [25190.202183] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c400
            [25190.205003] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.206859] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.208647] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.210440] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.232707] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.234622] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.236420] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3c800
            [25190.238549] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3d800
            [25190.240339] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.242117] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.287268] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.289123] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.290920] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.292837] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.294970] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.296757] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.331303] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e400
            [25190.333147] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e000
            [25190.334915] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.336707] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.338642] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.341149] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3e800
            [25190.388268] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.390387] LustreError: 18996:0:(events.c:200:client_bulk_callback()) event type 1, status -5, desc ffff95ae66b3f800
            [25190.484817] Lustre: lustre-OST0000-osc-ffff95ae65a62000: Connection restored to 10.9.5.216@tcp (at 10.9.5.216@tcp)
            [25190.486621] Lustre: Skipped 1 previous similar message
            [25258.668206] tuned invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
            [25258.669787] tuned cpuset=/ mems_allowed=0
            [25258.670483] CPU: 1 PID: 1369 Comm: tuned Kdump: loaded Tainted: G           OE  ------------   3.10.0-1062.1.1.el7.x86_64 #1
            [25258.672347] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [25258.673340] Call Trace:
            [25258.673874]  [<ffffffff8f7792c2>] dump_stack+0x19/0x1b
            [25258.674763]  [<ffffffff8f773c64>] dump_header+0x90/0x229
            [25258.675696]  [<ffffffff8f787b6f>] ? notifier_call_chain+0x4f/0x70
            [25258.676775]  [<ffffffff8f0cb498>] ? __blocking_notifier_call_chain+0x58/0x70
            [25258.677999]  [<ffffffff8f1bff2e>] check_panic_on_oom+0x2e/0x60
            [25258.679002]  [<ffffffff8f1c034b>] out_of_memory+0x23b/0x4f0
            [25258.679967]  [<ffffffff8f77477c>] __alloc_pages_slowpath+0x5d6/0x724
            [25258.681053]  [<ffffffff8f1c6b84>] __alloc_pages_nodemask+0x404/0x420
            [25258.682149]  [<ffffffff8f218105>] alloc_pages_vma+0xb5/0x200
            [25258.683122]  [<ffffffff8f205ac5>] __read_swap_cache_async+0x115/0x190
            [25258.684227]  [<ffffffff8f205b66>] read_swap_cache_async+0x26/0x60
            [25258.685285]  [<ffffffff8f205d4b>] swapin_readahead+0x1ab/0x210
            [25258.686306]  [<ffffffff8f386722>] ? radix_tree_lookup_slot+0x22/0x50
            [25258.687391]  [<ffffffff8f1bb4ee>] ? __find_get_page+0x1e/0xa0
            [25258.688406]  [<ffffffff8f1efe36>] handle_pte_fault+0xd66/0xe20
            [25258.689423]  [<ffffffff8f25f4dd>] ? core_sys_select+0x26d/0x340
            [25258.690435]  [<ffffffff8f1f200d>] handle_mm_fault+0x39d/0x9b0
            [25258.691422]  [<ffffffff8f787653>] __do_page_fault+0x213/0x500
            [25258.692425]  [<ffffffff8f25e7f3>] ? poll_select_copy_remaining+0x113/0x180
            [25258.693589]  [<ffffffff8f787a26>] trace_do_page_fault+0x56/0x150
            [25258.694607]  [<ffffffff8f786fa2>] do_async_page_fault+0x22/0xf0
            [25258.695641]  [<ffffffff8f7837a8>] async_page_fault+0x28/0x30
            [25258.696613] Mem-Info:
            [25258.697032] active_anon:17275 inactive_anon:17292 isolated_anon:0 active_file:9706 inactive_file:333050 isolated_file:32 unevictable:0 dirty:3641 writeback:42567 unstable:0 slab_reclaimable:4776 slab_unreclaimable:13653 mapped:6755 shmem:21419 pagetables:1312 bounce:0 free:12854 free_pcp:46 free_cma:0
            [25258.702574] Node 0 DMA free:7048kB min:416kB low:520kB high:624kB active_anon:0kB inactive_anon:12kB active_file:204kB inactive_file:6872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:144kB writeback:560kB mapped:172kB shmem:12kB slab_reclaimable:52kB slab_unreclaimable:348kB kernel_stack:32kB pagetables:56kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:10917 all_unreclaimable? yes
            [25258.709666] lowmem_reserve[]: 0 1660 1660 1660
            [25258.710606] Node 0 DMA32 free:44120kB min:44636kB low:55792kB high:66952kB active_anon:69100kB inactive_anon:69156kB active_file:38620kB inactive_file:1325244kB unevictable:0kB isolated(anon):0kB isolated(file):256kB present:2080744kB managed:1703312kB mlocked:0kB dirty:14420kB writeback:169708kB mapped:26848kB shmem:85664kB slab_reclaimable:19052kB slab_unreclaimable:54516kB kernel_stack:2752kB pagetables:5192kB unstable:0kB bounce:0kB free_pcp:204kB local_pcp:128kB free_cma:0kB writeback_tmp:0kB pages_scanned:2791068 all_unreclaimable? yes
            [25258.718385] lowmem_reserve[]: 0 0 0 0
            [25258.719197] Node 0 DMA: 12*4kB (UE) 19*8kB (UE) 14*16kB (UEM) 7*32kB (UE) 8*64kB (UEM) 2*128kB (UE) 10*256kB (UEM) 4*512kB (UEM) 1*1024kB (M) 0*2048kB 0*4096kB = 7048kB
            [25258.722455] Node 0 DMA32: 73*4kB (UEM) 1577*8kB (UEM) 955*16kB (UEM) 352*32kB (UEM) 71*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 43996kB
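A quick cross-check of the numbers in this report (my arithmetic, not part of the original comment): the client reports 524184 pages of RAM with 94379 pages reserved, i.e. roughly 1.6 GiB usable, while the DEBUG MARKER shows iozone writing a 3438440 kB file. The file is about twice the usable RAM, so essentially all of it must cycle through the page cache while writeback to the OSTs is stalling on the bulk I/O errors above:

```python
# Figures copied from the OOM report and DEBUG MARKER above; 4 KiB pages assumed.
PAGE_KB = 4
pages_ram, pages_reserved = 524184, 94379
usable_kb = (pages_ram - pages_reserved) * PAGE_KB  # 1719220 kB, ~1.6 GiB
file_kb = 3438440                                   # iozone file size
print(usable_kb, file_kb / usable_kb)               # ratio is exactly 2.0
```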

            People

              Assignee: bobijam Zhenyu Xu
              Reporter: jamesanunez James Nunez (Inactive)
              Votes: 0
              Watchers: 13