Lustre / LU-10687

sanity-benchmark test iozone hangs with client OOM


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.11.0

    Description

      In sanity-benchmark test_iozone, we see the test hang and the client get an OOM, which ends in a kernel panic because panic_on_oom is enabled:

      [12124.299903] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 01:43:13 (1518658993)
      [12124.927332] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1785996kB available, using 3834168kB file size
      [12125.287897] Lustre: DEBUG MARKER: min OST has 1785996kB available, using 3834168kB file size
      [12313.629569] qmgr invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=0
      [12313.630538] qmgr cpuset=/ mems_allowed=0
      [12313.631007] CPU: 1 PID: 1120 Comm: qmgr Tainted: G           OE   4.4.0-109-generic #132-Ubuntu
      [12313.631859] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [12313.632424]  0000000000000286 cef8223c230c077a ffff880077e1fa00 ffffffff813fbb03
      [12313.633251]  ffff880077e1fb70 0000000000000000 ffff880077e1fa70 ffffffff8120d3de
      [12313.634067]  ffffffff81e6b0c0 0000000000000000 ffff880077e1faa0 00000000ffffffff
      [12313.634893] Call Trace:
      [12313.635187]  [<ffffffff813fbb03>] dump_stack+0x63/0x90
      [12313.635726]  [<ffffffff8120d3de>] dump_header+0x5a/0x1c5
      [12313.636272]  [<ffffffff810a2a82>] ? __blocking_notifier_call_chain+0x52/0x60
      [12313.636980]  [<ffffffff8119416b>] check_panic_on_oom+0x2b/0x50
      [12313.637565]  [<ffffffff811943fa>] out_of_memory+0x26a/0x460
      [12313.638128]  [<ffffffff8119a3b5>] __alloc_pages_slowpath.constprop.88+0x965/0xb00
      [12313.638870]  [<ffffffff8119a7d6>] __alloc_pages_nodemask+0x286/0x2a0
      [12313.639499]  [<ffffffff811e591d>] alloc_pages_vma+0xad/0x250
      [12313.640073]  [<ffffffff811d639e>] __read_swap_cache_async+0xee/0x140
      [12313.640699]  [<ffffffff811d6416>] read_swap_cache_async+0x26/0x60
      [12313.641299]  [<ffffffff811d6598>] swapin_readahead+0x148/0x1b0
      [12313.641882]  [<ffffffff8119001e>] ? find_get_entry+0x1e/0xa0
      [12313.642442]  [<ffffffff811910cd>] ? pagecache_get_page+0x2d/0x1c0
      [12313.643060]  [<ffffffff810cbb61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
      [12313.643872]  [<ffffffff811c3617>] handle_mm_fault+0x1317/0x1820
      [12313.644462]  [<ffffffff8125a90b>] ? ep_poll+0x37b/0x3d0
      [12313.645002]  [<ffffffff8106b687>] __do_page_fault+0x197/0x400
      [12313.645576]  [<ffffffff810ad200>] ? wake_up_q+0x70/0x70
      [12313.646094]  [<ffffffff8106b957>] trace_do_page_fault+0x37/0xe0
      [12313.646691]  [<ffffffff81063f29>] do_async_page_fault+0x19/0x70
      [12313.647287]  [<ffffffff81846a48>] async_page_fault+0x28/0x30
      [12313.647879] Mem-Info:
      [12313.648127] active_anon:33 inactive_anon:71 isolated_anon:0
      [12313.648127]  active_file:219681 inactive_file:219943 isolated_file:181
      [12313.648127]  unevictable:0 dirty:0 writeback:8 unstable:0
      [12313.648127]  slab_reclaimable:2966 slab_unreclaimable:6631
      [12313.648127]  mapped:420 shmem:59 pagetables:1423 bounce:0
      [12313.648127]  free:13590 free_pcp:0 free_cma:0
      [12313.651313] Node 0 DMA free:7652kB min:380kB low:472kB high:568kB active_anon:132kB inactive_anon:236kB active_file:3248kB inactive_file:3580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:236kB slab_reclaimable:48kB slab_unreclaimable:256kB kernel_stack:16kB pagetables:88kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:60752 all_unreclaimable? yes
      [12313.655696] lowmem_reserve[]: 0 1820 1820 1820 1820
      [12313.656331] Node 0 DMA32 free:46708kB min:44672kB low:55840kB high:67008kB active_anon:0kB inactive_anon:48kB active_file:875476kB inactive_file:876396kB unevictable:0kB isolated(anon):0kB isolated(file):512kB present:2080744kB managed:1901176kB mlocked:0kB dirty:0kB writeback:32kB mapped:1680kB shmem:0kB slab_reclaimable:11816kB slab_unreclaimable:26268kB kernel_stack:2864kB pagetables:5604kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11559536 all_unreclaimable? yes
      [12313.660847] lowmem_reserve[]: 0 0 0 0 0
      [12313.661363] Node 0 DMA: 11*4kB (UME) 11*8kB (UE) 10*16kB (UE) 8*32kB (UE) 7*64kB (UME) 2*128kB (UM) 3*256kB (UME) 3*512kB (UME) 2*1024kB (UE) 1*2048kB (U) 0*4096kB = 7652kB
      [12313.663365] Node 0 DMA32: 105*4kB (UME) 1414*8kB (UMEH) 891*16kB (UMEH) 399*32kB (UME) 86*64kB (UE) 2*128kB (ME) 1*256kB (M) 2*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 46820kB
      [12313.665390] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [12313.666263] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [12313.667093] 602 total pagecache pages
      [12313.667468] 5 pages in swap cache
      [12313.667808] Swap cache stats: add 25527, delete 25522, find 1181/1916
      [12313.668440] Free swap  = 2000200kB
      [12313.668789] Total swap = 2095100kB
      [12313.669132] 524184 pages RAM
      [12313.669432] 0 pages HighMem/MovableOnly
      [12313.669826] 44913 pages reserved
      [12313.670158] 0 pages cma reserved
      [12313.670498] 0 pages hwpoisoned
      [12313.670815] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
      [12313.671686] [  199]     0   199     9520        0      22       3     1120             0 systemd-journal
      [12313.672623] [  263]     0   263    11190        3      23       3      284         -1000 systemd-udevd
      [12313.673539] [  332]     0   332     9940        1      25       3      215             0 rpc.gssd
      [12313.674426] [  520]     0   520     6997        1      18       3       66             0 cron
      [12313.675277] [  521]     0   521    47301        0      91       3      357             0 sssd
      [12313.676140] [  526]   106   526    13786        0      27       3      141          -900 dbus-daemon
      [12313.677038] [  539]   104   539    67160        0      29       3      264             0 rsyslogd
      [12313.677906] [  549]     0   549    69265       18     123       3      641             0 sssd_be
      [12313.678778] [  623]     0   623    45843        0      91       3      318             0 sssd_nss
      [12313.679654] [  624]     0   624    40743        0      80       3      313             0 sssd_pam
      [12313.680532] [  625]     0   625    39140        0      77       3      326             0 sssd_ssh
      [12313.681402] [  626]     0   626    46615        0      89       3      391             0 sssd_pac
      [12313.682286] [  640]     0   640     5093        0      14       3      219             0 dhclient
      [12313.683158] [  647]     0   647    73862       62      41       4      699             0 accounts-daemon
      [12313.684095] [  667]     0   667     7136        1      20       3       80             0 systemd-logind
      [12313.685016] [  687]     0   687    69295        1      38       3      181             0 polkitd
      [12313.685885] [  721]     0   721    17190        1      35       3      191             0 certmonger
      [12313.686788] [  730]   113   730    57679        0      25       4      242             0 munged
      [12313.687651] [  741]     0   741    16377        6      37       3      173         -1000 sshd
      [12313.688481] [  742]     0   742    17886        3      25       3      124             0 oddjobd
      [12313.689356] [  960]   108   960    28029        0      25       3      165             0 ntpd
      [12313.690213] [  965]     0   965   184372        0      89       4     1064             0 automount
      [12313.691098] [  982]     0   982     3764        1      14       3       48             0 xinetd
      [12313.691950] [  987]     0   987     4868      101      16       3       78             0 irqbalance
      [12313.692851] [ 1007]     0  1007     3736        0      12       3       39             0 agetty
      [12313.693746] [ 1008]     0  1008     3690        0      12       3       37             0 agetty
      [12313.694620] [ 1111]     0  1111    16873        0      25       3      123             0 master
      [12313.695470] [ 1120]   111  1120    19943        0      27       3      119             0 qmgr
      [12313.696309] [ 1231]     0  1231    28855      168      56       3      284             0 sshd
      [12313.697150] [ 1239]     0  1239     9200        0      21       3      192             0 systemd
      [12313.698023] [ 1242]     0  1242    17879        0      36       3      473             0 (sd-pam)
      [12313.698920] [ 1260]     0  1260     3406        1      11       3       62             0 run_test.sh
      [12313.699825] [ 1426]     0  1426     4858        1      14       3     1520             0 bash
      [12313.700684] [30716]     0 30716     4858        0      14       3     1521             0 bash
      [12313.701532] [30717]     0 30717     1576        0       8       3       26             0 tee
      [12313.702381] [30835]     0 30835     4869        1      14       3     1532             0 bash
      [12313.703228] [ 2634]   111  2634    19931        0      26       3      112             0 pickup
      [12313.704091] [ 2983]     0  2983     4870        0      14       3     1534             0 bash
      [12313.704936] [ 2984]     0  2984     1576        0       8       3       26             0 tee
      [12313.705761] [ 3244]   500  3244    11856       40      21       3     5188             0 iozone
      [12313.706623] [ 3245]     0  3245     1576        0       8       3       26             0 tee
      [12313.707459] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
      [12313.707459] 
      

      Is this the same OOM as in LU-10622?

      Logs for this failure are at
      https://testing.hpdd.intel.com/test_sets/49d99c30-12b9-11e8-a6ad-52540065bddc
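The final log line ("Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled") shows why the node dies outright instead of the OOM killer reaping a process. As a quick check on a test client (a minimal sketch; these are standard Linux sysctls, nothing Lustre-specific):

```shell
# Inspect the client's OOM policy.
# vm.panic_on_oom: 0 = let the OOM killer pick a victim,
#                  1 or 2 = panic the whole node on OOM (as seen above).
cat /proc/sys/vm/panic_on_oom

# To keep the node alive for debugging instead of panicking,
# the policy can be relaxed (requires root):
#   sysctl -w vm.panic_on_oom=0
```

With panic_on_oom=0 the client would survive long enough for the OOM killer's victim selection and the Lustre debug logs to be collected.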


People

    Assignee: wc-triage (WC Triage)
    Reporter: jamesanunez (James Nunez (Inactive))
    Votes: 0
    Watchers: 3
