Details
- Bug
- Resolution: Won't Fix
- Minor
- None
- Lustre 2.11.0
- 3
- 9223372036854775807
Description
In sanity-benchmark test_iozone, the test hangs and the client hits an OOM error, which ends in a kernel panic because panic_on_oom is enabled:
[12124.299903] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 01:43:13 (1518658993)
[12124.927332] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1785996kB available, using 3834168kB file size
[12125.287897] Lustre: DEBUG MARKER: min OST has 1785996kB available, using 3834168kB file size
[12313.629569] qmgr invoked oom-killer: gfp_mask=0x24200ca, order=0, oom_score_adj=0
[12313.630538] qmgr cpuset=/ mems_allowed=0
[12313.631007] CPU: 1 PID: 1120 Comm: qmgr Tainted: G OE 4.4.0-109-generic #132-Ubuntu
[12313.631859] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[12313.632424] 0000000000000286 cef8223c230c077a ffff880077e1fa00 ffffffff813fbb03
[12313.633251] ffff880077e1fb70 0000000000000000 ffff880077e1fa70 ffffffff8120d3de
[12313.634067] ffffffff81e6b0c0 0000000000000000 ffff880077e1faa0 00000000ffffffff
[12313.634893] Call Trace:
[12313.635187] [<ffffffff813fbb03>] dump_stack+0x63/0x90
[12313.635726] [<ffffffff8120d3de>] dump_header+0x5a/0x1c5
[12313.636272] [<ffffffff810a2a82>] ? __blocking_notifier_call_chain+0x52/0x60
[12313.636980] [<ffffffff8119416b>] check_panic_on_oom+0x2b/0x50
[12313.637565] [<ffffffff811943fa>] out_of_memory+0x26a/0x460
[12313.638128] [<ffffffff8119a3b5>] __alloc_pages_slowpath.constprop.88+0x965/0xb00
[12313.638870] [<ffffffff8119a7d6>] __alloc_pages_nodemask+0x286/0x2a0
[12313.639499] [<ffffffff811e591d>] alloc_pages_vma+0xad/0x250
[12313.640073] [<ffffffff811d639e>] __read_swap_cache_async+0xee/0x140
[12313.640699] [<ffffffff811d6416>] read_swap_cache_async+0x26/0x60
[12313.641299] [<ffffffff811d6598>] swapin_readahead+0x148/0x1b0
[12313.641882] [<ffffffff8119001e>] ? find_get_entry+0x1e/0xa0
[12313.642442] [<ffffffff811910cd>] ? pagecache_get_page+0x2d/0x1c0
[12313.643060] [<ffffffff810cbb61>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[12313.643872] [<ffffffff811c3617>] handle_mm_fault+0x1317/0x1820
[12313.644462] [<ffffffff8125a90b>] ? ep_poll+0x37b/0x3d0
[12313.645002] [<ffffffff8106b687>] __do_page_fault+0x197/0x400
[12313.645576] [<ffffffff810ad200>] ? wake_up_q+0x70/0x70
[12313.646094] [<ffffffff8106b957>] trace_do_page_fault+0x37/0xe0
[12313.646691] [<ffffffff81063f29>] do_async_page_fault+0x19/0x70
[12313.647287] [<ffffffff81846a48>] async_page_fault+0x28/0x30
[12313.647879] Mem-Info:
[12313.648127] active_anon:33 inactive_anon:71 isolated_anon:0
[12313.648127] active_file:219681 inactive_file:219943 isolated_file:181
[12313.648127] unevictable:0 dirty:0 writeback:8 unstable:0
[12313.648127] slab_reclaimable:2966 slab_unreclaimable:6631
[12313.648127] mapped:420 shmem:59 pagetables:1423 bounce:0
[12313.648127] free:13590 free_pcp:0 free_cma:0
[12313.651313] Node 0 DMA free:7652kB min:380kB low:472kB high:568kB active_anon:132kB inactive_anon:236kB active_file:3248kB inactive_file:3580kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:236kB slab_reclaimable:48kB slab_unreclaimable:256kB kernel_stack:16kB pagetables:88kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:60752 all_unreclaimable? yes
[12313.655696] lowmem_reserve[]: 0 1820 1820 1820 1820
[12313.656331] Node 0 DMA32 free:46708kB min:44672kB low:55840kB high:67008kB active_anon:0kB inactive_anon:48kB active_file:875476kB inactive_file:876396kB unevictable:0kB isolated(anon):0kB isolated(file):512kB present:2080744kB managed:1901176kB mlocked:0kB dirty:0kB writeback:32kB mapped:1680kB shmem:0kB slab_reclaimable:11816kB slab_unreclaimable:26268kB kernel_stack:2864kB pagetables:5604kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11559536 all_unreclaimable? yes
[12313.660847] lowmem_reserve[]: 0 0 0 0 0
[12313.661363] Node 0 DMA: 11*4kB (UME) 11*8kB (UE) 10*16kB (UE) 8*32kB (UE) 7*64kB (UME) 2*128kB (UM) 3*256kB (UME) 3*512kB (UME) 2*1024kB (UE) 1*2048kB (U) 0*4096kB = 7652kB
[12313.663365] Node 0 DMA32: 105*4kB (UME) 1414*8kB (UMEH) 891*16kB (UMEH) 399*32kB (UME) 86*64kB (UE) 2*128kB (ME) 1*256kB (M) 2*512kB (M) 1*1024kB (M) 0*2048kB 0*4096kB = 46820kB
[12313.665390] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[12313.666263] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[12313.667093] 602 total pagecache pages
[12313.667468] 5 pages in swap cache
[12313.667808] Swap cache stats: add 25527, delete 25522, find 1181/1916
[12313.668440] Free swap = 2000200kB
[12313.668789] Total swap = 2095100kB
[12313.669132] 524184 pages RAM
[12313.669432] 0 pages HighMem/MovableOnly
[12313.669826] 44913 pages reserved
[12313.670158] 0 pages cma reserved
[12313.670498] 0 pages hwpoisoned
[12313.670815] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[12313.671686] [ 199] 0 199 9520 0 22 3 1120 0 systemd-journal
[12313.672623] [ 263] 0 263 11190 3 23 3 284 -1000 systemd-udevd
[12313.673539] [ 332] 0 332 9940 1 25 3 215 0 rpc.gssd
[12313.674426] [ 520] 0 520 6997 1 18 3 66 0 cron
[12313.675277] [ 521] 0 521 47301 0 91 3 357 0 sssd
[12313.676140] [ 526] 106 526 13786 0 27 3 141 -900 dbus-daemon
[12313.677038] [ 539] 104 539 67160 0 29 3 264 0 rsyslogd
[12313.677906] [ 549] 0 549 69265 18 123 3 641 0 sssd_be
[12313.678778] [ 623] 0 623 45843 0 91 3 318 0 sssd_nss
[12313.679654] [ 624] 0 624 40743 0 80 3 313 0 sssd_pam
[12313.680532] [ 625] 0 625 39140 0 77 3 326 0 sssd_ssh
[12313.681402] [ 626] 0 626 46615 0 89 3 391 0 sssd_pac
[12313.682286] [ 640] 0 640 5093 0 14 3 219 0 dhclient
[12313.683158] [ 647] 0 647 73862 62 41 4 699 0 accounts-daemon
[12313.684095] [ 667] 0 667 7136 1 20 3 80 0 systemd-logind
[12313.685016] [ 687] 0 687 69295 1 38 3 181 0 polkitd
[12313.685885] [ 721] 0 721 17190 1 35 3 191 0 certmonger
[12313.686788] [ 730] 113 730 57679 0 25 4 242 0 munged
[12313.687651] [ 741] 0 741 16377 6 37 3 173 -1000 sshd
[12313.688481] [ 742] 0 742 17886 3 25 3 124 0 oddjobd
[12313.689356] [ 960] 108 960 28029 0 25 3 165 0 ntpd
[12313.690213] [ 965] 0 965 184372 0 89 4 1064 0 automount
[12313.691098] [ 982] 0 982 3764 1 14 3 48 0 xinetd
[12313.691950] [ 987] 0 987 4868 101 16 3 78 0 irqbalance
[12313.692851] [ 1007] 0 1007 3736 0 12 3 39 0 agetty
[12313.693746] [ 1008] 0 1008 3690 0 12 3 37 0 agetty
[12313.694620] [ 1111] 0 1111 16873 0 25 3 123 0 master
[12313.695470] [ 1120] 111 1120 19943 0 27 3 119 0 qmgr
[12313.696309] [ 1231] 0 1231 28855 168 56 3 284 0 sshd
[12313.697150] [ 1239] 0 1239 9200 0 21 3 192 0 systemd
[12313.698023] [ 1242] 0 1242 17879 0 36 3 473 0 (sd-pam)
[12313.698920] [ 1260] 0 1260 3406 1 11 3 62 0 run_test.sh
[12313.699825] [ 1426] 0 1426 4858 1 14 3 1520 0 bash
[12313.700684] [30716] 0 30716 4858 0 14 3 1521 0 bash
[12313.701532] [30717] 0 30717 1576 0 8 3 26 0 tee
[12313.702381] [30835] 0 30835 4869 1 14 3 1532 0 bash
[12313.703228] [ 2634] 111 2634 19931 0 26 3 112 0 pickup
[12313.704091] [ 2983] 0 2983 4870 0 14 3 1534 0 bash
[12313.704936] [ 2984] 0 2984 1576 0 8 3 26 0 tee
[12313.705761] [ 3244] 500 3244 11856 40 21 3 5188 0 iozone
[12313.706623] [ 3245] 0 3245 1576 0 8 3 26 0 tee
[12313.707459] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[12313.707459]
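As a rough reading of the Mem-Info counters in the log above, the sketch below converts the page counts to kB and computes how much of RAM sits on the file LRU lists. It is only an illustration, not part of the test suite: the helper names (`parse_counters`, `summarize`) are hypothetical, and the 4 kB page size is an assumption, although it is consistent with the per-zone kB figures the kernel printed. With these numbers, file-backed pages come to roughly 84% of the 524184 pages of RAM while anonymous memory is negligible.

```python
import re

# Counters copied from the Mem-Info block of the console log above.
MEM_INFO = (
    "active_anon:33 inactive_anon:71 isolated_anon:0 "
    "active_file:219681 inactive_file:219943 isolated_file:181 "
    "free:13590"
)
TOTAL_RAM_PAGES = 524184  # from "524184 pages RAM"
PAGE_KB = 4               # assumption: 4 kB pages


def parse_counters(text):
    """Turn 'name:value' pairs into a dict of integer page counts."""
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}


def summarize(text, total_pages, page_kb=PAGE_KB):
    """Return file, anon and free memory in kB plus the file share of RAM."""
    counters = parse_counters(text)
    file_pages = counters["active_file"] + counters["inactive_file"]
    anon_pages = counters["active_anon"] + counters["inactive_anon"]
    return {
        "file_kb": file_pages * page_kb,
        "anon_kb": anon_pages * page_kb,
        "free_kb": counters["free"] * page_kb,
        "file_share": file_pages / total_pages,
    }


if __name__ == "__main__":
    for key, value in summarize(MEM_INFO, TOTAL_RAM_PAGES).items():
        print(key, value)
```

The free figure this produces (54360 kB) matches the sum of the DMA and DMA32 `free:` values in the per-zone lines, which is a quick check that the page-size assumption holds for this log.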
Is this the same OOM as in LU-10622?
Logs for this failure are at
https://testing.hpdd.intel.com/test_sets/49d99c30-12b9-11e8-a6ad-52540065bddc