Details
-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.6, Lustre 2.12.8, Lustre 2.16.0
-
None
-
3
-
9223372036854775807
Description
sanity-benchmark test_iozone crashes with OOM. This crash has been seen in ARM and x86_64 client testing a total of eight times. The first occurrences were on 30 July 2019 for Lustre 2.12.2.101 and on 09 August 2019 for Lustre 2.12.56.87.
Looking at the kernel crash for https://testing.whamcloud.com/test_sets/93a9c704-eb70-11e9-b62b-52540065bddc, we see ext4_filemap_fault in the call stack, which appears to set this crash apart from the other OOM crashes we’ve seen:
[23529.881894] Lustre: DEBUG MARKER: == sanity-benchmark test iozone: iozone ============================================================== 22:48:27 (1570574907)
[23532.537981] Lustre: DEBUG MARKER: /usr/sbin/lctl mark min OST has 1785584kB available, using 3074176kB file size
[23533.130811] Lustre: DEBUG MARKER: min OST has 1785584kB available, using 3074176kB file size
[23702.824787] in:imjournal invoked oom-killer: gfp_mask=0x14201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=(null), order=0, oom_score_adj=0
[23702.841803] in:imjournal cpuset=/ mems_allowed=0
[23702.844436] CPU: 0 PID: 937 Comm: in:imjournal Kdump: loaded Tainted: G OE ------------ 4.14.0-115.2.2.el7a.aarch64 #1
[23702.851192] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[23702.855331] Call trace:
[23702.856924] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[23702.860189] [<ffff00000808a074>] show_stack+0x24/0x2c
[23702.863232] [<ffff000008855c28>] dump_stack+0x84/0xa8
[23702.866282] [<ffff000008211fc0>] dump_header+0x94/0x1ec
[23702.869476] [<ffff000008211e4c>] out_of_memory+0x430/0x484
[23702.872747] [<ffff0000082179c4>] __alloc_pages_nodemask+0xa78/0xec0
[23702.876522] [<ffff00000827a89c>] alloc_pages_current+0x8c/0xd8
[23702.880039] [<ffff000008209eb8>] __page_cache_alloc+0x9c/0xd8
[23702.883499] [<ffff00000820dc40>] filemap_fault+0x340/0x550
[23702.887580] [<ffff000001405608>] ext4_filemap_fault+0x38/0x54 [ext4]
[23702.891420] [<ffff00000824b364>] __do_fault+0x30/0xf4
[23702.894459] [<ffff000008250130>] do_fault+0x3ec/0x4b8
[23702.897517] [<ffff00000825178c>] __handle_mm_fault+0x3f4/0x560
[23702.900998] [<ffff0000082519d8>] handle_mm_fault+0xe0/0x178
[23702.904324] [<ffff000008872dc4>] do_page_fault+0x1c4/0x3cc
[23702.907608] [<ffff00000887301c>] do_translation_fault+0x50/0x5c
[23702.911152] [<ffff0000080813e8>] do_mem_abort+0x64/0xe4
[23702.914390] [<ffff000008081568>] do_el0_ia_bp_hardening+0x94/0xb4
[23702.918206] Exception stack(0xffff00000be2fec0 to 0xffff00000be30000)
[23702.922205] fec0: 0000000000000000 0000000000000000 0000000000000000 0000ffff9768e6a0
[23702.927072] fee0: 0000000000000002 0000000000000000 00000000ffffffbb 0000000000000000
[23702.931975] ff00: 0000000000000049 003b9aca00000000 0000000000005c93 0000000028da3176
[23702.936883] ff20: 0000000000000018 000000005d9d12e6 001d34ce80000000 0000a26c46000000
[23702.941748] ff40: 0000ffff987ffae0 0000ffff98974ef0 0000000000000012 0000ffff987ff000
[23702.946622] ff60: 00000000000dbba0 0000ffff987ff000 0000ffff900be4d0 0000ffff98830000
[23702.951483] ff80: 000000000000b712 0000ffff900acef0 0000ffff9768e8a0 0000ffff98830000
[23702.956372] ffa0: 0000000000000000 0000ffff9768e700 0000ffff987ca4e0 0000ffff9768e700
[23702.961255] ffc0: 0000ffff987ca4e0 0000000080000000 0000ffff9768e720 00000000ffffffff
[23702.966136] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[23702.971019] [<ffff0000080832a4>] el0_ia+0x1c/0x20
[23702.974052] Mem-Info:
[23702.975375] active_anon:0 inactive_anon:0 isolated_anon:0 active_file:3693 inactive_file:15860 isolated_file:64 unevictable:0 dirty:256 writeback:3416 unstable:0 slab_reclaimable:351 slab_unreclaimable:1716 mapped:4 shmem:0 pagetables:145 bounce:0 free:1170 free_pcp:4 free_cma:0
[23702.994215] Node 0 active_anon:0kB inactive_anon:0kB active_file:236352kB inactive_file:1014336kB unevictable:0kB isolated(anon):0kB isolated(file):4096kB mapped:256kB dirty:16384kB writeback:218624kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[23703.010485] Node 0 DMA free:74880kB min:75328kB low:94144kB high:112960kB active_anon:0kB inactive_anon:0kB active_file:236352kB inactive_file:1012992kB unevictable:0kB writepending:235008kB present:2097152kB managed:1537088kB mlocked:0kB kernel_stack:10624kB pagetables:9280kB bounce:0kB free_pcp:256kB local_pcp:128kB free_cma:0kB
[23703.028015] lowmem_reserve[]: 0 0 0
[23703.030000] Node 0 DMA: 92*64kB (U) 5*128kB (U) 1*256kB (U) 3*512kB (U) 1*1024kB (U) 0*2048kB 0*4096kB 2*8192kB (U) 1*16384kB (U) 1*32768kB (U) 0*65536kB 0*131072kB 0*262144kB 0*524288kB = 74880kB
[23703.040433] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[23703.045443] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
[23703.050373] 3700 total pagecache pages
[23703.052649] 0 pages in swap cache
[23703.054534] Swap cache stats: add 4640, delete 4640, find 319/523
[23703.057997] Free swap = 1826560kB
[23703.059928] Total swap = 2098112kB
[23703.061952] 32768 pages RAM
[23703.063537] 0 pages HighMem/MovableOnly
[23703.065741] 8751 pages reserved
[23703.067542] 0 pages hwpoisoned
[23703.069277] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[23703.074369] [ 417] 0 417 237 0 3 2 39 0 systemd-journal
[23703.079784] [ 439] 0 439 1282 0 4 2 43 0 lvmetad
[23703.085004] [ 453] 0 453 243 2 4 2 41 -1000 systemd-udevd
[23703.090284] [ 541] 0 541 267 0 4 2 48 -1000 auditd
[23703.095514] [ 588] 81 588 160 2 3 2 59 -900 dbus-daemon
[23703.100750] [ 589] 32 589 185 0 4 2 75 0 rpcbind
[23703.105767] [ 592] 0 592 2458 0 4 2 55 0 gssproxy
[23703.110859] [ 600] 0 600 6767 0 5 2 172 0 NetworkManager
[23703.116188] [ 601] 999 601 8492 0 6 3 145 0 polkitd
[23703.121264] [ 602] 0 602 88 0 3 2 31 0 systemd-logind
[23703.126603] [ 603] 0 603 92 2 3 2 24 0 irqbalance
[23703.131843] [ 610] 38 610 160 2 4 2 47 0 ntpd
[23703.136684] [ 683] 0 683 359 2 3 2 113 0 dhclient
[23703.141833] [ 917] 0 917 323 2 4 2 104 -1000 sshd
[23703.146688] [ 921] 0 921 6888 1 5 2 318 0 tuned
[23703.151673] [ 923] 0 923 91 2 3 2 29 0 xinetd
[23703.156625] [ 924] 0 924 3762 1 4 2 99 0 rsyslogd
[23703.161769] [ 930] 997 930 3200 0 3 2 51 0 munged
[23703.166723] [ 938] 29 938 130 2 3 2 51 0 rpc.statd
[23703.171907] [ 980] 0 980 9711 0 5 2 156 0 automount
[23703.177064] [ 986] 0 986 1756 0 4 2 39 0 crond
[23703.182076] [ 988] 0 988 78 0 4 2 29 0 atd
[23703.186875] [ 1003] 0 1003 1718 2 3 2 10 0 agetty
[23703.191913] [ 1005] 0 1005 1718 2 3 2 10 0 agetty
[23703.196881] [ 1521] 0 1521 344 0 4 2 85 0 master
[23703.201935] [ 1565] 89 1565 347 2 4 2 81 0 qmgr
[23703.206802] [ 9394] 0 9394 392 0 4 2 149 0 sshd
[23703.211760] [ 9396] 0 9396 1739 0 3 2 14 0 run_test.sh
[23703.216936] [ 9702] 0 9702 1788 2 3 2 63 0 bash
[23703.221846] [22149] 89 22149 346 0 4 2 81 0 pickup
[23703.226825] [26394] 0 26394 1788 1 3 2 63 0 bash
[23703.231749] [26395] 0 26395 1715 1 4 2 8 0 tee
[23703.236525] [26592] 0 26592 1785 2 3 2 61 0 bash
[23703.241464] [31996] 0 31996 1788 1 3 2 63 0 bash
[23703.246327] [31997] 0 31997 1715 1 4 2 9 0 tee
[23703.251181] [32459] 500 32459 685 0 4 2 284 0 iozone
[23703.256109] [32460] 0 32460 1715 1 4 2 8 0 tee
[23703.260934] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[23703.266467] CPU: 0 PID: 937 Comm: in:imjournal Kdump: loaded Tainted: G OE ------------ 4.14.0-115.2.2.el7a.aarch64 #1
[23703.273347] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[23703.277474] Call trace:
[23703.278946] [<ffff000008089e14>] dump_backtrace+0x0/0x23c
[23703.282220] [<ffff00000808a074>] show_stack+0x24/0x2c
[23703.285224] [<ffff000008855c28>] dump_stack+0x84/0xa8
[23703.288434] [<ffff0000080d4e5c>] panic+0x138/0x2a0
[23703.291316] [<ffff000008211e70>] out_of_memory+0x454/0x484
[23703.294619] [<ffff0000082179c4>] __alloc_pages_nodemask+0xa78/0xec0
[23703.298371] [<ffff00000827a89c>] alloc_pages_current+0x8c/0xd8
[23703.301874] [<ffff000008209eb8>] __page_cache_alloc+0x9c/0xd8
[23703.305324] [<ffff00000820dc40>] filemap_fault+0x340/0x550
[23703.308897] [<ffff000001405608>] ext4_filemap_fault+0x38/0x54 [ext4]
[23703.312710] [<ffff00000824b364>] __do_fault+0x30/0xf4
[23703.315715] [<ffff000008250130>] do_fault+0x3ec/0x4b8
[23703.318783] [<ffff00000825178c>] __handle_mm_fault+0x3f4/0x560
[23703.322271] [<ffff0000082519d8>] handle_mm_fault+0xe0/0x178
[23703.325625] [<ffff000008872dc4>] do_page_fault+0x1c4/0x3cc
[23703.328906] [<ffff00000887301c>] do_translation_fault+0x50/0x5c
[23703.332421] [<ffff0000080813e8>] do_mem_abort+0x64/0xe4
[23703.335530] [<ffff000008081568>] do_el0_ia_bp_hardening+0x94/0xb4
[23703.339191] Exception stack(0xffff00000be2fec0 to 0xffff00000be30000)
[23703.343081] fec0: 0000000000000000 0000000000000000 0000000000000000 0000ffff9768e6a0
[23703.347762] fee0: 0000000000000002 0000000000000000 00000000ffffffbb 0000000000000000
[23703.352444] ff00: 0000000000000049 003b9aca00000000 0000000000005c93 0000000028da3176
[23703.357175] ff20: 0000000000000018 000000005d9d12e6 001d34ce80000000 0000a26c46000000
[23703.361853] ff40: 0000ffff987ffae0 0000ffff98974ef0 0000000000000012 0000ffff987ff000
[23703.366551] ff60: 00000000000dbba0 0000ffff987ff000 0000ffff900be4d0 0000ffff98830000
[23703.371220] ff80: 000000000000b712 0000ffff900acef0 0000ffff9768e8a0 0000ffff98830000
[23703.375942] ffa0: 0000000000000000 0000ffff9768e700 0000ffff987ca4e0 0000ffff9768e700
[23703.380639] ffc0: 0000ffff987ca4e0 0000000080000000 0000ffff9768e720 00000000ffffffff
[23703.385333] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[23703.390039] [<ffff0000080832a4>] el0_ia+0x1c/0x20
[23703.392919] SMP: stopping secondary CPUs
[23703.398529] Starting crashdump kernel...
[23703.400804] Bye!
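As a rough aid for reading the zone summary in the log above, the following sketch pulls the kB counters out of the "Node 0 DMA" line and compares free memory against the min watermark. The helper name parse_zone_kb and the variable names are illustrative only and are not part of Lustre or the test framework; the input string is copied from the log.

```python
import re

# "Node 0 DMA" zone summary copied from the OOM report in this ticket.
ZONE_LINE = (
    "Node 0 DMA free:74880kB min:75328kB low:94144kB high:112960kB "
    "active_anon:0kB inactive_anon:0kB active_file:236352kB "
    "inactive_file:1012992kB unevictable:0kB writepending:235008kB "
    "present:2097152kB managed:1537088kB mlocked:0kB kernel_stack:10624kB "
    "pagetables:9280kB bounce:0kB free_pcp:256kB local_pcp:128kB free_cma:0kB"
)


def parse_zone_kb(line):
    """Return a dict of every 'name:<N>kB' counter found in a zone line.

    Hypothetical helper for illustration; it only understands the
    'name:NNNkB' pattern and ignores everything else on the line.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)kB", line)}


zone = parse_zone_kb(ZONE_LINE)

# Free memory compared with the zone's min watermark.
below_min = zone["free"] < zone["min"]

# File-backed page cache (active + inactive) and its share of managed memory.
file_cache_kb = zone["active_file"] + zone["inactive_file"]
file_cache_share = file_cache_kb / zone["managed"]

print(f"free={zone['free']}kB min={zone['min']}kB below_min={below_min}")
print(f"file cache={file_cache_kb}kB of managed={zone['managed']}kB "
      f"({file_cache_share:.0%})")
print(f"writepending={zone['writepending']}kB")
```

On this log the free counter (74880kB) is below the min watermark (75328kB), and file-backed pages account for 1249344kB of the 1537088kB managed by the zone. This only restates numbers already in the report and does not by itself identify the cause of the OOM.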
Logs for other crashes are at:
https://testing.whamcloud.com/test_sets/a2546caa-d315-11e9-9fc9-52540065bddc
https://testing.whamcloud.com/test_sets/6bc5196c-bb4d-11e9-a25b-52540065bddc
https://testing.whamcloud.com/test_sets/d4d9a03c-c046-11e9-97d5-52540065bddc
https://testing.whamcloud.com/test_sets/eecba4da-e577-11e9-a197-52540065bddc
Issue Links
- is related to
  - LU-12241 recovery-random-scale test fail_client_mds fails with ‘ptlrpcd_00_00: page allocation stalls’ (Open)
  - LU-12830 RHEL8.3 and ZFS: oom on OSS (Resolved)
- is related to
  - LU-11189 OSC flow control (Open)