[LU-11410] recovery-mds-scale test failover_mds crashes with ‘ntpd invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE)’ Created: 20/Sep/18 Updated: 26/Jan/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0, Lustre 2.10.6, Lustre 2.12.1, Lustre 2.12.3, Lustre 2.12.4, Lustre 2.12.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sles12, suse |
| Environment: | SLES12 SP3 clients |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A client crashes in recovery-mds-scale test_failover_mds. Looking at https://testing.whamcloud.com/test_sets/3e369a98-b8bf-11e8-a7de-52540065bddc, in the kernel crash log, we see:
[ 766.998879] Lustre: DEBUG MARKER: mds1 has failed over 1 times, and counting...
[ 782.297602] Lustre: Evicted from MGS (at MGC10.9.6.25@tcp_1) after server handle changed from 0x66e2519c6be9cc2 to 0x89680f107ea4b814
[ 782.299262] Lustre: MGC10.9.6.25@tcp: Connection restored to MGC10.9.6.25@tcp_1 (at 10.9.6.26@tcp)
[ 782.362888] LustreError: 13367:0:(client.c:3000:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88006690c940 x1611617063142688/t4294967305(4294967305) o101->lustre-MDT0000-mdc-ffff88007bb5e800@10.9.6.26@tcp:12/10 lens 952/560 e 0 to 0 dl 1536957913 ref 2 fl Interpret:RP/4/0 rc 301/301
[ 845.630602] ntpd invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[ 845.630613] ntpd cpuset=/ mems_allowed=0
[ 845.630628] CPU: 1 PID: 1461 Comm: ntpd Tainted: G OE N 4.4.143-94.47-default #1
[ 845.630629] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 845.630634] 0000000000000000 ffffffff8132ad80 ffff88007be939a0 0000000000000000
[ 845.630637] ffffffff8120935e 0000000000000000 0000000000000000 0000000000000000
[ 845.630639] 0000000000000000 ffffffff810a0927 ffffffff81e9aa20 0000000000000000
[ 845.630639] Call Trace:
[ 845.630702] [<ffffffff81019ac9>] dump_trace+0x59/0x340
[ 845.630711] [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
[ 845.630714] [<ffffffff8101ac71>] show_stack+0x21/0x40
[ 845.630727] [<ffffffff8132ad80>] dump_stack+0x5c/0x7c
[ 845.630748] [<ffffffff8120935e>] dump_header+0x82/0x215
[ 845.630762] [<ffffffff81198079>] check_panic_on_oom+0x29/0x50
[ 845.630770] [<ffffffff8119821a>] out_of_memory+0x17a/0x4a0
[ 845.630777] [<ffffffff8119cc48>] __alloc_pages_nodemask+0xaf8/0xb70
[ 845.630786] [<ffffffff811e6cc4>] alloc_pages_vma+0xa4/0x220
[ 845.630799] [<ffffffff811d70f0>] __read_swap_cache_async+0xf0/0x150
[ 845.630805] [<ffffffff811d7164>] read_swap_cache_async+0x14/0x30
[ 845.630808] [<ffffffff811d727d>] swapin_readahead+0xfd/0x190
[ 845.630814] [<ffffffff811c3771>] handle_pte_fault+0x12b1/0x1670
[ 845.630820] [<ffffffff811c56aa>] handle_mm_fault+0x2fa/0x640
[ 845.630828] [<ffffffff81067d7a>] __do_page_fault+0x23a/0x4b0
[ 845.630838] [<ffffffff8106809c>] trace_do_page_fault+0x3c/0x120
[ 845.630850] [<ffffffff8161da62>] async_page_fault+0x32/0x60
[ 845.633602] DWARF2 unwinder stuck at async_page_fault+0x32/0x60
[ 845.633602]
[ 845.633603] Leftover inexact backtrace:
[ 845.633621] [<ffffffff81338d61>] ? __clear_user+0x21/0x50
[ 845.633624] [<ffffffff810230f2>] ? copy_fpstate_to_sigframe+0x112/0x1a0
[ 845.633625] [<ffffffff810176d1>] ? do_signal+0x511/0x5b0
[ 845.633627] [<ffffffff81067d9a>] ? __do_page_fault+0x25a/0x4b0
[ 845.633634] [<ffffffff8107bf4e>] ? exit_to_usermode_loop+0x70/0xc2
[ 845.633638] [<ffffffff81003ae5>] ? syscall_return_slowpath+0x85/0xa0
[ 845.633644] [<ffffffff8161aa3a>] ? int_ret_from_sys_call+0x25/0xa3
[ 845.633661] Mem-Info:
[ 845.633668] active_anon:25 inactive_anon:41 isolated_anon:0
active_file:69559 inactive_file:371186 isolated_file:0
unevictable:20 dirty:67 writeback:850 unstable:0
slab_reclaimable:2733 slab_unreclaimable:8762
mapped:7090 shmem:26 pagetables:966 bounce:0
free:13103 free_pcp:0 free_cma:0
[ 845.633676] Node 0 DMA free:7736kB min:376kB low:468kB high:560kB active_anon:100kB inactive_anon:104kB active_file:744kB inactive_file:6464kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15904kB mlocked:0kB dirty:0kB writeback:96kB mapped:256kB shmem:104kB slab_reclaimable:20kB slab_unreclaimable:284kB kernel_stack:32kB pagetables:12kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:46916 all_unreclaimable? yes
[ 845.633678] lowmem_reserve[]: 0 1843 1843 1843 1843
[ 845.633683] Node 0 DMA32 free:44676kB min:44676kB low:55844kB high:67012kB active_anon:0kB inactive_anon:60kB active_file:277492kB inactive_file:1478280kB unevictable:80kB isolated(anon):0kB isolated(file):0kB present:2080744kB managed:1900772kB mlocked:80kB dirty:268kB writeback:3304kB mapped:28104kB shmem:0kB slab_reclaimable:10912kB slab_unreclaimable:34764kB kernel_stack:2608kB pagetables:3852kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:11784060 all_unreclaimable? yes
[ 845.633686] lowmem_reserve[]: 0 0 0 0 0
[ 845.633694] Node 0 DMA: 8*4kB (UME) 5*8kB (ME) 3*16kB (UE) 2*32kB (UE) 2*64kB (U) 2*128kB (UE) 2*256kB (ME) 3*512kB (UME) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 7736kB
[ 845.633700] Node 0 DMA32: 916*4kB (UME) 559*8kB (UME) 471*16kB (UME) 268*32kB (UME) 162*64kB (UE) 65*128kB (UM) 7*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44728kB
[ 845.633713] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 845.633720] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 845.633721] 13843 total pagecache pages
[ 845.633722] 0 pages in swap cache
[ 845.633722] Swap cache stats: add 10178, delete 10178, find 72/95
[ 845.633723] Free swap = 14297524kB
[ 845.633725] Total swap = 14338044kB
[ 845.633726] 524184 pages RAM
[ 845.633726] 0 pages HighMem/MovableOnly
[ 845.633726] 45015 pages reserved
[ 845.633727] 0 pages hwpoisoned
[ 845.633727] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 845.633824] [ 361] 0 361 10933 686 24 3 1623 0 systemd-journal
[ 845.633827] [ 400] 495 400 13124 940 29 3 116 0 rpcbind
[ 845.633836] [ 404] 0 404 9268 710 21 3 218 -1000 systemd-udevd
[ 845.633839] [ 479] 0 479 4814 612 14 3 58 0 irqbalance
[ 845.633847] [ 484] 499 484 13452 876 26 3 146 -900 dbus-daemon
[ 845.633849] [ 528] 0 528 25126 1025 50 3 207 0 sssd
[ 845.633851] [ 533] 0 533 32270 1881 64 3 288 0 sssd_be
[ 845.633857] [ 538] 0 538 7447 1060 19 3 260 0 wickedd-dhcp6
[ 845.633860] [ 548] 0 548 25845 1531 54 4 180 0 sssd_nss
[ 845.633862] [ 549] 0 549 20713 1198 45 3 175 0 sssd_pam
[ 845.633863] [ 550] 0 550 19112 1161 43 3 166 0 sssd_ssh
[ 845.633870] [ 552] 0 552 7448 1028 18 3 256 0 wickedd-auto4
[ 845.633876] [ 553] 0 553 7448 1085 20 3 265 0 wickedd-dhcp4
[ 845.633880] [ 556] 0 556 84318 905 38 3 269 0 rsyslogd
[ 845.633927] [ 761] 0 761 7480 1056 20 3 287 0 wickedd
[ 845.633934] [ 764] 0 764 7455 1032 18 3 276 0 wickedd-nanny
[ 845.633939] [ 1418] 0 1418 2141 422 10 3 40 0 xinetd
[ 845.633944] [ 1461] 74 1461 8408 974 17 3 164 0 ntpd
[ 845.633953] [ 1473] 74 1473 9461 590 18 3 153 0 ntpd
[ 845.633957] [ 1477] 0 1477 16586 1539 35 3 180 -1000 sshd
[ 845.633964] [ 1492] 493 1492 55352 616 20 3 231 0 munged
[ 845.633972] [ 1517] 0 1517 1664 438 8 3 30 0 agetty
[ 845.633977] [ 1518] 0 1518 1664 419 9 3 29 0 agetty
[ 845.633981] [ 1534] 0 1534 147220 1574 59 3 347 0 automount
[ 845.633986] [ 1570] 0 1570 5513 629 16 3 64 0 systemd-logind
[ 845.633988] [ 1809] 0 1809 8861 822 20 3 109 0 master
[ 845.633990] [ 1820] 51 1820 12439 1046 25 3 108 0 pickup
[ 845.633992] [ 1823] 51 1823 12536 1354 26 3 174 0 qmgr
[ 845.633994] [ 1864] 0 1864 5197 532 18 3 150 0 cron
[ 845.634043] [15714] 0 15714 17465 669 35 3 174 0 in.mrshd
[ 845.634047] [15715] 0 15715 2894 572 10 3 77 0 bash
[ 845.634051] [15720] 0 15720 2894 427 10 3 78 0 bash
[ 845.634053] [15721] 0 15721 3034 585 12 3 219 0 run_dd.sh
[ 845.634057] [16387] 51 16387 12675 1323 25 3 335 0 trivial-rewrite
[ 845.634059] [16388] 51 16388 16918 1732 35 3 244 0 smtp
[ 845.634064] [16434] 0 16434 1062 182 8 3 26 0 dd
[ 845.634068] [16437] 51 16437 12447 1022 24 3 109 0 bounce
[ 845.634074] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[ 845.634075] CPU: 1 PID: 1461 Comm: ntpd Tainted: G OE N 4.4.143-94.47-default #1
[ 845.634076] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 845.634078] 0000000000000000 ffffffff8132ad80 ffffffff81a28298 ffff88007be938c8
[ 845.634079] ffffffff81191f31 0000000000000010 ffff88007be938d8 ffff88007be93878
[ 845.634081] 000000000000309f ffffffff81a2c56b 0000000000000000 0000000000000000
[ 845.634081] Call Trace:
[ 845.634087] [<ffffffff81019ac9>] dump_trace+0x59/0x340
[ 845.634090] [<ffffffff81019e9a>] show_stack_log_lvl+0xea/0x170
[ 845.634092] [<ffffffff8101ac71>] show_stack+0x21/0x40
[ 845.634095] [<ffffffff8132ad80>] dump_stack+0x5c/0x7c
[ 845.634101] [<ffffffff81191f31>] panic+0xd2/0x232
[ 845.634104] [<ffffffff811980a0>] check_panic_on_oom+0x50/0x50
[ 845.634106] [<ffffffff8119821a>] out_of_memory+0x17a/0x4a0
[ 845.634112] [<ffffffff8119cc48>] __alloc_pages_nodemask+0xaf8/0xb70
[ 845.634116] [<ffffffff811e6cc4>] alloc_pages_vma+0xa4/0x220
[ 845.634119] [<ffffffff811d70f0>] __read_swap_cache_async+0xf0/0x150
[ 845.634123] [<ffffffff811d7164>] read_swap_cache_async+0x14/0x30
[ 845.634125] [<ffffffff811d727d>] swapin_readahead+0xfd/0x190
[ 845.634128] [<ffffffff811c3771>] handle_pte_fault+0x12b1/0x1670
[ 845.634132] [<ffffffff811c56aa>] handle_mm_fault+0x2fa/0x640
[ 845.634135] [<ffffffff81067d7a>] __do_page_fault+0x23a/0x4b0
[ 845.634139] [<ffffffff8106809c>] trace_do_page_fault+0x3c/0x120
[ 845.634141] [<ffffffff8161da62>] async_page_fault+0x32/0x60
[ 845.636233] DWARF2 unwinder stuck at async_page_fault+0x32/0x60
[ 845.636233]
[ 845.636234] Leftover inexact backtrace:
[ 845.636236] [<ffffffff81338d61>] ? __clear_user+0x21/0x50
[ 845.636238] [<ffffffff810230f2>] ? copy_fpstate_to_sigframe+0x112/0x1a0
[ 845.636239] [<ffffffff810176d1>] ? do_signal+0x511/0x5b0
[ 845.636241] [<ffffffff81067d9a>] ? __do_page_fault+0x25a/0x4b0
[ 845.636243] [<ffffffff8107bf4e>] ? exit_to_usermode_loop+0x70/0xc2
[ 845.636246] [<ffffffff81003ae5>] ? syscall_return_slowpath+0x85/0xa0
[ 845.636248] [<ffffffff8161aa3a>] ? int_ret_from_sys_call+0x25/0xa3
In the client (vm3) console log, we see where the vmcore is located:
[ 782.297602] Lustre: Evicted from MGS (at MGC10.9.6.25@tcp_1) after server handle changed from 0x66e2519c6be9cc2 to 0x89680f107ea4b814
[ 782.299262] Lustre: MGC10.9.6.25@tcp: Connection restored to MGC10.9.6.25@tcp_1 (at 10.9.6.26@tcp)
[ 782.362888] LustreError: 13367:0:(client.c:3000:ptlrpc_replay_interpret()) @@@ status 301, old was 0 req@ffff88006690c940 x1611617063142688/t4294967305(4294967305) o101->lustre-MDT0000-mdc-ffff88007bb5e800@10.9.6.26@tcp:12/10 lens 952/560 e 0 to 0 dl 1536957913 ref 2 fl Interpret:RP/4/0 rc 301/301
[ 845.630602] ntpd invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0, oom_score_adj=0
[ 845.630613] ntpd cpuset=/ mems_allowed=0
[ 845.630628] CPU: 1 PID: 1461 Comm: ntpd Tainted: G OE N 4.4.143-94.47-default #1
[
[ 2.826867] RPC: Registered named UNIX socket transport module.
[ 2.826869] RPC: Registered udp transport module.
…
The dumpfile is saved to /mnt/trevis-2.trevis.whamcloud.com/export/scratch/dumps/trevis-45vm3.trevis.whamcloud.com/10.9.6.21-2018-09-14-13:48/vmcore.
makedumpfile Completed.
-------------------------------------------------------------------------------
All failures are seen on SLES12 SP3 server/client and SLES12 SP3 client/CentOS 7 server testing. We’ve seen this crash a few times in the past. |
| Comments |
| Comment by James Nunez (Inactive) [ 29/Apr/19 ] |
|
Another oom for recovery-mds-scale in test failover_mds at https://testing.whamcloud.com/test_sets/b81d5294-6692-11e9-8bb1-52540065bddc . From the console log of the client (vm3) running dd:
[ 2025.771743] Lustre: DEBUG MARKER: mds1 has failed over 2 times, and counting...
[ 2025.911466] Lustre: lustre-MDT0000-mdc-ffff8a12b785f800: Connection restored to 10.9.5.124@tcp (at 10.9.5.124@tcp)
[ 2119.724896] irqbalance invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
[ 2119.727062] irqbalance cpuset=/ mems_allowed=0
[ 2119.727923] CPU: 1 PID: 465 Comm: irqbalance Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4
[ 2119.729670] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 2119.730728] Call Trace:
[ 2119.731265] dump_stack+0x5a/0x75
[ 2119.731928] dump_header+0x9c/0x238
[ 2119.732628] ? notifier_call_chain+0x47/0x70
[ 2119.733455] ? __blocking_notifier_call_chain+0x51/0x60
[ 2119.734432] out_of_memory+0x44b/0x490
[ 2119.735165] __alloc_pages_slowpath+0x7e5/0xa0d
[ 2119.736017] __alloc_pages_nodemask+0x1e9/0x210
[ 2119.736881] alloc_pages_vma+0x92/0x200
[ 2119.737633] __read_swap_cache_async+0x140/0x210
[ 2119.738515] read_swap_cache_async+0x14/0x30
[ 2119.739335] swapin_readahead+0x107/0x1f0
[ 2119.740111] do_swap_page+0x2b8/0x8b0
[ 2119.740830] ? __switch_to_asm+0x34/0x70
[ 2119.741595] ? __switch_to_asm+0x40/0x70
[ 2119.742364] ? __switch_to+0x10c/0x4a0
[ 2119.743099] __handle_mm_fault+0x783/0xef0
[ 2119.743882] handle_mm_fault+0xc4/0x1d0
[ 2119.744639] __do_page_fault+0x1f3/0x4c0
[ 2119.745401] trace_do_page_fault+0x40/0x120
[ 2119.746204] ? async_page_fault+0x2f/0x50
[ 2119.746971] async_page_fault+0x45/0x50
[ 2119.747721] RIP: 0002:0x55859b2a4ba0
[ 2119.748428] RSP: 000a:000055859b2a4b8c EFLAGS: 7ffd17f0ec30
[ 2119.748449] Mem-Info:
[ 2119.750018] active_anon:0 inactive_anon:0 isolated_anon:0
[ 2119.750018] active_file:279857 inactive_file:156646 isolated_file:192
[ 2119.750018] unevictable:20 dirty:6162 writeback:0 unstable:0
[ 2119.750018] slab_reclaimable:3259 slab_unreclaimable:9056
[ 2119.750018] mapped:2383 shmem:0 pagetables:940 bounce:0
[ 2119.750018] free:13061 free_pcp:15 free_cma:0
[ 2119.755522] Node 0 active_anon:0kB inactive_anon:0kB active_file:1119428kB inactive_file:626584kB unevictable:80kB isolated(anon):0kB isolated(file):768kB mapped:9532kB dirty:24648kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 2119.760194] Node 0 DMA free:7640kB min:380kB low:472kB high:564kB active_anon:0kB inactive_anon:0kB active_file:8244kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:24kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 2119.765249] lowmem_reserve[]: 0 1822 1822 1822 1822
[ 2119.766189] Node 0 DMA32 free:44604kB min:44672kB low:55840kB high:67008kB active_anon:0kB inactive_anon:0kB active_file:1111184kB inactive_file:626500kB unevictable:80kB writepending:24648kB present:2080744kB managed:1885860kB mlocked:80kB slab_reclaimable:13036kB slab_unreclaimable:36200kB kernel_stack:2176kB pagetables:3760kB bounce:0kB free_pcp:60kB local_pcp:60kB free_cma:0kB
[ 2119.771819] lowmem_reserve[]: 0 0 0 0 0
[ 2119.772579] Node 0 DMA: 6*4kB (UM) 6*8kB (UM) 3*16kB (U) 5*32kB (U) 5*64kB (UM) 3*128kB (UM) 2*256kB (UM) 2*512kB (UM) 1*1024kB (M) 0*2048kB 1*4096kB (E) = 7640kB
[ 2119.775106] Node 0 DMA32: 1131*4kB (UME) 608*8kB (UME) 527*16kB (UME) 341*32kB (UME) 162*64kB (UM) 43*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44604kB
[ 2119.777668] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 2119.779217] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 2119.780702] 11034 total pagecache pages
[ 2119.781467] 0 pages in swap cache
[ 2119.782135] Swap cache stats: add 15894, delete 15894, find 3269/5072
[ 2119.783293] Free swap = 14295036kB
[ 2119.783980] Total swap = 14338044kB
[ 2119.784672] 524184 pages RAM
[ 2119.785271] 0 pages HighMem/MovableOnly
[ 2119.786018] 48742 pages reserved
[ 2119.786677] 0 pages hwpoisoned
[ 2119.787298] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[ 2119.788805] [ 359] 0 359 10934 518 24 3 1130 0 systemd-journal
[ 2119.790480] [ 373] 0 373 3008 1 12 3 1452 0 haveged
[ 2119.792100] [ 380] 0 380 10408 451 23 3 239 -1000 systemd-udevd
[ 2119.793748] [ 381] 495 381 13124 1 30 3 126 0 rpcbind
[ 2119.795354] [ 441] 499 441 10913 0 25 3 153 -900 dbus-daemon
[ 2119.796967] [ 461] 0 461 7469 5 19 3 269 0 wickedd-dhcp6
[ 2119.798612] [ 462] 0 462 28597 61 59 3 247 0 sssd
[ 2119.800144] [ 465] 0 465 4814 211 14 3 58 0 irqbalance
[ 2119.801754] [ 466] 0 466 7470 4 20 3 272 0 wickedd-dhcp4
[ 2119.803406] [ 467] 0 467 7469 2 20 3 269 0 wickedd-auto4
[ 2119.805054] [ 478] 0 478 7500 1 20 3 312 0 wickedd
[ 2119.806615] [ 512] 0 512 7476 0 20 3 277 0 wickedd-nanny
[ 2119.808263] [ 517] 0 517 84318 115 39 3 306 0 rsyslogd
[ 2119.809845] [ 521] 0 521 34903 526 68 3 331 0 sssd_be
[ 2119.811408] [ 531] 0 531 26453 553 56 3 234 0 sssd_nss
[ 2119.813008] [ 532] 0 532 27004 135 56 3 226 0 sssd_pam
[ 2119.814598] [ 533] 0 533 25887 136 54 4 209 0 sssd_ssh
[ 2119.816188] [ 1219] 0 1219 2141 265 10 3 41 0 xinetd
[ 2119.817729] [ 1247] 0 1247 16601 1 37 3 179 -1000 sshd
[ 2119.819244] [ 1250] 74 1250 5882 327 17 3 162 0 ntpd
[ 2119.820752] [ 1253] 74 1253 6935 1 18 3 153 0 ntpd
[ 2119.822279] [ 1276] 493 1276 55367 396 19 3 245 0 munged
[ 2119.823848] [ 1336] 0 1336 163871 340 62 4 374 0 automount
[ 2119.825490] [ 1370] 0 1370 1665 1 9 3 27 0 agetty
[ 2119.827047] [ 1372] 0 1372 1665 1 9 3 29 0 agetty
[ 2119.828599] [ 1401] 0 1401 5514 1 16 3 80 0 systemd-logind
[ 2119.830287] [ 1575] 0 1575 8863 58 21 3 127 0 master
[ 2119.831840] [ 1588] 51 1588 9900 189 24 3 109 0 pickup
[ 2119.833401] [ 1589] 51 1589 9997 212 23 3 169 0 qmgr
[ 2119.834921] [ 1613] 0 1613 5198 273 15 3 153 0 cron
[ 2119.836474] [19713] 0 19713 14926 1 35 3 175 0 in.mrshd
[ 2119.838057] [19714] 0 19714 2894 0 11 3 78 0 bash
[ 2119.839572] [19719] 0 19719 2894 0 11 3 79 0 bash
[ 2119.841100] [19720] 0 19720 3034 356 11 3 215 0 run_dd.sh
[ 2119.842724] [21185] 0 21185 1062 300 8 3 33 0 dd
[ 2119.844231] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[ 2119.844231] |
| Comment by Alena Nikitenko [ 03/Dec/21 ] |
|
Similar oom, but on CentOS 7.9 in recovery-random-scale test set on 2.12.8: https://testing.whamcloud.com/test_sets/22e46ed4-50a5-4a25-b830-c798ce17b9e6
[ 872.358394] Lustre: DEBUG MARKER: /usr/sbin/lctl mark onyx-124vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 874.874723] Lustre: DEBUG MARKER: onyx-124vm3.onyx.whamcloud.com: executing _wait_recovery_complete *.lustre-MDT0000.recovery_status 1475
[ 911.214431] ntpd invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[ 911.225802] ntpd cpuset=/ mems_allowed=0
[ 911.226442] CPU: 1 PID: 496 Comm: ntpd Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.45.1.el7.x86_64 #1
[ 911.228117] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 911.228997] Call Trace:
[ 911.229462] [<ffffffff81b83539>] dump_stack+0x19/0x1b
[ 911.230260] [<ffffffff81b7e5d8>] dump_header+0x90/0x229
[ 911.231088] [<ffffffff81b90b6f>] ? notifier_call_chain+0x4f/0x70
[ 911.232048] [<ffffffff814cc228>] ? __blocking_notifier_call_chain+0x58/0x70
[ 911.233143] [<ffffffff815c273e>] check_panic_on_oom+0x2e/0x60
[ 911.234044] [<ffffffff815c2ab4>] out_of_memory+0x194/0x500
[ 911.234908] [<ffffffff815c9854>] __alloc_pages_nodemask+0xad4/0xbe0
[ 911.235890] [<ffffffff8161cc49>] alloc_pages_vma+0xa9/0x200
[ 911.236772] [<ffffffff8160a1e5>] __read_swap_cache_async+0x115/0x190
[ 911.237758] [<ffffffff8160a286>] read_swap_cache_async+0x26/0x60
[ 911.238699] [<ffffffff8160a46b>] swapin_readahead+0x1ab/0x210
[ 911.239621] [<ffffffff8178dcd2>] ? radix_tree_lookup_slot+0x22/0x50
[ 911.240604] [<ffffffff815bd91e>] ? __find_get_page+0x1e/0xa0
[ 911.241495] [<ffffffff815f288f>] do_swap_page+0x23f/0x7c0
[ 911.242366] [<ffffffff816655dd>] ? core_sys_select+0x26d/0x340
[ 911.243283] [<ffffffff815f6627>] handle_mm_fault+0xaa7/0xfb0
[ 911.244178] [<ffffffff81629015>] ? kmem_cache_alloc+0x35/0x1f0
[ 911.245107] [<ffffffff81a39b99>] ? sk_prot_alloc+0x39/0x190
[ 911.245978] [<ffffffff81b90653>] __do_page_fault+0x213/0x500
[ 911.246866] [<ffffffff81b90a26>] trace_do_page_fault+0x56/0x150
[ 911.247791] [<ffffffff81b8ffa2>] do_async_page_fault+0x22/0xf0
[ 911.248698] [<ffffffff81b8c7a8>] async_page_fault+0x28/0x30
[ 911.249569] Mem-Info:
[ 911.249937] active_anon:1722 inactive_anon:1749 isolated_anon:0
[ 911.249937] active_file:54406 inactive_file:579983 isolated_file:64
[ 911.249937] unevictable:0 dirty:0 writeback:0 unstable:0
[ 911.249937] slab_reclaimable:3354 slab_unreclaimable:5635
[ 911.249937] mapped:5514 shmem:2168 pagetables:1086 bounce:0
[ 911.249937] free:13926 free_pcp:19 free_cma:0
[ 911.254845] Node 0 DMA free:10912kB min:260kB low:324kB high:388kB active_anon:36kB inactive_anon:84kB active_file:256kB inactive_file:3876kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:88kB shmem:84kB slab_reclaimable:56kB slab_unreclaimable:84kB kernel_stack:48kB pagetables:60kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:7171 all_unreclaimable? yes
[ 911.261249] lowmem_reserve[]: 0 2668 2668 2668
[ 911.262095] Node 0 DMA32 free:44792kB min:44792kB low:55988kB high:67188kB active_anon:6852kB inactive_anon:6912kB active_file:217368kB inactive_file:2316156kB unevictable:0kB isolated(anon):0kB isolated(file):128kB present:3129320kB managed:2735424kB mlocked:0kB dirty:0kB writeback:0kB mapped:21968kB shmem:8588kB slab_reclaimable:13360kB slab_unreclaimable:22456kB kernel_stack:2432kB pagetables:4284kB unstable:0kB bounce:0kB free_pcp:76kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:5594703 all_unreclaimable? yes
[ 911.269030] lowmem_reserve[]: 0 0 0 0
[ 911.269765] Node 0 DMA: 4*4kB (UE) 4*8kB (UE) 5*16kB (UEM) 5*32kB (UE) 4*64kB (UM) 1*128kB (E) 2*256kB (EM) 3*512kB (UEM) 2*1024kB (EM) 1*2048kB (E) 1*4096kB (M) = 10912kB
[ 911.272831] Node 0 DMA32: 243*4kB (UEM) 378*8kB (UEM) 523*16kB (UE) 291*32kB (UEM) 152*64kB (UEM) 86*128kB (UEM) 9*256kB (UM) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44716kB
[ 911.275753] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 911.277073] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 911.278353] 14131 total pagecache pages
[ 911.278950] 538 pages in swap cache
[ 911.279502] Swap cache stats: add 11707, delete 11169, find 1362/1879
[ 911.280485] Free swap = 2711036kB
[ 911.281018] Total swap = 2753532kB
[ 911.281555] 786328 pages RAM
[ 911.282006] 0 pages HighMem/MovableOnly
[ 911.282605] 98495 pages reserved
[ 911.283106] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 911.284298] [ 346] 0 346 9860 774 24 59 0 systemd-journal
[ 911.285626] [ 367] 0 367 29161 241 27 78 0 lvmetad
[ 911.286856] [ 370] 0 370 11413 370 23 121 -1000 systemd-udevd
[ 911.288154] [ 463] 0 463 13883 161 28 100 -1000 auditd
[ 911.289377] [ 489] 0 489 6596 398 19 42 0 systemd-logind
[ 911.290687] [ 492] 999 492 153058 1234 62 1843 0 polkitd
[ 911.291910] [ 493] 81 493 14560 529 32 94 -900 dbus-daemon
[ 911.293188] [ 495] 32 495 17314 208 38 140 0 rpcbind
[ 911.294413] [ 496] 38 496 11825 457 29 153 0 ntpd
[ 911.295599] [ 500] 0 500 118583 1821 87 867 0 NetworkManager
[ 911.296923] [ 501] 0 501 5385 272 16 41 0 irqbalance
[ 911.298199] [ 508] 0 508 48801 195 36 130 0 gssproxy
[ 911.299444] [ 839] 0 839 28246 840 58 259 -1000 sshd
[ 911.300640] [ 841] 0 841 143570 1595 99 2726 0 tuned
[ 911.301835] [ 847] 0 847 54100 742 42 673 0 rsyslogd
[ 911.303075] [ 848] 997 848 56473 403 22 128 0 munged
[ 911.304286] [ 859] 0 859 6792 196 19 63 0 xinetd
[ 911.305505] [ 861] 29 861 10610 222 26 209 0 rpc.statd
[ 911.306756] [ 912] 0 912 155891 1007 79 907 0 automount
[ 911.308001] [ 921] 0 921 31595 240 21 154 0 crond
[ 911.309199] [ 927] 0 927 6477 189 18 52 0 atd
[ 911.310380] [ 941] 0 941 27551 181 10 33 0 agetty
[ 911.311605] [ 942] 0 942 27551 184 11 32 0 agetty
[ 911.312827] [ 1247] 0 1247 22447 282 44 256 0 master
[ 911.314040] [ 1258] 89 1258 22473 747 44 251 0 pickup
[ 911.315249] [ 1259] 89 1259 22490 751 44 253 0 qmgr
[ 911.316452] [23060] 0 23060 21124 457 48 206 0 in.mrshd
[ 911.317691] [23065] 0 23065 28320 334 13 70 0 bash
[ 911.318885] [23129] 0 23129 28320 97 11 71 0 bash
[ 911.320078] [23130] 0 23130 28390 391 14 75 0 run_dd.sh
[ 911.321331] [24014] 0 24014 27024 155 12 0 0 dd
[ 911.322500] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[ 911.322500]
[ 911.323962] CPU: 1 PID: 496 Comm: ntpd Kdump: loaded Tainted: G OE ------------ 3.10.0-1160.45.1.el7.x86_64 #1
[ 911.325626] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 911.326494] Call Trace:
[ 911.326882] [<ffffffff81b83539>] dump_stack+0x19/0x1b
[ 911.327670] [<ffffffff81b7d241>] panic+0xe8/0x21f
[ 911.328407] [<ffffffff815c2765>] check_panic_on_oom+0x55/0x60
[ 911.329288] [<ffffffff815c2ab4>] out_of_memory+0x194/0x500
[ 911.330134] [<ffffffff815c9854>] __alloc_pages_nodemask+0xad4/0xbe0
[ 911.331094] [<ffffffff8161cc49>] alloc_pages_vma+0xa9/0x200
[ 911.331963] [<ffffffff8160a1e5>] __read_swap_cache_async+0x115/0x190
[ 911.332935] [<ffffffff8160a286>] read_swap_cache_async+0x26/0x60
[ 911.333860] [<ffffffff8160a46b>] swapin_readahead+0x1ab/0x210
[ 911.334750] [<ffffffff8178dcd2>] ? radix_tree_lookup_slot+0x22/0x50
[ 911.335712] [<ffffffff815bd91e>] ? __find_get_page+0x1e/0xa0
[ 911.336580] [<ffffffff815f288f>] do_swap_page+0x23f/0x7c0
[ 911.337416] [<ffffffff816655dd>] ? core_sys_select+0x26d/0x340
[ 911.338308] [<ffffffff815f6627>] handle_mm_fault+0xaa7/0xfb0
[ 911.339181] [<ffffffff81629015>] ? kmem_cache_alloc+0x35/0x1f0
[ 911.340088] [<ffffffff81a39b99>] ? sk_prot_alloc+0x39/0x190
[ 911.340948] [<ffffffff81b90653>] __do_page_fault+0x213/0x500
[ 911.341821] [<ffffffff81b90a26>] trace_do_page_fault+0x56/0x150
[ 911.342732] [<ffffffff81b8ffa2>] do_async_page_fault+0x22/0xf0
[ 911.343636] [<ffffffff81b8c7a8>] async_page_fault+0x28/0x30 |
| Comment by Sarah Liu [ 15/Jun/22 ] |
|
+2 |