Details
- Type: Bug
- Resolution: Unresolved
- Priority: Medium
Description
This issue was created by maloo for Andreas Dilger <adilger@dilger.ca>
This issue relates to the following test suite run:
https://testing.whamcloud.com/test_sets/8a6b8fff-665e-4b0e-9123-efe7458b40d4
test_428 failed with the following error, which started appearing on 2026-01-08, almost exclusively on the aarch64+64k config:
onyx-154vm5 crashed during sanity test_428
Test session details:
clients: https://build.whamcloud.com/job/lustre-reviews/120304 - 5.14.0-503.40.1.el9_5.aarch64+64k
servers: https://build.whamcloud.com/job/lustre-reviews/120304 - 4.18.0-553.89.1.el8_lustre.x86_64
Memory info at the time of the OOM shows that memory is mostly consumed by file page cache (~800 MB active, ~1800 MB inactive) out of the 2650 MB of VM managed memory:
[12293.826210] Mem-Info:
[12293.826350] active_anon:2961 inactive_anon:4173 isolated_anon:0 active_file:28684 inactive_file:12612 isolated_file:0 unevictable:0 dirty:17 writeback:23 slab_reclaimable:450 slab_unreclaimable:1560 mapped:13 shmem:19 pagetables:276 sec_pagetables:0 bounce:0 kernel_misc_reclaimable:0 free:2769 free_pcp:2 free_cma:0
[12293.828539] Node 0 active_anon:273664kB inactive_anon:188608kB active_file:799232kB inactive_file:1836416kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:832kB dirty:1088kB writeback:1472kB shmem:1216kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:11264kB pagetables:17664kB sec_pagetables:0kB all_unreclaimable? no
[12293.830368] Node 0 DMA free:131712kB boost:0kB min:131904kB low:164864kB high:197824kB reserved_highatomic:0KB active_anon:172480kB inactive_anon:57344kB active_file:1139200kB inactive_file:1097728kB unevictable:0kB writepending:0kB present:3145728kB managed:2650112kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[12293.832090] lowmem_reserve[]: 0 0 55 55 55
[12293.832331] Node 0 Normal free:45504kB boost:0kB min:45568kB low:56960kB high:68352kB reserved_highatomic:0KB active_anon:187584kB inactive_anon:44864kB active_file:204032kB inactive_file:189824kB unevictable:0kB writepending:960kB present:1048576kB managed:951808kB mlocked:0kB bounce:0kB free_pcp:128kB local_pcp:0kB free_cma:0kB
[12293.834010] lowmem_reserve[]: 0 0 0 0 0
[12293.834235] Node 0 DMA: 8*64kB (M) 24*128kB (UM) 135*256kB (M) 86*512kB (M) 23*1024kB (M) 12*2048kB (M) 2*4096kB (M) 0*8192kB 0*16384kB 0*32768kB 0*65536kB 0*131072kB 0*262144kB 0*524288kB = 138496kB
[12293.835252] Node 0 Normal: 355*64kB (UM) 162*128kB (UM) 35*256kB (UM) 3*512kB (U) 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB 0*32768kB 0*65536kB 0*131072kB 0*262144kB 0*524288kB = 53952kB
[12293.836216] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
[12293.836719] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
[12293.837210] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[12293.837692] 7265 total pagecache pages
[12293.837910] 7229 pages in swap cache
[12293.838117] Free swap = 1879808kB
[12293.838314] Total swap = 2621376kB
[12293.838511] 65536 pages RAM
[12293.838676] 0 pages HighMem/MovableOnly
[12293.838902] 9256 pages reserved
[12293.839082] 0 pages cma reserved
[12293.839267] 0 pages hwpoisoned
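For reference, the file-cache figures quoted above can be cross-checked directly from the Node 0 kB counters in the dump (a quick sketch, using the values copied from this log; the 64 KiB page size on this aarch64+64k kernel is why the raw page counts look small relative to the kB totals):

```python
# Cross-check the "mostly file page cache" claim using the Node 0 figures
# copied verbatim from the Mem-Info dump above (all values in kB).
active_file_kib = 799232     # Node 0 active_file
inactive_file_kib = 1836416  # Node 0 inactive_file
managed_kib = 2650112        # Node 0 DMA zone "managed"

file_cache_mib = (active_file_kib + inactive_file_kib) / 1024
managed_mib = managed_kib / 1024

# File cache alone accounts for nearly all of the managed memory in the zone.
print(f"file cache: {file_cache_mib:.0f} MiB of {managed_mib:.0f} MiB managed")
```

This comes out to roughly 2574 MiB of file cache against 2588 MiB managed, consistent with the ~800 MB active plus ~1800 MB inactive figures cited in the description.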
The following OOM stack trace was dumped:
[12292.565618] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity test 428: large block size IO should not hang == 09:50:14 \(1768211414\)
[12292.726677] Lustre: DEBUG MARKER: == sanity test 428: large block size IO should not hang == 09:50:14 (1768211414)
[12293.743718] obd_memory max: 109256080, obd_memory current: 8395708
[12293.816509] obd_memory max: 109256080, obd_memory current: 8395708
[12293.816930] NetworkManager invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
[12293.817572] CPU: 1 PID: 524 Comm: NetworkManager Kdump: loaded Tainted: G OE ------- --- 5.14.0-503.40.1.el9_5.aarch64+64k #1
[12293.818297] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[12293.818693] Call trace:
[12293.818837]  dump_backtrace+0xa8/0x120
[12293.819060]  show_stack+0x1c/0x30
[12293.819258]  dump_stack_lvl+0x74/0x8c
[12293.819475]  dump_stack+0x14/0x24
[12293.819674]  dump_header+0x4c/0x210
[12293.819877]  out_of_memory+0x254/0x350
[12293.820099]  __alloc_pages_may_oom+0x118/0x1bc
[12293.820359]  __alloc_pages_slowpath.constprop.0+0x50c/0x94c
[12293.820684]  __alloc_pages+0x234/0x290
[12293.820910]  __folio_alloc+0x20/0x54
[12293.821119]  vma_alloc_folio+0xbc/0x350
[12293.821353]  __read_swap_cache_async+0x158/0x2c0
[12293.821638]  swap_cluster_readahead+0x148/0x39c
[12293.821914]  swapin_readahead+0x48/0xc0
[12293.822151]  do_swap_page+0x3a4/0xa7c
[12293.822383]  handle_pte_fault+0x118/0x150
[12293.822641]  __handle_mm_fault+0x110/0x320
[12293.822894]  handle_mm_fault+0xe0/0x320
[12293.823131]  do_page_fault+0x1ec/0x4a0
[12293.823370]  do_translation_fault+0x38/0x60
[12293.823632]  do_mem_abort+0x48/0x94
[12293.823852]  el1_abort+0x38/0x90
[12293.824054]  el1h_64_sync_handler+0xc0/0xd0
[12293.824311]  el1h_64_sync+0x78/0x7c
[12293.824530]  do_sys_poll+0x234/0x2e0
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_428 - onyx-154vm5 crashed during sanity test_428