[LU-16919] sanity-benchmark test_bonnie: OOM on client Created: 20/Jun/23  Updated: 05/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.3, Lustre 2.15.4
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Maloo Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by Maloo for Minh Diep <mdiep@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/33c950d4-1523-41e9-966d-048501b82b9b

test_bonnie failed with the following error:

onyx-118vm3 crashed during sanity-benchmark test_bonnie

Test session details:
clients: https://build.whamcloud.com/job/lustre-b2_15/65 - 4.18.0-425.3.1.el8.x86_64
servers: https://build.whamcloud.com/job/lustre-b2_15/65 - 4.18.0-425.3.1.el8_lustre.x86_64

[ 7632.940007] avahi-daemon invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 7632.942040] CPU: 0 PID: 762 Comm: avahi-daemon Kdump: loaded Tainted: P OE --------- - - 4.18.0-425.3.1.el8_lustre.x86_64 #1
[ 7632.944767] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 7632.945951] Call Trace:
[ 7632.946611] dump_stack+0x41/0x60
[ 7632.947365] dump_header+0x4a/0x1df
[ 7632.948138] out_of_memory.cold.36+0xa/0x85
[ 7632.949002] __alloc_pages_slowpath+0xc24/0xd10
[ 7632.949958] __alloc_pages_nodemask+0x2e2/0x320
[ 7632.950889] alloc_pages_vma+0x74/0x1d0
[ 7632.951715] __wp_page_copy+0x7c/0x540
[ 7632.952520] ? page_trans_huge_map_swapcount+0x19e/0x250
[ 7632.954092] do_wp_page+0xed/0x350
[ 7632.954818] __handle_mm_fault+0x453/0x6c0
[ 7632.955671] ? pipe_read+0x2a5/0x2d0
[ 7632.956620] handle_mm_fault+0xc1/0x1e0
[ 7632.957417] do_user_addr_fault+0x1b9/0x450
[ 7632.958311] do_page_fault+0x37/0x130
[ 7632.959080] ? page_fault+0x8/0x30
[ 7632.960019] page_fault+0x1e/0x30

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-benchmark test_bonnie - onyx-118vm3 crashed during sanity-benchmark test_bonnie



 Comments   
Comment by Sarah Liu [ 05/Dec/23 ]

The same test also hit an OOM, this time on the OSS: https://testing.whamcloud.com/test_sets/ca8a1a77-ebd8-496f-9281-47ebf25be0c7

[ 7250.833861] obd_memory max: 291154091, obd_memory current: 181623987
[ 7250.836215] obd_memory max: 291154091, obd_memory current: 181623987
[ 7250.838064] obd_memory max: 291154091, obd_memory current: 181623987
[ 7250.839644] obd_memory max: 291154091, obd_memory current: 181623987
[ 7250.840910] chronyd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 7250.842725] CPU: 0 PID: 607 Comm: chronyd Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-513.5.1.el8_lustre.x86_64 #1
[ 7250.845005] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 7250.846128] Call Trace:
[ 7250.846713]  dump_stack+0x41/0x60
[ 7250.847464]  dump_header+0x4a/0x1df
[ 7250.848229]  out_of_memory.cold.36+0xa/0x7e
[ 7250.849076]  __alloc_pages_slowpath+0xbf0/0xcd0
[ 7250.850004]  __alloc_pages_nodemask+0x2e2/0x330
[ 7250.850911]  pagecache_get_page+0xce/0x310
[ 7250.851757]  filemap_fault+0x6c8/0xa30
[ 7250.852528]  ? __mod_lruvec_page_state+0x5e/0x80
[ 7250.853472]  ? page_add_file_rmap+0x99/0x150
[ 7250.854331]  ? alloc_set_pte+0xb8/0x3e0
[ 7250.855118]  ? xas_load+0x8/0x80
[ 7250.855790]  ? xas_find+0x183/0x1c0
[ 7250.856508]  ? filemap_map_pages+0x271/0x410
[ 7250.857381]  ext4_filemap_fault+0x2c/0x40 [ext4]
[ 7250.858472]  __do_fault+0x38/0xc0
[ 7250.859178]  handle_pte_fault+0x55d/0x880
[ 7250.859995]  ? __raw_spin_unlock+0x5/0x10
[ 7250.860791]  __handle_mm_fault+0x552/0x6d0
[ 7250.861626]  handle_mm_fault+0xca/0x2a0
[ 7250.862414]  __do_page_fault+0x1f0/0x460
[ 7250.863240]  do_page_fault+0x37/0x130
[ 7250.863992]  ? page_fault+0x8/0x30
[ 7250.864698]  page_fault+0x1e/0x30
[ 7250.865398] RIP: 0033:0x55a3368fcb80
[ 7250.866141] Code: Unable to access opcode bytes at RIP 0x55a3368fcb56.
[ 7250.867406] RSP: 002b:00007ffd44daaf28 EFLAGS: 00010206
[ 7250.868422] RAX: 0000000000000000 RBX: 00007ffd44daafd0 RCX: 00007fa290774a3f
[ 7250.869781] RDX: 0000000000000000 RSI: 00007ffd44dab020 RDI: 0000000000000000
[ 7250.871164] RBP: 0000000000000000 R08: 00007ffd44daaf80 R09: 0000000000000000
[ 7250.872530] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 7250.873900] R13: 00007ffd44daafb0 R14: 0000000000000000 R15: 00007ffd44dab020
[ 7250.875282] Mem-Info:
[ 7250.875788] active_anon:16 inactive_anon:0 isolated_anon:1
 active_file:22 inactive_file:33 isolated_file:0
 unevictable:0 dirty:0 writeback:13
 slab_reclaimable:4789 slab_unreclaimable:68192
 mapped:259 shmem:0 pagetables:1558 bounce:0
 free:18245 free_pcp:12 free_cma:0
[ 7250.881443] Node 0 active_anon:64kB inactive_anon:0kB active_file:88kB inactive_file:132kB unevictable:0kB isolated(anon):4kB isolated(file):0kB mapped:1036kB dirty:0kB writeback:52kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:9376kB pagetables:6232kB all_unreclaimable? yes
[ 7250.886687] Node 0 DMA free:10636kB min:256kB low:320kB high:384kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 7250.890959] lowmem_reserve[]: 0 2597 2597 2597 2597
[ 7250.891940] Node 0 DMA32 free:62344kB min:44796kB low:55992kB high:67188kB active_anon:164kB inactive_anon:0kB active_file:136kB inactive_file:28kB unevictable:0kB writepending:52kB present:3129320kB managed:2694556kB mlocked:0kB bounce:0kB free_pcp:48kB local_pcp:36kB free_cma:0kB
[ 7250.896533] lowmem_reserve[]: 0 0 0 0 0
[ 7250.897325] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 0*64kB 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 1*2048kB (M) 2*4096kB (M) = 10636kB
[ 7250.899716] Node 0 DMA32: 182*4kB (MEH) 164*8kB (ME) 179*16kB (MEH) 111*32kB (MEH) 29*64kB (MEH) 8*128kB (M) 13*256kB (UM) 21*512kB (UMH) 26*1024kB (UM) 5*2048kB (U) 0*4096kB = 62280kB
[ 7250.902791] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 7250.904473] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 7250.906095] 54 total pagecache pages
[ 7250.906825] 13 pages in swap cache
[ 7250.907529] Swap cache stats: add 16697, delete 16684, find 408/1025
[ 7250.908786] Free swap  = 2779796kB
[ 7250.909498] Total swap = 2841596kB
[ 7250.910212] 786328 pages RAM
[ 7250.910816] 0 pages HighMem/MovableOnly
[ 7250.911597] 108849 pages reserved
[ 7250.912292] 0 pages hwpoisoned
[ 7250.912946] Unreclaimable slab info:
[ 7250.913685] Name                      Used          Total
[ 7250.914752] ofd_obj                   56KB         84KB
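As a reading aid for the Mem-Info dump above, here is a minimal Python sketch. It is not part of any Lustre or kernel tooling, and the helper names `buddy_free_kb` and `pages_to_kb` are hypothetical. The sketch re-adds the buddy-allocator free lists from the "Node 0 DMA/DMA32" lines and converts Mem-Info page counts to kB. It assumes 4 KiB pages, as on x86_64. That assumption is consistent with the log: active_anon:16 appears as 64kB on the "Node 0" line.

```python
import re

PAGE_KB = 4  # assumption: 4 KiB base pages (x86_64)

def buddy_free_kb(line):
    """Return (computed, reported) free kB for one buddy free-list line.

    Each 'N*SIZEkB' term means N free blocks of SIZE kB; the kernel
    prints the total after the '=' sign.
    """
    terms, _, total = line.rpartition("=")
    computed = sum(int(n) * int(size)
                   for n, size in re.findall(r"(\d+)\*(\d+)kB", terms))
    reported = int(re.search(r"(\d+)kB", total).group(1))
    return computed, reported

def pages_to_kb(pages):
    """Convert a Mem-Info page count to kB."""
    return pages * PAGE_KB

# Buddy free-list lines copied from the OSS log above
DMA = ("Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 0*64kB 1*128kB (U) "
       "1*256kB (U) 0*512kB 0*1024kB 1*2048kB (M) 2*4096kB (M) = 10636kB")
DMA32 = ("Node 0 DMA32: 182*4kB (MEH) 164*8kB (ME) 179*16kB (MEH) "
         "111*32kB (MEH) 29*64kB (MEH) 8*128kB (M) 13*256kB (UM) "
         "21*512kB (UMH) 26*1024kB (UM) 5*2048kB (U) 0*4096kB = 62280kB")

if __name__ == "__main__":
    for line in (DMA, DMA32):
        computed, reported = buddy_free_kb(line)
        print(line.split(":")[0], computed, reported, computed == reported)
    # Page counts taken from the Mem-Info block above
    for name, pages in [("free", 18245), ("slab_reclaimable", 4789),
                        ("slab_unreclaimable", 68192)]:
        print(name, pages_to_kb(pages), "kB")
```

Both buddy lines sum to the totals the kernel printed. Under the 4 KiB assumption, slab_unreclaimable:68192 corresponds to 272768 kB, compared with free:18245 (72980 kB).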
Generated at Sat Feb 10 03:31:05 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.