Details
- Bug
- Resolution: Duplicate
- Minor
- None
- Lustre 2.13.0
- RHEL 8.1 client + RHEL 7.7 server
- 3
- 9223372036854775807
Description
This issue was created by maloo for jianyu <yujian@whamcloud.com>
This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0925c456-3ba8-11ea-bb75-52540065bddc
test_statahead failed with the following error:
+ su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -machinefile /tmp/auster.machines -np 64 /usr/lib64/openmpi/bin/mdsrate --mknod --dir /mnt/lustre/dstatahead --nfiles 160711 --filefmt 'f%%d' "
[1579521814.463727] [trevis-12vm6:7395 :0] cpu.c:52 UCX WARN CPU does not support invariant TSC, time may be unstable
[1579522133.063761] [trevis-12vm7:26045:0] cpu.c:52 UCX WARN CPU does not support invariant TSC, time may be unstable
Clients crashed:
[69145.465824] Lustre: DEBUG MARKER: == parallel-scale test statahead: statahead test, multiple clients =================================== 12:03:31 (1579521811)
[69145.622411] Lustre: lustre-OST0000-osc-ffff9d88618a4800: reconnect after 7127s idle
[69166.377333] Lustre: lustre-OST0000-osc-ffff9d88618a4800: disconnect after 21s idle
[69173.846404] mdsrate invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
[69173.848578] mdsrate cpuset=/ mems_allowed=0
[69173.849368] CPU: 1 PID: 7399 Comm: mdsrate Kdump: loaded Tainted: G OE --------- - - 4.18.0-147.3.1.el8_1.x86_64 #1
[69173.851539] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[69173.852562] Call Trace:
[69173.853154] dump_stack+0x5c/0x80
[69173.853821] dump_header+0x6e/0x27a
[69173.854515] ? notifier_call_chain+0x47/0x70
[69173.855409] out_of_memory.cold.32+0xa/0x80
[69173.856169] __alloc_pages_slowpath+0xc0f/0xce0
[69173.856982] __alloc_pages_nodemask+0x245/0x280
[69173.857837] alloc_pages_vma+0x74/0x1d0
[69173.858574] do_anonymous_page+0x90/0x370
[69173.859325] __handle_mm_fault+0x66e/0x6b0
[69173.860069] handle_mm_fault+0xda/0x200
[69173.860764] __get_user_pages+0x255/0x7c0
[69173.861541] ? _cond_resched+0x15/0x30
[69173.862230] get_user_pages+0x3e/0x50
[69173.862898] get_user_pages_longterm+0x34/0x190
[69173.863772] ib_umem_get+0x2ee/0x520 [ib_core]
[69173.864602] mlx4_ib_reg_user_mr+0x71/0x1e0 [mlx4_ib]
[69173.865519] ib_uverbs_reg_mr+0x143/0x240 [ib_uverbs]
[69173.866428] ? __blk_mq_run_hw_queue+0x51/0xd0
[69173.867218] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
[69173.868475] ib_uverbs_run_method+0x20c/0x7a0 [ib_uverbs]
[69173.869438] ? __switch_to_asm+0x35/0x70
[69173.870146] ? uverbs_disassociate_api+0x100/0x100 [ib_uverbs]
[69173.871155] ? __switch_to_asm+0x41/0x70
[69173.871858] ? __switch_to_asm+0x35/0x70
[69173.872566] ib_uverbs_cmd_verbs+0x189/0x380 [ib_uverbs]
[69173.873490] ? __switch_to_asm+0x41/0x70
[69173.874196] ? __switch_to_asm+0x35/0x70
[69173.874898] ? __switch_to_asm+0x41/0x70
[69173.875600] ? __switch_to_asm+0x35/0x70
[69173.876315] ? __switch_to+0x115/0x480
[69173.876999] ? finish_task_switch+0x76/0x2b0
[69173.877765] ? free_swap_slot+0x9a/0xf0
[69173.878446] ? wp_page_reuse+0x4d/0x60
[69173.879128] ? __raw_spin_unlock+0x5/0x10
[69173.879844] ? do_wp_page+0x217/0x310
[69173.880501] ? __handle_mm_fault+0x67e/0x6b0
[69173.881267] ib_uverbs_ioctl+0xa3/0x100 [ib_uverbs]
[69173.882139] do_vfs_ioctl+0xa4/0x630
[69173.882806] ? __x64_sys_madvise+0x4a6/0x790
[69173.883573] ? syscall_trace_enter+0x1d3/0x2c0
[69173.884356] ksys_ioctl+0x60/0x90
[69173.884963] __x64_sys_ioctl+0x16/0x20
[69173.885640] do_syscall_64+0x5b/0x1b0
[69173.886311] entry_SYSCALL_64_after_hwframe+0x65/0xca
VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
parallel-scale test_statahead - trevis-12vm6, trevis-12vm7 crashed during parallel-scale test_statahead
Issue Links
- is related to LU-12830 RHEL8.3 and ZFS: oom on OSS (Resolved)