Details
-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
Lustre 2.12.1
-
3
-
9223372036854775807
Description
recovery-random-scale test_fail_client_mds fails with 'test_fail_client_mds returned 4'. So far, we are only seeing this for SLES12 SP4 failover testing.
Looking the suite_log for a recent failure with logs at https://testing.whamcloud.com/test_sets/b92c01d0-6692-11e9-8bb1-52540065bddc , we see there is a problem with one of the clients
Client load failed on node trevis-35vm3, rc=1 2019-04-23 18:40:10 Terminating clients loads ... Duration: 86400 Server failover period: 1200 seconds Exited after: 2703 seconds Number of failovers before exit: mds1 failed over 3 times Status: FAIL: rc=4
Looking at the client (vm3) console log, we see
[ 98.357170] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK [ 98.717424] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers: [ 98.717424] mds1 failed over 3 times and counting... [ 98.924613] Lustre: DEBUG MARKER: Number of failovers: [ 704.539916] ptlrpcd_00_00: page allocation stalls for 10832ms, order:2, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null) [ 704.542403] ptlrpcd_00_00 cpuset=/ mems_allowed=0 [ 704.543320] CPU: 0 PID: 1542 Comm: ptlrpcd_00_00 Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4 [ 704.545126] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 704.546204] Call Trace: [ 704.546821] dump_stack+0x5a/0x75 [ 704.547537] warn_alloc+0xf0/0x190 [ 704.548281] __alloc_pages_slowpath+0x3a7/0xa0d [ 704.549166] __alloc_pages_nodemask+0x1e9/0x210 [ 704.550050] cache_grow_begin+0x85/0x560 [ 704.550848] fallback_alloc+0x167/0x1f0 [ 704.551905] ? ptlrpc_new_bulk+0x442/0x850 [ptlrpc] [ 704.552848] __kmalloc+0x1b4/0x210 [ 704.553588] ptlrpc_new_bulk+0x442/0x850 [ptlrpc] [ 704.554536] ptlrpc_prep_bulk_imp+0x58/0x190 [ptlrpc] [ 704.555599] osc_brw_prep_request+0x2de/0x1360 [osc] [ 704.556757] ? class_handle2object+0x90/0x170 [obdclass] [ 704.557776] ? __ldlm_handle2lock+0x36/0x3a0 [ptlrpc] [ 704.558748] ? osc_req_attr_set+0x10b/0x5b0 [osc] [ 704.559662] osc_build_rpc+0x4e7/0x1030 [osc] [ 704.560552] osc_send_write_rpc+0x3f2/0x9a0 [osc] [ 704.561488] ? __wake_up_common_lock+0x77/0x90 [ 704.562481] ? vvp_page_completion_write+0xff/0x350 [lustre] [ 704.563617] ? cfs_hash_bd_from_key+0x31/0xa0 [libcfs] [ 704.564597] osc_io_unplug0+0x381/0x12b0 [osc] [ 704.565466] ? osc_extent_finish+0x657/0xba0 [osc] [ 704.566400] ? osc_brw_fini_request+0x72f/0x1790 [osc] [ 704.567403] ? at_measured+0x18f/0x310 [ptlrpc] [ 704.568303] ? __switch_to_asm+0x40/0x70 [ 704.569081] brw_interpret+0x2e5/0xe20 [osc] [ 704.569936] ? finish_task_switch+0x76/0x260 [ 704.570788] ptlrpc_check_set.part.25+0x459/0x1da0 [ptlrpc] [ 704.571860] ptlrpcd_check+0x3c1/0x560 [ptlrpc] [ 704.572760] ? call_timer_fn+0x140/0x140 [ 704.573558] ptlrpcd+0x23d/0x4a0 [ptlrpc] [ 704.574360] ? wake_up_q+0x70/0x70 [ 704.575066] kthread+0xff/0x140 [ 704.575748] ? ptlrpcd_check+0x560/0x560 [ptlrpc] [ 704.576672] ? __kthread_parkme+0x70/0x70 [ 704.577461] ret_from_fork+0x35/0x40 [ 704.578228] Mem-Info: [ 704.578761] active_anon:0 inactive_anon:32 isolated_anon:0 [ 704.578761] active_file:88761 inactive_file:339637 isolated_file:320 [ 704.578761] unevictable:20 dirty:2961 writeback:64814 unstable:0 [ 704.578761] slab_reclaimable:3423 slab_unreclaimable:16689 [ 704.578761] mapped:6882 shmem:0 pagetables:945 bounce:0 [ 704.578761] free:12999 free_pcp:302 free_cma:0 [ 704.584496] Node 0 active_anon:0kB inactive_anon:128kB active_file:355044kB inactive_file:1358632kB unevictable:80kB isolated(anon):0kB isolated(file):1152kB mapped:27528kB dirty:11844kB writeback:259256kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 704.589280] Node 0 DMA free:7660kB min:380kB low:472kB high:564kB active_anon:0kB inactive_anon:0kB active_file:1776kB inactive_file:6164kB unevictable:0kB writepending:1776kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:4kB slab_unreclaimable:248kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 704.594509] lowmem_reserve[]: 0 1822 1822 1822 1822 [ 704.595458] Node 0 DMA32 free:44336kB min:44672kB low:55840kB high:67008kB active_anon:180kB inactive_anon:160kB active_file:353268kB inactive_file:1352868kB unevictable:80kB writepending:269200kB present:2080744kB managed:1885860kB mlocked:80kB slab_reclaimable:13688kB slab_unreclaimable:66508kB kernel_stack:2160kB pagetables:3780kB bounce:0kB free_pcp:1208kB local_pcp:756kB free_cma:0kB [ 704.601265] lowmem_reserve[]: 0 0 0 0 0 [ 704.602025] Node 0 DMA: 3*4kB (UME) 2*8kB (UE) 1*16kB (E) 2*32kB (UE) 4*64kB (UME) 5*128kB (UME) 2*256kB (UE) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 7660kB [ 704.604659] Node 0 DMA32: 256*4kB (ME) 64*8kB (UME) 261*16kB (UME) 452*32kB (UME) 238*64kB (UM) 70*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44368kB [ 704.607241] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 704.608828] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 704.610360] 76961 total pagecache pages [ 704.611146] 38 pages in swap cache [ 704.611847] Swap cache stats: add 12242, delete 12204, find 1904/2998 [ 704.613042] Free swap = 14298108kB [ 704.613748] Total swap = 14338044kB [ 704.614463] 524184 pages RAM [ 704.615061] 0 pages HighMem/MovableOnly [ 704.615852] 48742 pages reserved [ 704.616515] 0 pages hwpoisoned [ 706.881305] kthreadd: page allocation stalls for 10420ms, order:2, mode:0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=(null) [ 706.891587] kthreadd cpuset=/ mems_allowed=0 [ 706.892439] CPU: 0 PID: 2 Comm: kthreadd Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4 [ 706.894135] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 706.895203] Call Trace: [ 706.895737] dump_stack+0x5a/0x75 [ 706.896415] warn_alloc+0xf0/0x190 [ 706.897122] __alloc_pages_slowpath+0x3a7/0xa0d [ 706.897989] __alloc_pages_nodemask+0x1e9/0x210 [ 706.898862] cache_grow_begin+0x85/0x560 [ 706.899632] fallback_alloc+0x167/0x1f0 [ 706.900386] kmem_cache_alloc_node+0x84/0x1f0 [ 706.901229] ? __switch_to_asm+0x40/0x70 [ 706.902006] copy_process.part.36+0xef/0x1d00 [ 706.902864] ? __switch_to_asm+0x40/0x70 [ 706.903642] ? __switch_to_asm+0x34/0x70 [ 706.904414] ? __switch_to_asm+0x40/0x70 [ 706.905182] ? __switch_to_asm+0x34/0x70 [ 706.905940] ? __switch_to_asm+0x40/0x70 [ 706.906712] ? __switch_to_asm+0x34/0x70 [ 706.907483] ? __switch_to_asm+0x40/0x70 [ 706.908284] ? __switch_to_asm+0x34/0x70 [ 706.909055] ? __switch_to_asm+0x40/0x70 [ 706.909833] ? __switch_to_asm+0x34/0x70 [ 706.910604] ? __switch_to_asm+0x40/0x70 [ 706.911375] ? __switch_to_asm+0x34/0x70 [ 706.912141] ? __switch_to_asm+0x40/0x70 [ 706.912910] ? __switch_to_asm+0x40/0x70 [ 706.913692] ? __kthread_parkme+0x70/0x70 [ 706.914485] ? __switch_to+0x7c/0x4a0 [ 706.915215] _do_fork+0xdd/0x360 [ 706.915872] kernel_thread+0x25/0x30 [ 706.916621] kthreadd+0x2d4/0x320 [ 706.917301] ? kthread_create_on_cpu+0x80/0x80 [ 706.918178] ret_from_fork+0x35/0x40 [ 706.918898] Mem-Info: [ 706.919416] active_anon:4 inactive_anon:14 isolated_anon:0 [ 706.919416] active_file:88761 inactive_file:339626 isolated_file:352 [ 706.919416] unevictable:20 dirty:2961 writeback:64785 unstable:0 [ 706.919416] slab_reclaimable:3417 slab_unreclaimable:16684 [ 706.919416] mapped:6882 shmem:0 pagetables:945 bounce:0 [ 706.919416] free:13003 free_pcp:330 free_cma:0 [ 706.925059] Node 0 active_anon:16kB inactive_anon:56kB active_file:355044kB inactive_file:1358504kB unevictable:80kB isolated(anon):0kB isolated(file):1280kB mapped:27528kB dirty:11844kB writeback:259140kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no [ 706.929835] Node 0 DMA free:7660kB min:380kB low:472kB high:564kB active_anon:0kB inactive_anon:0kB active_file:1776kB inactive_file:6168kB unevictable:0kB writepending:1776kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:4kB slab_unreclaimable:248kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 706.935049] lowmem_reserve[]: 0 1822 1822 1822 1822 [ 706.936001] Node 0 DMA32 free:44352kB min:44672kB low:55840kB high:67008kB active_anon:0kB inactive_anon:0kB active_file:353268kB inactive_file:1352364kB unevictable:80kB writepending:269208kB present:2080744kB managed:1885860kB mlocked:80kB slab_reclaimable:13664kB slab_unreclaimable:66488kB kernel_stack:2160kB pagetables:3780kB bounce:0kB free_pcp:1320kB local_pcp:756kB free_cma:0kB [ 706.941791] lowmem_reserve[]: 0 0 0 0 0 [ 706.942572] Node 0 DMA: 3*4kB (UME) 2*8kB (UE) 1*16kB (E) 2*32kB (UE) 4*64kB (UME) 5*128kB (UME) 2*256kB (UE) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 7660kB [ 706.945182] Node 0 DMA32: 256*4kB (ME) 66*8kB (UME) 261*16kB (UME) 451*32kB (UME) 238*64kB (UM) 70*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44352kB [ 706.947715] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 706.949283] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 706.950803] 76940 total pagecache pages [ 706.951566] 17 pages in swap cache [ 706.952253] Swap cache stats: add 12242, delete 12225, find 1904/2999 [ 706.953429] Free swap = 14298108kB [ 706.954132] Total swap = 14338044kB [ 706.954838] 524184 pages RAM [ 706.955444] 0 pages HighMem/MovableOnly [ 706.956199] 48742 pages reserved [ 706.956855] 0 pages hwpoisoned [ 707.797746] sssd_pam: page allocation stalls for 11044ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null) [ 707.799917] sssd_pam cpuset=/ mems_allowed=0 [ 707.800766] CPU: 1 PID: 548 Comm: sssd_pam Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4 [ 707.802495] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 707.803563] Call Trace: [ 707.804110] dump_stack+0x5a/0x75 [ 707.804778] warn_alloc+0xf0/0x190 [ 707.805476] __alloc_pages_slowpath+0x3a7/0xa0d [ 707.806354] ? __switch_to_asm+0x34/0x70 [ 707.807126] ? __switch_to_asm+0x40/0x70 [ 707.807894] ? __switch_to_asm+0x34/0x70 [ 707.808663] ? __switch_to_asm+0x34/0x70 [ 707.809438] ? __switch_to_asm+0x40/0x70 [ 707.810210] __alloc_pages_nodemask+0x1e9/0x210 [ 707.811088] alloc_pages_vma+0x92/0x200 [ 707.811865] __read_swap_cache_async+0x140/0x210 [ 707.812767] read_swap_cache_async+0x14/0x30 [ 707.813611] swapin_readahead+0x107/0x1f0 [ 707.814405] do_swap_page+0x2b8/0x8b0 [ 707.815149] __handle_mm_fault+0x783/0xef0 [ 707.815954] handle_mm_fault+0xc4/0x1d0 [ 707.816725] __do_page_fault+0x1f3/0x4c0 [ 707.817535] trace_do_page_fault+0x40/0x120 [ 707.818367] ? async_page_fault+0x2f/0x50 [ 707.819169] async_page_fault+0x45/0x50 [ 707.819937] RIP: 0000:0x7ffd2ad725d8 [ 707.820663] RSP: 2328:000055cbf1d95940 EFLAGS: 000f4237 [ 707.841147] dd: page allocation stalls for 11092ms, order:0, mode:0x142004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE), nodemask=(null) [ 707.844535] dd cpuset=/ mems_allowed=0 [ 707.845295] CPU: 1 PID: 2087 Comm: dd Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4 [ 707.846957] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 707.848047] Call Trace: [ 707.848578] dump_stack+0x5a/0x75 [ 707.849257] warn_alloc+0xf0/0x190 [ 707.849949] __alloc_pages_slowpath+0x3a7/0xa0d [ 707.850826] __alloc_pages_nodemask+0x1e9/0x210 [ 707.851701] alloc_pages_current+0x70/0xf0 [ 707.852526] pagecache_get_page+0x4d/0x250 [ 707.853476] ll_write_begin+0xdf/0xd20 [lustre] [ 707.854373] generic_perform_write+0xb2/0x190 [ 707.855240] ? file_update_time+0x3a/0xf0 [ 707.856030] __generic_file_write_iter+0x184/0x1c0 [ 707.857019] ? lov_object_maxbytes+0x31/0x40 [lov] [ 707.857958] vvp_io_write_start+0x46a/0x760 [lustre] [ 707.859068] ? cl_lock_request+0x62/0x1d0 [obdclass] [ 707.860035] cl_io_start+0x58/0x110 [obdclass] [ 707.860941] cl_io_loop+0xb6/0x190 [obdclass] [ 707.861815] ll_file_io_generic+0x5c9/0xb50 [lustre] [ 707.862808] ll_file_write_iter+0xe5/0x410 [lustre] [ 707.863759] __vfs_write+0xdc/0x150 [ 707.864481] vfs_write+0xad/0x1a0 [ 707.865166] SyS_write+0x42/0x90 [ 707.865841] do_syscall_64+0x74/0x150 [ 707.866600] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 707.867567] RIP: 0033:0x7f2e8e44e3d0 [ 707.868291] RSP: 002b:00007ffee5ec7348 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 707.869673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e8e44e3d0 [ 707.871022] RDX: 0000000000001000 RSI: 0000000000b39000 RDI: 0000000000000001 [ 707.872341] RBP: 0000000000001000 R08: 0000000000000003 R09: 00007f2e8e70d698 [ 707.873633] R10: 0000000010a8b550 R11: 0000000000000246 R12: 0000000000b39000 [ 707.874925] R13: 0000000000000000 R14: 0000000000b39000 R15: 00007ffee5ec7abb [ 707.909648] master: page allocation stalls for 11148ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null) [ 707.911669] master cpuset=/ mems_allowed=0 [ 707.912483] CPU: 1 PID: 1447 Comm: master Tainted: G OE 4.12.14-95.13-default #1 SLE12-SP4 [ 707.914201] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 707.915276] Call Trace: [ 707.915811] dump_stack+0x5a/0x75 [ 707.916488] warn_alloc+0xf0/0x190 [ 707.917185] __alloc_pages_slowpath+0x3a7/0xa0d …
Logs for another failure are at
https://testing.whamcloud.com/test_sets/2edf47d2-6267-11e9-8bb1-52540065bddc
Attachments
Issue Links
- is related to
-
LU-12864 sanity-benchmark test_iozone crashes with OOM on clients
- Open
-
LU-13212 Lustre client hangs machine under memory pressure
- Resolved
- mentioned in
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...
-
Page Loading...