Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-12241

recovery-random-scale test fail_client_mds fails with ‘ptlrpcd_00_00: page allocation stalls’

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.1
    • 3
    • 9223372036854775807

    Description

      recovery-random-scale test_fail_client_mds fails with 'test_fail_client_mds returned 4'. So far, we are only seeing this for SLES12 SP4 failover testing.

      Looking the suite_log for a recent failure with logs at https://testing.whamcloud.com/test_sets/b92c01d0-6692-11e9-8bb1-52540065bddc , we see there is a problem with one of the clients

      Client load failed on node trevis-35vm3, rc=1
      2019-04-23 18:40:10 Terminating clients loads ...
      Duration:               86400
      Server failover period: 1200 seconds
      Exited after:           2703 seconds
      Number of failovers before exit:
      mds1 failed over 3 times
      Status: FAIL: rc=4
       

      Looking at the client (vm3) console log, we see

      [   98.357170] Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failed client reintegrated -- failure NOT OK
      [   98.717424] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Number of failovers:
      [   98.717424] mds1 failed over 3 times                and counting...
      [   98.924613] Lustre: DEBUG MARKER: Number of failovers:
      [  704.539916] ptlrpcd_00_00: page allocation stalls for 10832ms, order:2, mode:0x1604040(GFP_NOFS|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
      [  704.542403] ptlrpcd_00_00 cpuset=/ mems_allowed=0
      [  704.543320] CPU: 0 PID: 1542 Comm: ptlrpcd_00_00 Tainted: G           OE      4.12.14-95.13-default #1 SLE12-SP4
      [  704.545126] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  704.546204] Call Trace:
      [  704.546821]  dump_stack+0x5a/0x75
      [  704.547537]  warn_alloc+0xf0/0x190
      [  704.548281]  __alloc_pages_slowpath+0x3a7/0xa0d
      [  704.549166]  __alloc_pages_nodemask+0x1e9/0x210
      [  704.550050]  cache_grow_begin+0x85/0x560
      [  704.550848]  fallback_alloc+0x167/0x1f0
      [  704.551905]  ? ptlrpc_new_bulk+0x442/0x850 [ptlrpc]
      [  704.552848]  __kmalloc+0x1b4/0x210
      [  704.553588]  ptlrpc_new_bulk+0x442/0x850 [ptlrpc]
      [  704.554536]  ptlrpc_prep_bulk_imp+0x58/0x190 [ptlrpc]
      [  704.555599]  osc_brw_prep_request+0x2de/0x1360 [osc]
      [  704.556757]  ? class_handle2object+0x90/0x170 [obdclass]
      [  704.557776]  ? __ldlm_handle2lock+0x36/0x3a0 [ptlrpc]
      [  704.558748]  ? osc_req_attr_set+0x10b/0x5b0 [osc]
      [  704.559662]  osc_build_rpc+0x4e7/0x1030 [osc]
      [  704.560552]  osc_send_write_rpc+0x3f2/0x9a0 [osc]
      [  704.561488]  ? __wake_up_common_lock+0x77/0x90
      [  704.562481]  ? vvp_page_completion_write+0xff/0x350 [lustre]
      [  704.563617]  ? cfs_hash_bd_from_key+0x31/0xa0 [libcfs]
      [  704.564597]  osc_io_unplug0+0x381/0x12b0 [osc]
      [  704.565466]  ? osc_extent_finish+0x657/0xba0 [osc]
      [  704.566400]  ? osc_brw_fini_request+0x72f/0x1790 [osc]
      [  704.567403]  ? at_measured+0x18f/0x310 [ptlrpc]
      [  704.568303]  ? __switch_to_asm+0x40/0x70
      [  704.569081]  brw_interpret+0x2e5/0xe20 [osc]
      [  704.569936]  ? finish_task_switch+0x76/0x260
      [  704.570788]  ptlrpc_check_set.part.25+0x459/0x1da0 [ptlrpc]
      [  704.571860]  ptlrpcd_check+0x3c1/0x560 [ptlrpc]
      [  704.572760]  ? call_timer_fn+0x140/0x140
      [  704.573558]  ptlrpcd+0x23d/0x4a0 [ptlrpc]
      [  704.574360]  ? wake_up_q+0x70/0x70
      [  704.575066]  kthread+0xff/0x140
      [  704.575748]  ? ptlrpcd_check+0x560/0x560 [ptlrpc]
      [  704.576672]  ? __kthread_parkme+0x70/0x70
      [  704.577461]  ret_from_fork+0x35/0x40
      [  704.578228] Mem-Info:
      [  704.578761] active_anon:0 inactive_anon:32 isolated_anon:0
      [  704.578761]  active_file:88761 inactive_file:339637 isolated_file:320
      [  704.578761]  unevictable:20 dirty:2961 writeback:64814 unstable:0
      [  704.578761]  slab_reclaimable:3423 slab_unreclaimable:16689
      [  704.578761]  mapped:6882 shmem:0 pagetables:945 bounce:0
      [  704.578761]  free:12999 free_pcp:302 free_cma:0
      [  704.584496] Node 0 active_anon:0kB inactive_anon:128kB active_file:355044kB inactive_file:1358632kB unevictable:80kB isolated(anon):0kB isolated(file):1152kB mapped:27528kB dirty:11844kB writeback:259256kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      [  704.589280] Node 0 DMA free:7660kB min:380kB low:472kB high:564kB active_anon:0kB inactive_anon:0kB active_file:1776kB inactive_file:6164kB unevictable:0kB writepending:1776kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:4kB slab_unreclaimable:248kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [  704.594509] lowmem_reserve[]: 0 1822 1822 1822 1822
      [  704.595458] Node 0 DMA32 free:44336kB min:44672kB low:55840kB high:67008kB active_anon:180kB inactive_anon:160kB active_file:353268kB inactive_file:1352868kB unevictable:80kB writepending:269200kB present:2080744kB managed:1885860kB mlocked:80kB slab_reclaimable:13688kB slab_unreclaimable:66508kB kernel_stack:2160kB pagetables:3780kB bounce:0kB free_pcp:1208kB local_pcp:756kB free_cma:0kB
      [  704.601265] lowmem_reserve[]: 0 0 0 0 0
      [  704.602025] Node 0 DMA: 3*4kB (UME) 2*8kB (UE) 1*16kB (E) 2*32kB (UE) 4*64kB (UME) 5*128kB (UME) 2*256kB (UE) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 7660kB
      [  704.604659] Node 0 DMA32: 256*4kB (ME) 64*8kB (UME) 261*16kB (UME) 452*32kB (UME) 238*64kB (UM) 70*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44368kB
      [  704.607241] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [  704.608828] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [  704.610360] 76961 total pagecache pages
      [  704.611146] 38 pages in swap cache
      [  704.611847] Swap cache stats: add 12242, delete 12204, find 1904/2998
      [  704.613042] Free swap  = 14298108kB
      [  704.613748] Total swap = 14338044kB
      [  704.614463] 524184 pages RAM
      [  704.615061] 0 pages HighMem/MovableOnly
      [  704.615852] 48742 pages reserved
      [  704.616515] 0 pages hwpoisoned
      [  706.881305] kthreadd: page allocation stalls for 10420ms, order:2, mode:0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=(null)
      [  706.891587] kthreadd cpuset=/ mems_allowed=0
      [  706.892439] CPU: 0 PID: 2 Comm: kthreadd Tainted: G           OE      4.12.14-95.13-default #1 SLE12-SP4
      [  706.894135] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  706.895203] Call Trace:
      [  706.895737]  dump_stack+0x5a/0x75
      [  706.896415]  warn_alloc+0xf0/0x190
      [  706.897122]  __alloc_pages_slowpath+0x3a7/0xa0d
      [  706.897989]  __alloc_pages_nodemask+0x1e9/0x210
      [  706.898862]  cache_grow_begin+0x85/0x560
      [  706.899632]  fallback_alloc+0x167/0x1f0
      [  706.900386]  kmem_cache_alloc_node+0x84/0x1f0
      [  706.901229]  ? __switch_to_asm+0x40/0x70
      [  706.902006]  copy_process.part.36+0xef/0x1d00
      [  706.902864]  ? __switch_to_asm+0x40/0x70
      [  706.903642]  ? __switch_to_asm+0x34/0x70
      [  706.904414]  ? __switch_to_asm+0x40/0x70
      [  706.905182]  ? __switch_to_asm+0x34/0x70
      [  706.905940]  ? __switch_to_asm+0x40/0x70
      [  706.906712]  ? __switch_to_asm+0x34/0x70
      [  706.907483]  ? __switch_to_asm+0x40/0x70
      [  706.908284]  ? __switch_to_asm+0x34/0x70
      [  706.909055]  ? __switch_to_asm+0x40/0x70
      [  706.909833]  ? __switch_to_asm+0x34/0x70
      [  706.910604]  ? __switch_to_asm+0x40/0x70
      [  706.911375]  ? __switch_to_asm+0x34/0x70
      [  706.912141]  ? __switch_to_asm+0x40/0x70
      [  706.912910]  ? __switch_to_asm+0x40/0x70
      [  706.913692]  ? __kthread_parkme+0x70/0x70
      [  706.914485]  ? __switch_to+0x7c/0x4a0
      [  706.915215]  _do_fork+0xdd/0x360
      [  706.915872]  kernel_thread+0x25/0x30
      [  706.916621]  kthreadd+0x2d4/0x320
      [  706.917301]  ? kthread_create_on_cpu+0x80/0x80
      [  706.918178]  ret_from_fork+0x35/0x40
      [  706.918898] Mem-Info:
      [  706.919416] active_anon:4 inactive_anon:14 isolated_anon:0
      [  706.919416]  active_file:88761 inactive_file:339626 isolated_file:352
      [  706.919416]  unevictable:20 dirty:2961 writeback:64785 unstable:0
      [  706.919416]  slab_reclaimable:3417 slab_unreclaimable:16684
      [  706.919416]  mapped:6882 shmem:0 pagetables:945 bounce:0
      [  706.919416]  free:13003 free_pcp:330 free_cma:0
      [  706.925059] Node 0 active_anon:16kB inactive_anon:56kB active_file:355044kB inactive_file:1358504kB unevictable:80kB isolated(anon):0kB isolated(file):1280kB mapped:27528kB dirty:11844kB writeback:259140kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      [  706.929835] Node 0 DMA free:7660kB min:380kB low:472kB high:564kB active_anon:0kB inactive_anon:0kB active_file:1776kB inactive_file:6168kB unevictable:0kB writepending:1776kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:4kB slab_unreclaimable:248kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
      [  706.935049] lowmem_reserve[]: 0 1822 1822 1822 1822
      [  706.936001] Node 0 DMA32 free:44352kB min:44672kB low:55840kB high:67008kB active_anon:0kB inactive_anon:0kB active_file:353268kB inactive_file:1352364kB unevictable:80kB writepending:269208kB present:2080744kB managed:1885860kB mlocked:80kB slab_reclaimable:13664kB slab_unreclaimable:66488kB kernel_stack:2160kB pagetables:3780kB bounce:0kB free_pcp:1320kB local_pcp:756kB free_cma:0kB
      [  706.941791] lowmem_reserve[]: 0 0 0 0 0
      [  706.942572] Node 0 DMA: 3*4kB (UME) 2*8kB (UE) 1*16kB (E) 2*32kB (UE) 4*64kB (UME) 5*128kB (UME) 2*256kB (UE) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ME) 0*4096kB = 7660kB
      [  706.945182] Node 0 DMA32: 256*4kB (ME) 66*8kB (UME) 261*16kB (UME) 451*32kB (UME) 238*64kB (UM) 70*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44352kB
      [  706.947715] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [  706.949283] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [  706.950803] 76940 total pagecache pages
      [  706.951566] 17 pages in swap cache
      [  706.952253] Swap cache stats: add 12242, delete 12225, find 1904/2999
      [  706.953429] Free swap  = 14298108kB
      [  706.954132] Total swap = 14338044kB
      [  706.954838] 524184 pages RAM
      [  706.955444] 0 pages HighMem/MovableOnly
      [  706.956199] 48742 pages reserved
      [  706.956855] 0 pages hwpoisoned
      [  707.797746] sssd_pam: page allocation stalls for 11044ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
      [  707.799917] sssd_pam cpuset=/ mems_allowed=0
      [  707.800766] CPU: 1 PID: 548 Comm: sssd_pam Tainted: G           OE      4.12.14-95.13-default #1 SLE12-SP4
      [  707.802495] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  707.803563] Call Trace:
      [  707.804110]  dump_stack+0x5a/0x75
      [  707.804778]  warn_alloc+0xf0/0x190
      [  707.805476]  __alloc_pages_slowpath+0x3a7/0xa0d
      [  707.806354]  ? __switch_to_asm+0x34/0x70
      [  707.807126]  ? __switch_to_asm+0x40/0x70
      [  707.807894]  ? __switch_to_asm+0x34/0x70
      [  707.808663]  ? __switch_to_asm+0x34/0x70
      [  707.809438]  ? __switch_to_asm+0x40/0x70
      [  707.810210]  __alloc_pages_nodemask+0x1e9/0x210
      [  707.811088]  alloc_pages_vma+0x92/0x200
      [  707.811865]  __read_swap_cache_async+0x140/0x210
      [  707.812767]  read_swap_cache_async+0x14/0x30
      [  707.813611]  swapin_readahead+0x107/0x1f0
      [  707.814405]  do_swap_page+0x2b8/0x8b0
      [  707.815149]  __handle_mm_fault+0x783/0xef0
      [  707.815954]  handle_mm_fault+0xc4/0x1d0
      [  707.816725]  __do_page_fault+0x1f3/0x4c0
      [  707.817535]  trace_do_page_fault+0x40/0x120
      [  707.818367]  ? async_page_fault+0x2f/0x50
      [  707.819169]  async_page_fault+0x45/0x50
      [  707.819937] RIP: 0000:0x7ffd2ad725d8
      [  707.820663] RSP: 2328:000055cbf1d95940 EFLAGS: 000f4237
      [  707.841147] dd: page allocation stalls for 11092ms, order:0, mode:0x142004a(GFP_NOFS|__GFP_HIGHMEM|__GFP_HARDWALL|__GFP_MOVABLE), nodemask=(null)
      [  707.844535] dd cpuset=/ mems_allowed=0
      [  707.845295] CPU: 1 PID: 2087 Comm: dd Tainted: G           OE      4.12.14-95.13-default #1 SLE12-SP4
      [  707.846957] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  707.848047] Call Trace:
      [  707.848578]  dump_stack+0x5a/0x75
      [  707.849257]  warn_alloc+0xf0/0x190
      [  707.849949]  __alloc_pages_slowpath+0x3a7/0xa0d
      [  707.850826]  __alloc_pages_nodemask+0x1e9/0x210
      [  707.851701]  alloc_pages_current+0x70/0xf0
      [  707.852526]  pagecache_get_page+0x4d/0x250
      [  707.853476]  ll_write_begin+0xdf/0xd20 [lustre]
      [  707.854373]  generic_perform_write+0xb2/0x190
      [  707.855240]  ? file_update_time+0x3a/0xf0
      [  707.856030]  __generic_file_write_iter+0x184/0x1c0
      [  707.857019]  ? lov_object_maxbytes+0x31/0x40 [lov]
      [  707.857958]  vvp_io_write_start+0x46a/0x760 [lustre]
      [  707.859068]  ? cl_lock_request+0x62/0x1d0 [obdclass]
      [  707.860035]  cl_io_start+0x58/0x110 [obdclass]
      [  707.860941]  cl_io_loop+0xb6/0x190 [obdclass]
      [  707.861815]  ll_file_io_generic+0x5c9/0xb50 [lustre]
      [  707.862808]  ll_file_write_iter+0xe5/0x410 [lustre]
      [  707.863759]  __vfs_write+0xdc/0x150
      [  707.864481]  vfs_write+0xad/0x1a0
      [  707.865166]  SyS_write+0x42/0x90
      [  707.865841]  do_syscall_64+0x74/0x150
      [  707.866600]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      [  707.867567] RIP: 0033:0x7f2e8e44e3d0
      [  707.868291] RSP: 002b:00007ffee5ec7348 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  707.869673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2e8e44e3d0
      [  707.871022] RDX: 0000000000001000 RSI: 0000000000b39000 RDI: 0000000000000001
      [  707.872341] RBP: 0000000000001000 R08: 0000000000000003 R09: 00007f2e8e70d698
      [  707.873633] R10: 0000000010a8b550 R11: 0000000000000246 R12: 0000000000b39000
      [  707.874925] R13: 0000000000000000 R14: 0000000000b39000 R15: 00007ffee5ec7abb
      [  707.909648] master: page allocation stalls for 11148ms, order:0, mode:0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null)
      [  707.911669] master cpuset=/ mems_allowed=0
      [  707.912483] CPU: 1 PID: 1447 Comm: master Tainted: G           OE      4.12.14-95.13-default #1 SLE12-SP4
      [  707.914201] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [  707.915276] Call Trace:
      [  707.915811]  dump_stack+0x5a/0x75
      [  707.916488]  warn_alloc+0xf0/0x190
      [  707.917185]  __alloc_pages_slowpath+0x3a7/0xa0d
      …
      

      Logs for another failure are at
      https://testing.whamcloud.com/test_sets/2edf47d2-6267-11e9-8bb1-52540065bddc

      Attachments

        Issue Links

          Activity

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated: