Lustre / LU-13162

parallel-scale test_statahead: mdsrate invoked oom-killer

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: Lustre 2.13.0
    • Fix Version/s: Lustre 2.14.0
    • Labels:
    • Environment:
      RHEL 8.1 client + RHEL 7.7 server
    • Severity:
      3

      Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/0925c456-3ba8-11ea-bb75-52540065bddc

      test_statahead failed with the following error:

      + su mpiuser sh -c "/usr/lib64/openmpi/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh --oversubscribe -machinefile /tmp/auster.machines -np 64 /usr/lib64/openmpi/bin/mdsrate --mknod --dir /mnt/lustre/dstatahead --nfiles 160711 --filefmt 'f%%d' "
      [1579521814.463727] [trevis-12vm6:7395 :0]            cpu.c:52   UCX  WARN  CPU does not support invariant TSC, time may be unstable
      [1579522133.063761] [trevis-12vm7:26045:0]            cpu.c:52   UCX  WARN  CPU does not support invariant TSC, time may be unstable
      

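      The clients crashed. The client console log shows mdsrate invoking the oom-killer while registering an InfiniBand memory region (ib_uverbs_reg_mr -> mlx4_ib_reg_user_mr -> ib_umem_get):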

      [69145.465824] Lustre: DEBUG MARKER: == parallel-scale test statahead: statahead test, multiple clients =================================== 12:03:31 (1579521811)
      [69145.622411] Lustre: lustre-OST0000-osc-ffff9d88618a4800: reconnect after 7127s idle
      [69166.377333] Lustre: lustre-OST0000-osc-ffff9d88618a4800: disconnect after 21s idle
      [69173.846404] mdsrate invoked oom-killer: gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null), order=0, oom_score_adj=0
      [69173.848578] mdsrate cpuset=/ mems_allowed=0
      [69173.849368] CPU: 1 PID: 7399 Comm: mdsrate Kdump: loaded Tainted: G           OE    --------- -  - 4.18.0-147.3.1.el8_1.x86_64 #1
      [69173.851539] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [69173.852562] Call Trace:
      [69173.853154]  dump_stack+0x5c/0x80
      [69173.853821]  dump_header+0x6e/0x27a
      [69173.854515]  ? notifier_call_chain+0x47/0x70
      [69173.855409]  out_of_memory.cold.32+0xa/0x80
      [69173.856169]  __alloc_pages_slowpath+0xc0f/0xce0
      [69173.856982]  __alloc_pages_nodemask+0x245/0x280
      [69173.857837]  alloc_pages_vma+0x74/0x1d0
      [69173.858574]  do_anonymous_page+0x90/0x370
      [69173.859325]  __handle_mm_fault+0x66e/0x6b0
      [69173.860069]  handle_mm_fault+0xda/0x200
      [69173.860764]  __get_user_pages+0x255/0x7c0
      [69173.861541]  ? _cond_resched+0x15/0x30
      [69173.862230]  get_user_pages+0x3e/0x50
      [69173.862898]  get_user_pages_longterm+0x34/0x190
      [69173.863772]  ib_umem_get+0x2ee/0x520 [ib_core]
      [69173.864602]  mlx4_ib_reg_user_mr+0x71/0x1e0 [mlx4_ib]
      [69173.865519]  ib_uverbs_reg_mr+0x143/0x240 [ib_uverbs]
      [69173.866428]  ? __blk_mq_run_hw_queue+0x51/0xd0
      [69173.867218]  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xb1/0xf0 [ib_uverbs]
      [69173.868475]  ib_uverbs_run_method+0x20c/0x7a0 [ib_uverbs]
      [69173.869438]  ? __switch_to_asm+0x35/0x70
      [69173.870146]  ? uverbs_disassociate_api+0x100/0x100 [ib_uverbs]
      [69173.871155]  ? __switch_to_asm+0x41/0x70
      [69173.871858]  ? __switch_to_asm+0x35/0x70
      [69173.872566]  ib_uverbs_cmd_verbs+0x189/0x380 [ib_uverbs]
      [69173.873490]  ? __switch_to_asm+0x41/0x70
      [69173.874196]  ? __switch_to_asm+0x35/0x70
      [69173.874898]  ? __switch_to_asm+0x41/0x70
      [69173.875600]  ? __switch_to_asm+0x35/0x70
      [69173.876315]  ? __switch_to+0x115/0x480
      [69173.876999]  ? finish_task_switch+0x76/0x2b0
      [69173.877765]  ? free_swap_slot+0x9a/0xf0
      [69173.878446]  ? wp_page_reuse+0x4d/0x60
      [69173.879128]  ? __raw_spin_unlock+0x5/0x10
      [69173.879844]  ? do_wp_page+0x217/0x310
      [69173.880501]  ? __handle_mm_fault+0x67e/0x6b0
      [69173.881267]  ib_uverbs_ioctl+0xa3/0x100 [ib_uverbs]
      [69173.882139]  do_vfs_ioctl+0xa4/0x630
      [69173.882806]  ? __x64_sys_madvise+0x4a6/0x790
      [69173.883573]  ? syscall_trace_enter+0x1d3/0x2c0
      [69173.884356]  ksys_ioctl+0x60/0x90
      [69173.884963]  __x64_sys_ioctl+0x16/0x20
      [69173.885640]  do_syscall_64+0x5b/0x1b0
      [69173.886311]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      parallel-scale test_statahead - trevis-12vm6, trevis-12vm7 crashed during parallel-scale test_statahead

            People

            • Assignee: Yang Sheng (ys)
            • Reporter: Maloo (maloo)