LU-14815: memory issues leading to blocked new connections until drop_caches set


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Environment: CentOS 7.8.2003
      Kernel -- kernel-3.10.0-1062.9.1.el7_lustre.x86_64
      Lustre -- lustre-2.12.4-1.el7.x86_64
      e2fsprogs -- e2fsprogs-1.45.6.wc5-0.el7.x86_64
      IML -- yes
    • Severity: 3

    Description

      We have a system with 2 OSS and 2 MDS nodes running the community Lustre edition with patched kernels. The specs of both OSS nodes are:

      • 2 socket 20 core Xeon Gold 6230 @ 2.1GHz
      • 384GB of RAM
      • single port EDR IB

      Under load, the system averages the following load and memory usage:

      [root@oss2 ~]# uptime
       15:54:37 up 7:33,  2 users,  load average: 185.68, 161.83, 164.17
      [root@oss2 ~]# free
                    total        used        free      shared  buff/cache   available
      Mem:      394501208    11600480    21953712      968424   360947016   380784736
      Swap:       4194300        8968     4185332
      [root@oss2 ~]#

      When the system is in use we quite often see the following (once these messages start, they continue until either a reboot or an "echo 3 > /proc/sys/vm/drop_caches"):

      [Mon Jul 5 15:40:39 2021] kworker/13:0: page allocation failure: order:8, mode:0x80d0
      [Mon Jul 5 15:40:39 2021] CPU: 13 PID: 115770 Comm: kworker/13:0 Kdump: loaded Tainted: P OE ------------ 3.10.0-1062.9.1.el7_lustre.x86_64 #1
      [Mon Jul 5 15:40:39 2021] Hardware name: Dell Inc. PowerEdge R740/06WXJT, BIOS 2.10.0 11/12/2020
      [Mon Jul 5 15:40:39 2021] Workqueue: ib_cm cm_work_handler [ib_cm]
      [Mon Jul 5 15:40:39 2021] Call Trace:
      [Mon Jul 5 15:40:39 2021] [<ffffffffabf7ac23>] dump_stack+0x19/0x1b
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9c3d70>] warn_alloc_failed+0x110/0x180
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9c6af0>] ? drain_pages+0xb0/0xb0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9c897f>] __alloc_pages_nodemask+0x9df/0xbe0
      [Mon Jul 5 15:40:39 2021] [<ffffffffaba16b28>] alloc_pages_current+0x98/0x110
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9c2b1e>] __get_free_pages+0xe/0x40
      [Mon Jul 5 15:40:39 2021] [<ffffffffabbadafc>] swiotlb_alloc_coherent+0x5c/0x160
      [Mon Jul 5 15:40:39 2021] [<ffffffffab86ead1>] x86_swiotlb_alloc_coherent+0x41/0x50
      [Mon Jul 5 15:40:39 2021] [<ffffffffc06d8394>] mlx5_dma_zalloc_coherent_node+0xb4/0x110 [mlx5_core]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc06d8c89>] mlx5_buf_alloc_node+0x89/0x120 [mlx5_core]
      [Mon Jul 5 15:40:39 2021] [<ffffffffaba674e1>] ? alloc_inode+0x51/0xa0
      [Mon Jul 5 15:40:39 2021] [<ffffffffc06d8d34>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d5e63f>] create_kernel_qp.isra.65+0x43a/0x741 [mlx5_ib]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d48d1c>] create_qp_common+0x8ec/0x17a0 [mlx5_ib]
      [Mon Jul 5 15:40:39 2021] [<ffffffffaba25286>] ? kmem_cache_alloc_trace+0x1d6/0x200
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d49d1a>] mlx5_ib_create_qp+0x14a/0x820 [mlx5_ib]
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9ddc2d>] ? kvmalloc_node+0x8d/0xe0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9ddc2d>] ? kvmalloc_node+0x8d/0xe0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab9ddcb5>] ? kvfree+0x35/0x40
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d411e6>] ? mlx5_ib_create_cq+0x346/0x6f0 [mlx5_ib]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0cbccab>] ib_create_qp+0x8b/0x320 [ib_core]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d98974>] rdma_create_qp+0x34/0xb0 [rdma_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0a0b85c>] kiblnd_create_conn+0xe5c/0x19b0 [ko2iblnd]
      [Mon Jul 5 15:40:39 2021] [<ffffffffaba2630d>] ? kmem_cache_alloc_node_trace+0x11d/0x210
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0a1a24c>] kiblnd_passive_connect+0xa2c/0x1760 [ko2iblnd]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0a1b6d5>] kiblnd_cm_callback+0x755/0x23a0 [ko2iblnd]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d978af>] ? _cma_attach_to_dev+0x5f/0x70 [rdma_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d9cca0>] cma_ib_req_handler+0xce0/0x12a0 [rdma_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d87beb>] cm_process_work+0x2b/0x130 [ib_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d89468>] cm_req_handler+0xaa8/0xf80 [ib_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffab82b59e>] ? __switch_to+0xce/0x580
      [Mon Jul 5 15:40:39 2021] [<ffffffffc0d89d1d>] cm_work_handler+0x15d/0xfcf [ib_cm]
      [Mon Jul 5 15:40:39 2021] [<ffffffffabf805a2>] ? __schedule+0x402/0x840
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8be21f>] process_one_work+0x17f/0x440
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8bf336>] worker_thread+0x126/0x3c0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8bf210>] ? manage_workers.isra.26+0x2a0/0x2a0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8c61f1>] kthread+0xd1/0xe0
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8c6120>] ? insert_kthread_work+0x40/0x40
      [Mon Jul 5 15:40:39 2021] [<ffffffffabf8dd37>] ret_from_fork_nospec_begin+0x21/0x21
      [Mon Jul 5 15:40:39 2021] [<ffffffffab8c6120>] ? insert_kthread_work+0x40/0x40
      [Mon Jul 5 15:40:39 2021] Mem-Info:
      [Mon Jul 5 15:40:39 2021] active_anon:133507 inactive_anon:148978 isolated_anon:0
      active_file:10646819 inactive_file:81974614 isolated_file:0
      unevictable:19088 dirty:1079 writeback:0 unstable:0
      slab_reclaimable:1867029 slab_unreclaimable:195034
      mapped:30616 shmem:229818 pagetables:2598 bounce:0
      free:1061087 free_pcp:148 free_cma:0
      [Mon Jul 5 15:40:39 2021] Node 0 DMA free:15896kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15896kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      [Mon Jul 5 15:40:39 2021] lowmem_reserve[]: 0 1281 191724 191724
      [Mon Jul 5 15:40:39 2021] Node 0 DMA32 free:762136kB min:296kB low:368kB high:444kB active_anon:16kB inactive_anon:152kB active_file:4548kB inactive_file:4916kB unevictable:216kB isolated(anon):0kB isolated(file):0kB present:1566348kB managed:1312364kB mlocked:216kB dirty:4kB writeback:0kB mapped:216kB shmem:80kB slab_reclaimable:213740kB slab_unreclaimable:38944kB kernel_stack:144kB pagetables:236kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [Mon Jul 5 15:40:39 2021] lowmem_reserve[]: 0 0 190442 190442
      [Mon Jul 5 15:40:39 2021] Node 0 Normal free:2062308kB min:44544kB low:55680kB high:66816kB active_anon:111288kB inactive_anon:128592kB active_file:15617196kB inactive_file:169639488kB unevictable:8768kB isolated(anon):0kB isolated(file):0kB present:198180864kB managed:195015992kB mlocked:8768kB dirty:1668kB writeback:0kB mapped:39240kB shmem:126220kB slab_reclaimable:3365336kB slab_unreclaimable:340572kB kernel_stack:14336kB pagetables:4008kB unstable:0kB bounce:0kB free_pcp:420kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [Mon Jul 5 15:40:39 2021] lowmem_reserve[]: 0 0 0 0
      [Mon Jul 5 15:40:39 2021] Node 1 Normal free:1404008kB min:45260kB low:56572kB high:67888kB active_anon:422868kB inactive_anon:467168kB active_file:26965532kB inactive_file:158254052kB unevictable:67368kB isolated(anon):0kB isolated(file):0kB present:201326592kB managed:198156940kB mlocked:67368kB dirty:2644kB writeback:0kB mapped:83008kB shmem:792972kB slab_reclaimable:3889040kB slab_unreclaimable:400620kB kernel_stack:20768kB pagetables:6148kB unstable:0kB bounce:0kB free_pcp:216kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [Mon Jul 5 15:40:39 2021] lowmem_reserve[]: 0 0 0 0
      [Mon Jul 5 15:40:39 2021] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15896kB
      [Mon Jul 5 15:40:39 2021] Node 0 DMA32: 1512*4kB (UEM) 1277*8kB (UEM) 881*16kB (UEM) 767*32kB (UEM) 720*64kB (UEM) 457*128kB (UEM) 252*256kB (UEM) 155*512kB (UM) 22*1024kB (UEM) 51*2048kB (UE) 81*4096kB (M) = 762104kB
      [Mon Jul 5 15:40:39 2021] Node 0 Normal: 5268*4kB (UE) 4145*8kB (UE) 125630*16kB (UEM) 1*32kB (U) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2064344kB
      [Mon Jul 5 15:40:39 2021] Node 1 Normal: 57139*4kB (UEM) 5107*8kB (UE) 70991*16kB (UM) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1405268kB
      [Mon Jul 5 15:40:39 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [Mon Jul 5 15:40:39 2021] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [Mon Jul 5 15:40:39 2021] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
      [Mon Jul 5 15:40:39 2021] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
      [Mon Jul 5 15:40:39 2021] 92854017 total pagecache pages
      [Mon Jul 5 15:40:39 2021] 86 pages in swap cache
      [Mon Jul 5 15:40:39 2021] Swap cache stats: add 520, delete 434, find 402/518
      [Mon Jul 5 15:40:39 2021] Free swap = 4185332kB
      [Mon Jul 5 15:40:39 2021] Total swap = 4194300kB
      [Mon Jul 5 15:40:39 2021] 100272446 pages RAM
      [Mon Jul 5 15:40:39 2021] 0 pages HighMem/MovableOnly
      [Mon Jul 5 15:40:39 2021] 1647148 pages reserved
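
      The trace above is an order-8 allocation (a 1 MiB contiguous block for a coherent DMA buffer) failing while nearly all of the free memory sits in blocks of 16kB or smaller, i.e. the page cache has fragmented the free lists rather than exhausted them. A quick way we check for that condition is the sketch below, assuming the standard /proc/buddyinfo layout (free-block counts for orders 0 through 10 following the "Node N, zone ZZZ" prefix):

      # Sum the free blocks of order >= 8 (>= 1 MiB contiguous) per zone.
      # In /proc/buddyinfo the last eleven columns are free-block counts
      # for orders 0..10, so orders 8-10 are the last three fields.
      awk '{ high = $(NF-2) + $(NF-1) + $NF;
             printf "%s %s %s %s order>=8 free blocks: %d\n", $1, $2, $3, $4, high }' /proc/buddyinfo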

      I have seen mention in other Jira entries (e.g. LU-10133) that certain versions of the MLNX code may be an issue; this system is using the versions of OFED included prebuilt with the Lustre packages.

      When the above messages begin to repeat we see an issue where new mounts cannot succeed (they hang at mount). A compute node that had the filesystem mounted and is then rebooted will not be able to mount again until either the OSS nodes are rebooted (clearing memory and going through recovery) or "echo 3 > /proc/sys/vm/drop_caches" is run.
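
      For reference, the manual workaround we run on the affected OSS when this happens is just the standard kernel cache-drop sequence (nothing Lustre-specific; sync first so dirty data is written out before the caches are dropped):

      # flush dirty pages, then drop page cache, dentries and inodes so the
      # kernel can again form large contiguous free blocks
      sync
      echo 3 > /proc/sys/vm/drop_caches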

      Currently all Lustre module tunables are at their defaults; we had tried a few different options in the hope of better performance, but the same issues described above occurred.
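
      To show what "default tunables" means here, this is roughly how we verify the o2iblnd settings on the OSS nodes; the parameter names in the commented example (map_on_demand, peer_credits) are only illustrative of the kind of options we experimented with, not an exact record of what was tried:

      # list the current ko2iblnd module parameters (all defaults in our case)
      grep -H . /sys/module/ko2iblnd/parameters/* 2>/dev/null

      # any non-default values would be set via a modprobe.d fragment, e.g.
      # (illustrative only):
      #   options ko2iblnd map_on_demand=32 peer_credits=32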

    People

      Assignee: pjones (Peter Jones)
      Reporter: makia (Makia Minich, Inactive)