[LU-10736] page allocation failure: order:8, mode:0x80d0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.10.1, Lustre 2.10.3
    • Environment: MOFED 4.1 and MOFED 4.2.1
    • Severity: 2

    Description

      This may be a duplicate of LU-10133, but we wanted to open a new case for our own tracking. LU-10133 hasn't been updated in some time.

      We have been seeing these page allocation failures since moving to Lustre 2.10.x, CentOS 7.x, and MOFED 4.x:

      [ 6174.345922] kworker/20:1: page allocation failure: order:8, mode:0x80d0
      [ 6174.345924] CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G           OE  ------------   3.10.0-693.17.1.el7.20180206.x86_64.lustre2103 #1
      [ 6174.345925] Hardware name: SGI.COM SUMMIT/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
      [ 6174.345932] Workqueue: ib_cm cm_work_handler [ib_cm]
      [ 6174.345933] Call Trace:
      [ 6174.345934]  [<ffffffff81686d81>] dump_stack+0x19/0x1b
      [ 6174.345937]  [<ffffffff81186160>] warn_alloc_failed+0x110/0x180
      [ 6174.345940]  [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
      [ 6174.345942]  [<ffffffff811ce868>] alloc_pages_current+0x98/0x110
      [ 6174.345945]  [<ffffffff81184fae>] __get_free_pages+0xe/0x50
      [ 6174.345947]  [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
      [ 6174.345950]  [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
      [ 6174.345959]  [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
      [ 6174.345967]  [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
      [ 6174.345971]  [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
      [ 6174.345976]  [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
      [ 6174.345979]  [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
      [ 6174.345982]  [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
      [ 6174.345985]  [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
      [ 6174.345989]  [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
      [ 6174.345995]  [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
      [ 6174.345997]  [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
      [ 6174.346003]  [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
      [ 6174.346014]  [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
      [ 6174.346021]  [<ffffffffa061f58c>] ? _cma_attach_to_dev+0x5c/0x70 [rdma_cm]
      [ 6174.346023]  [<ffffffffa0836a95>] kiblnd_cm_callback+0x755/0x2380 [ko2iblnd]
      [ 6174.346028]  [<ffffffffa0624ce6>] cma_req_handler+0x1c6/0x490 [rdma_cm]
      [ 6174.346030]  [<ffffffffa04f4327>] cm_process_work+0x27/0x120 [ib_cm]
      [ 6174.346033]  [<ffffffffa04f516b>] cm_req_handler+0xb0b/0xe30 [ib_cm]
      [ 6174.346035]  [<ffffffffa04f5e55>] cm_work_handler+0x395/0x1306 [ib_cm]
      [ 6174.346036]  [<ffffffff8168beff>] ? __schedule+0x41f/0x9a0
      [ 6174.346038]  [<ffffffff810a76ca>] process_one_work+0x17a/0x440
      [ 6174.346040]  [<ffffffff810a8396>] worker_thread+0x126/0x3c0
      [ 6174.346042]  [<ffffffff810a8270>] ? manage_workers.isra.24+0x2a0/0x2a0
      [ 6174.346043]  [<ffffffff810af83f>] kthread+0xcf/0xe0
      [ 6174.346044]  [<ffffffff810af770>] ? insert_kthread_work+0x40/0x40
      [ 6174.346045]  [<ffffffff81699718>] ret_from_fork+0x58/0x90
      [ 6174.346048]  [<ffffffff810af770>] ? insert_kthread_work+0x40/0x40
      [ 6174.346049] Mem-Info:
      [ 6174.346054] active_anon:11554 inactive_anon:6856 isolated_anon:0
      [ 6174.346054]  active_file:565346 inactive_file:12784204 isolated_file:0
      [ 6174.346054]  unevictable:4142 dirty:54 writeback:0 unstable:0
      [ 6174.346054]  slab_reclaimable:259526 slab_unreclaimable:357063
      [ 6174.346054]  mapped:8121 shmem:7283 pagetables:911 bounce:0
      [ 6174.346054]  free:63457 free_pcp:746 free_cma:0
      [ 6174.346055] Node 0 DMA free:15772kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      [ 6174.346058] lowmem_reserve[]: 0 2641 31809 31809
      [ 6174.346060] Node 0 DMA32 free:123636kB min:2704kB low:3380kB high:4056kB active_anon:2132kB inactive_anon:1916kB active_file:99500kB inactive_file:1867404kB unevictable:376kB isolated(anon):0kB isolated(file):0kB present:3049136kB managed:2707148kB mlocked:376kB dirty:4kB writeback:0kB mapped:3368kB shmem:2036kB slab_reclaimable:48796kB slab_unreclaimable:69468kB kernel_stack:192kB pagetables:248kB unstable:0kB bounce:0kB free_pcp:188kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [ 6174.346063] lowmem_reserve[]: 0 0 29167 29167
      [ 6174.346064] Node 0 Normal free:54308kB min:29836kB low:37292kB high:44752kB active_anon:21816kB inactive_anon:19336kB active_file:971960kB inactive_file:23507144kB unevictable:7280kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29867144kB mlocked:7280kB dirty:100kB writeback:0kB mapped:28800kB shmem:19912kB slab_reclaimable:503968kB slab_unreclaimable:637584kB kernel_stack:7136kB pagetables:1428kB unstable:0kB bounce:0kB free_pcp:1536kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [ 6174.346067] lowmem_reserve[]: 0 0 0 0
      [ 6174.346069] Node 1 Normal free:60112kB min:32976kB low:41220kB high:49464kB active_anon:22268kB inactive_anon:6172kB active_file:1189924kB inactive_file:25762476kB unevictable:8912kB isolated(anon):0kB isolated(file):0kB present:33554432kB managed:33012624kB mlocked:8912kB dirty:112kB writeback:0kB mapped:316kB shmem:7184kB slab_reclaimable:485340kB slab_unreclaimable:721200kB kernel_stack:12208kB pagetables:1968kB unstable:0kB bounce:0kB free_pcp:1216kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      [ 6174.346071] lowmem_reserve[]: 0 0 0 0
      [ 6174.346073] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15772kB
      [ 6174.346078] Node 0 DMA32: 2456*4kB (UEM) 1686*8kB (UEM) 542*16kB (UEM) 82*32kB (UEM) 156*64kB (UEM) 227*128kB (UEM) 106*256kB (UE) 43*512kB (UE) 1*1024kB (U) 0*2048kB 0*4096kB = 123824kB
      [ 6174.346084] Node 0 Normal: 262*4kB (EM) 876*8kB (UEM) 952*16kB (UEM) 296*32kB (UEM) 154*64kB (UM) 95*128kB (UM) 2*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 55800kB
      [ 6174.346089] Node 1 Normal: 219*4kB (EM) 2848*8kB (UEM) 366*16kB (UEM) 46*32kB (UM) 474*64kB (UM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61580kB
      [ 6174.346100] 13359368 total pagecache pages
      [ 6174.346100] 0 pages in swap cache
      [ 6174.346101] Swap cache stats: add 0, delete 0, find 0/0
      [ 6174.346101] Free swap  = 0kB
      [ 6174.346101] Total swap = 0kB
      [ 6174.346102] 16757063 pages RAM
      [ 6174.346102] 0 pages HighMem/MovableOnly
      [ 6174.346102] 356383 pages reserved
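
      For context: order:8 means the allocator needs 2^8 = 256 contiguous 4 KiB pages, i.e. 1 MiB of physically contiguous memory, for the QP buffer allocated via kiblnd_create_conn() -> mlx4_buf_alloc(). The buddy lists above show that neither Normal zone has any free block larger than 512 kB, so the request fails even though about 63,000 pages (free:63457) are free overall; this is fragmentation, not a shortage of memory. The usual driver-side mitigation, which is roughly what the fragmented-buffer patches discussed in the comments below do in driver-specific form, is to attempt the contiguous allocation opportunistically and fall back to a list of order-0 pages. The following is a simplified kernel-style sketch, not the actual mlx4/mlx5 code; struct frag_buf and alloc_buf_fallback() are hypothetical names used only for illustration:

      #include <linux/gfp.h>
      #include <linux/kernel.h>
      #include <linux/mm.h>
      #include <linux/slab.h>

      /* Hypothetical fragmented buffer: either one contiguous allocation
       * (direct != NULL) or an array of single-page fragments. */
      struct frag_buf {
              void *direct;
              void **frags;
              int nfrags;
      };

      static int alloc_buf_fallback(struct frag_buf *buf, size_t size)
      {
              int order = get_order(size);
              int i;

              /* Opportunistic high-order attempt; __GFP_NOWARN suppresses the
               * "page allocation failure" splat seen in this ticket. */
              buf->direct = (void *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN, order);
              if (buf->direct) {
                      buf->nfrags = 0;
                      return 0;
              }

              /* Fallback: order-0 pages, which fragmentation cannot defeat. */
              buf->nfrags = DIV_ROUND_UP(size, PAGE_SIZE);
              buf->frags = kcalloc(buf->nfrags, sizeof(void *), GFP_KERNEL);
              if (!buf->frags)
                      return -ENOMEM;

              for (i = 0; i < buf->nfrags; i++) {
                      buf->frags[i] = (void *)__get_free_page(GFP_KERNEL);
                      if (!buf->frags[i])
                              goto err_unwind;
              }
              return 0;

      err_unwind:
              while (i--)
                      free_page((unsigned long)buf->frags[i]);
              kfree(buf->frags);
              return -ENOMEM;
      }

      The catch, and the reason the real fix has to live in the driver, is that the hardware must then be told about the fragment list (e.g. through its translation tables) instead of a single contiguous bus address, which is what the attached fragmented-CQ patch arranges for mlx5.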

      Attachments

        1. 0001-IB-mlx5-Implement-fragmented-completion-queue-CQ.patch
          20 kB
          Alexey Lyashkov


          Activity

            [LU-10736] page allocation failure: order:8, mode:0x80d0

            knweiss Karsten Weiss added a comment -

            Does anyone maintain a collection of these IB memory fragmentation hotfixes (LU-10133 is another candidate) against the latest RHEL7 kernels, for those who want to use Red Hat's IB stack?

            The patches attached to this LU no longer apply cleanly against the latest RHEL 7.6 kernel, 3.10.0-957.27.2.el7.x86_64. (Unfortunately, Red Hat still has not merged any of these fixes. Having such a collection might convince Red Hat to merge them faster.)

            Also, I wonder why these IB memory fragmentation hotfixes are not part of the official Lustre server kernel RPMs. Doesn't this imply that the use of MOFED is assumed? Shouldn't the required MOFED release then be mentioned as a requirement in lustre/ChangeLog and/or the Lustre Support Matrix?

            IMHO it's very easy to miss important IB fixes at the moment.

            pjones Peter Jones added a comment -

            mhanafi what was the Mellanox reference for this issue?

            pjones Peter Jones added a comment -

            ok - thanks Mahmoud


            mhanafi Mahmoud Hanafi added a comment -

            All fixes landed in MOFED 4.4.2; please close.


            shadow Alexey Lyashkov added a comment -

            The mlx5 fixes were pointed out on the linux-rdma list.

            shadow Alexey Lyashkov added a comment (edited) -

            Mahmoud - thanks for the info. They have not responded to my ticket.
            I need the fixes in MOFED 4.3, so I will try to fix it myself.
            It looks like we can use mlx5_frag_buf_alloc and a function similar to the one removed by the referenced patch.

             static inline void *mlx4_buf_offset(struct mlx4_buf *buf, int offset)
             {
            -	if (BITS_PER_LONG == 64 || buf->nbufs == 1)
            -		return buf->direct.buf + offset;
            -	else
            -		return buf->page_list[offset >> PAGE_SHIFT].buf +
            -			(offset & (PAGE_SIZE - 1));
            +	return buf->direct.buf + offset;
             }

            It introduces some memory overhead, but it looks like a minimal solution.
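
            For reference, the else branch removed in the diff above (and, in spirit, the attached mlx5 fragmented-CQ patch) resolves an offset by indexing into per-page fragments instead of assuming one contiguous buffer, which is what makes an order-0 fallback possible in the first place. A rough illustrative sketch of that lookup; struct frag_buf and frag_buf_offset() are hypothetical names, not the driver's actual API:

            #include <linux/mm.h>

            /* Minimal stand-in for a fragmented buffer: either one contiguous
             * allocation (direct != NULL) or an array of PAGE_SIZE fragments. */
            struct frag_buf {
                    void *direct;
                    void **frags;
            };

            static inline void *frag_buf_offset(struct frag_buf *buf, int offset)
            {
                    if (buf->direct)                        /* contiguous path */
                            return buf->direct + offset;
                    return buf->frags[offset >> PAGE_SHIFT] /* pick the page... */
                           + (offset & (PAGE_SIZE - 1));    /* ...then the byte within it */
            }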


            mhanafi Mahmoud Hanafi added a comment -

            Mellanox is working on a patch for mlx5. It should be available in MOFED 4.4+.


            shadow Alexey Lyashkov added a comment -

            The same (or a similar) bug was hit with an mlx5 card.


            bruno.travouillon Bruno Travouillon (Inactive) added a comment -

            Thank you for the feedback, Mahmoud. I will share this with the site and ask for some tests.

            mhanafi Mahmoud Hanafi added a comment -

            I have attached a patch we received from Mellanox that resolved our allocation failures in mlx4. They will be pushing the patch upstream.
            We are following up with Mellanox about a patch for the mlx5 driver as well.

            Our tests showed no performance impact, but we would like others to test it as well.


            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes:
              0 Vote for this issue
              Watchers:
              20 Start watching this issue
