[LU-10133] Multi-page allocation failures in mlx4/mlx5 Created: 17/Oct/17  Updated: 27/Apr/20  Resolved: 23/Nov/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Cliff White (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: soak
Environment:

Soak cluster - lustre-master build 3654 lustre version=2.10.54_13_g84f690e


Attachments: File soak-17-lustre.log.txt.gz    
Issue Links:
Duplicate
is duplicated by LU-10322 OSS - memory issues, page allocation ... Resolved
Related
is related to LU-10736 page allocation failure: order:8, mod... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

I am seeing multiple page allocation failures from soak-clients. Failures seem to be semi-random.
Example:

Oct 17 02:20:07 soak-17 kernel: kworker/u480:1: page allocation failure: order:8, mode:0x80d0
Oct 17 02:20:07 soak-17 kernel: CPU: 9 PID: 58714 Comm: kworker/u480:1 Tainted: G           OE  ------------   3.10.0-693.2.2.el7.x86_64 #1
Oct 17 02:20:07 soak-17 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Oct 17 02:20:08 soak-17 kernel: Workqueue: rdma_cm cma_work_handler [rdma_cm]
Oct 17 02:20:08 soak-17 kernel: 00000000000080d0 00000000a9e78c95 ffff8803ee9bf848 ffffffff816a3db1
Oct 17 02:20:08 soak-17 kernel: ffff8803ee9bf8d8 ffffffff81188810 0000000000000000 ffff88043ffdb000
Oct 17 02:20:08 soak-17 kernel: 0000000000000008 00000000000080d0 ffff8803ee9bf8d8 00000000a9e78c95
Oct 17 02:20:08 soak-17 kernel: Call Trace:
Oct 17 02:20:08 soak-17 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Oct 17 02:20:08 soak-17 kernel: [<ffffffff81188810>] warn_alloc_failed+0x110/0x180
Oct 17 02:20:08 soak-17 kernel: [<ffffffff8169fd8a>] __alloc_pages_slowpath+0x6b6/0x724
Oct 17 02:20:08 soak-17 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Oct 17 02:20:08 soak-17 kernel: [<ffffffff81030f8f>] dma_generic_alloc_coherent+0x8f/0x140
Oct 17 02:20:08 soak-17 kernel: [<ffffffff81064341>] x86_swiotlb_alloc_coherent+0x21/0x50
Oct 17 02:20:08 soak-17 kernel: [<ffffffffc02914d3>] mlx4_buf_direct_alloc.isra.6+0xd3/0x1a0 [mlx4_core]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc029176b>] mlx4_buf_alloc+0x1cb/0x240 [mlx4_core]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc02940d0>] ? __mlx4_cmd+0x560/0x920 [mlx4_core]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc061085e>] create_qp_common.isra.31+0x62e/0x10d0 [mlx4_ib]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc061144e>] mlx4_ib_create_qp+0x14e/0x480 [mlx4_ib]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc03c9c3a>] ib_create_qp+0x7a/0x2f0 [ib_core]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc04f66d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc0bd8539>] kiblnd_create_conn+0xbf9/0x1960 [ko2iblnd]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc0be8649>] kiblnd_cm_callback+0x1429/0x2300 [ko2iblnd]
Oct 17 02:20:09 soak-17 kernel: [<ffffffffc04fa57c>] cma_work_handler+0x6c/0xa0 [rdma_cm]
Oct 17 02:20:09 soak-17 kernel: [<ffffffff810a881a>] process_one_work+0x17a/0x440
Oct 17 02:20:09 soak-17 kernel: [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Oct 17 02:20:09 soak-17 kernel: [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
Oct 17 02:20:09 soak-17 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Oct 17 02:20:09 soak-17 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 17 02:20:10 soak-17 kernel: [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
Oct 17 02:20:10 soak-17 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 17 02:20:10 soak-17 kernel: Mem-Info:
Oct 17 02:20:10 soak-17 kernel: active_anon:36658 inactive_anon:27590 isolated_anon:6#012 active_file:2710466 inactive_file:345768 isolated_file:10#012 unevictable:0 dirty:14 writeback:0 unstable:0#012 slab_reclaimable:30971 slab_unreclaimable:3983583#012 mapped:10108 shmem:6384 pagetables:3086 bounce:0#012 free:776253 free_pcp:359 free_cma:0
Oct 17 02:20:11 soak-17 kernel: Node 0 DMA free:15784kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 17 02:20:11 soak-17 kernel: lowmem_reserve[]: 0 2580 15620 15620
Oct 17 02:20:11 soak-17 kernel: Node 0 DMA32 free:132736kB min:7320kB low:9148kB high:10980kB active_anon:6472kB inactive_anon:8768kB active_file:1063620kB inactive_file:27644kB unevictable:0kB isolated(anon):24kB isolated(file):40kB present:3051628kB managed:2643828kB mlocked:0kB dirty:8kB writeback:0kB mapped:2140kB shmem:116kB slab_reclaimable:9352kB slab_unreclaimable:1306892kB kernel_stack:1152kB pagetables:1196kB unstable:0kB bounce:0kB free_pcp:4kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 17 02:20:11 soak-17 kernel: lowmem_reserve[]: 0 0 13040 13040
Oct 17 02:20:11 soak-17 kernel: Node 0 Normal free:1149812kB min:37012kB low:46264kB high:55516kB active_anon:69848kB inactive_anon:32420kB active_file:4495364kB inactive_file:737992kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13353036kB mlocked:0kB dirty:24kB writeback:0kB mapped:9156kB shmem:248kB slab_reclaimable:54264kB slab_unreclaimable:6303688kB kernel_stack:7248kB pagetables:5096kB unstable:0kB bounce:0kB free_pcp:860kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 17 02:20:12 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
Oct 17 02:20:12 soak-17 kernel: Node 1 Normal free:1805688kB min:45728kB low:57160kB high:68592kB active_anon:70700kB inactive_anon:69172kB active_file:5282880kB inactive_file:617436kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16498508kB mlocked:0kB dirty:24kB writeback:0kB mapped:29136kB shmem:25172kB slab_reclaimable:60268kB slab_unreclaimable:8323752kB kernel_stack:5568kB pagetables:6052kB unstable:0kB bounce:0kB free_pcp:1468kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 17 02:20:13 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
Oct 17 02:20:13 soak-17 kernel: Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 0*64kB 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15784kB
Oct 17 02:20:13 soak-17 kernel: Node 0 DMA32: 2018*4kB (UEM) 1070*8kB (UEM) 670*16kB (UEM) 685*32kB (UEM) 594*64kB (UEM) 199*128kB (UEM) 80*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 133240kB
Oct 17 02:20:13 soak-17 kernel: Node 0 Normal: 8492*4kB (UEM) 5207*8kB (UEM) 3978*16kB (UEM) 8657*32kB (UEM) 8319*64kB (EM) 1594*128kB (M) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1152744kB
Oct 17 02:20:13 soak-17 kernel: Node 1 Normal: 14583*4kB (UEM) 8566*8kB (UEM) 5482*16kB (UEM) 13112*32kB (UEM) 11765*64kB (UEM) 2443*128kB (UM) 418*256kB (UM) 5*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 1809388kB
Oct 17 02:20:13 soak-17 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 17 02:20:13 soak-17 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 17 02:20:13 soak-17 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 17 02:20:14 soak-17 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 17 02:20:14 soak-17 kernel: 3062619 total pagecache pages
Oct 17 02:20:14 soak-17 kernel: 6 pages in swap cache
Oct 17 02:20:14 soak-17 kernel: Swap cache stats: add 13, delete 7, find 0/0
Oct 17 02:20:14 soak-17 kernel: Free swap  = 16319432kB
Oct 17 02:20:14 soak-17 kernel: Total swap = 16319484kB
Oct 17 02:20:14 soak-17 kernel: 8369066 pages RAM
Oct 17 02:20:14 soak-17 kernel: 0 pages HighMem/MovableOnly
Oct 17 02:20:14 soak-17 kernel: 241261 pages reserved
Oct 17 02:20:15 soak-17 kernel: kworker/u480:1: page allocation failure: order:8, mode:0x80d0
Oct 17 02:20:15 soak-17 kernel: CPU: 9 PID: 58714 Comm: kworker/u480:1 Tainted: G           OE  ------------   3.10.0-693.2.2.el7.x86_64 #1
Oct 17 02:20:15 soak-17 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Oct 17 02:20:15 soak-17 kernel: Workqueue: rdma_cm cma_work_handler [rdma_cm]

The systems appear to recover and continue. A Lustre log dump from soak-17 after the most recent failure is attached.



 Comments   
Comment by Andreas Dilger [ 18/Oct/17 ]

It looks like there is a fix for this problem in the upstream kernel: use kvmalloc_array() instead of kmalloc() for the qp->sq.wrid and qp->rq.wrid allocations in create_qp_common(). The main fix is:

commit e9105cdefbf64cd7aea300f934c92051e7cb7cff
Author:     Li Dongyang <dongyang.li@anu.edu.au>
AuthorDate: Wed Aug 16 23:31:23 2017 +1000
Commit:     Doug Ledford <dledford@redhat.com>
CommitDate: Tue Aug 22 16:48:35 2017 -0400

    IB/mlx4: use kvmalloc_array to allocate wrid
    
    We could use kvmalloc_array instead of the
    kmalloc and __vmalloc combination.
    After this we don't need to include linux/vmalloc.h
    
    Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>

which itself depends on the kvmalloc_array() and kvmalloc() helper functions that landed in the following (relatively large) patches. For backporting, it makes sense to just land the subset of those patches that add the kvmalloc_*() functions, rather than changing all of the callsites as well. I don't see any of these helpers in the RHEL 7 kernel I have (linux-3.10.0-514.el7), though kvfree() already exists.

commit a7c3e901a46ff54c016d040847eda598a9e3e653
Author:     Michal Hocko <mhocko@suse.com>
AuthorDate: Mon May 8 15:57:09 2017 -0700
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Mon May 8 17:15:12 2017 -0700

    mm: introduce kv[mz]alloc helpers
    
    Patch series "kvmalloc", v5.
    
    There are many open coded kmalloc with vmalloc fallback instances in the
    tree.  Most of them are not careful enough or simply do not care about
    the underlying semantic of the kmalloc/page allocator which means that
    a) some vmalloc fallbacks are basically unreachable because the kmalloc
    part will keep retrying until it succeeds b) the page allocator can
    invoke a really disruptive steps like the OOM killer to move forward
    which doesn't sound appropriate when we consider that the vmalloc
    fallback is available.
    
    As it can be seen implementing kvmalloc requires quite an intimate
    knowledge if the page allocator and the memory reclaim internals which
    strongly suggests that a helper should be implemented in the memory
    subsystem proper.
    
    Most callers, I could find, have been converted to use the helper
    instead.  This is patch 6.  There are some more relying on __GFP_REPEAT
    in the networking stack which I have converted as well and Eric Dumazet
    was not opposed [2] to convert them as well.
    
    [1] http://lkml.kernel.org/r/20170130094940.13546-1-mhocko@kernel.org
    [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com
    
    This patch (of 9):
    
    Using kmalloc with the vmalloc fallback for larger allocations is a
    common pattern in the kernel code.  Yet we do not have any common helper
    for that and so users have invented their own helpers.  Some of them are
    really creative when doing so.  Let's just add kv[mz]alloc and make sure
    it is implemented properly.  This implementation makes sure to not make
    a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also
    to not warn about allocation failures.  This also rules out the OOM
    killer as the vmalloc is a more approapriate fallback than a disruptive
    user visible action.
    
    This patch also changes some existing users and removes helpers which
    are specific for them.  In some cases this is not possible (e.g.
    ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and
    require GFP_NO{FS,IO} context which is not vmalloc compatible in general
    (note that the page table allocation is GFP_KERNEL).  Those need to be
    fixed separately.

    While we are at it, document that __vmalloc{_node} about unsupported gfp
    mask because there seems to be a lot of confusion out there.
    kvmalloc_node will warn about GFP_KERNEL incompatible (which are not
    superset) flags to catch new abusers.  Existing ones would have to die
    slowly.
    
    [sfr@canb.auug.org.au: f2fs fixup]
      Link: http://lkml.kernel.org/r/20170320163735.332e64b7@canb.auug.org.au
    Link: http://lkml.kernel.org/r/20170306103032.2540-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>     [ext4 part]
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: David Miller <davem@davemloft.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit 752ade68cbd81d0321dfecc188f655a945551b25
Author:     Michal Hocko <mhocko@suse.com>
AuthorDate: Mon May 8 15:57:27 2017 -0700
Commit:     Linus Torvalds <torvalds@linux-foundation.org>
CommitDate: Mon May 8 17:15:13 2017 -0700

    treewide: use kv[mz]alloc* rather than opencoded variants
    
    There are many code paths opencoding kvmalloc.  Let's use the helper
    instead.  The main difference to kvmalloc is that those users are
    usually not considering all the aspects of the memory allocator.  E.g.
    allocation requests <= 32kB (with 4kB pages) are basically never failing
    and invoke OOM killer to satisfy the allocation.  This sounds too
    disruptive for something that has a reasonable fallback - the vmalloc.
    On the other hand those requests might fallback to vmalloc even when the
    memory allocator would succeed after several more reclaim/compaction
    attempts previously.  There is no guarantee something like that happens
    though.
    
    This patch converts many of those places to kv[mz]alloc* helpers because
    they are more conservative.

    Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.org
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
    Acked-by: Kees Cook <keescook@chromium.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
    Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
    Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
    Acked-by: David Sterba <dsterba@suse.com> # btrfs
    Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
    Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
    Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
    Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Anton Vorontsov <anton@enomsg.org>
    Cc: Colin Cross <ccross@android.com>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: Kent Overstreet <kent.overstreet@gmail.com>
    Cc: Santosh Raspatur <santosh@chelsio.com>
    Cc: Hariprasad S <hariprasad@chelsio.com>
    Cc: Yishai Hadas <yishaih@mellanox.com>
    Cc: Oleg Drokin <oleg.drokin@intel.com>
    Cc: "Yan, Zheng" <zyan@redhat.com>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Alexei Starovoitov <ast@kernel.org>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: David Miller <davem@davemloft.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
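
To make the shape of that first fix concrete, here is a rough sketch of the wrid change in drivers/infiniband/hw/mlx4/qp.c, simplified from the commit message above (the exact upstream diff may differ in detail):

    /* before: open-coded kmalloc with a __vmalloc fallback */
    qp->sq.wrid = kmalloc(qp->sq.wqe_cnt * sizeof(u64),
                          gfp | __GFP_NOWARN);
    if (!qp->sq.wrid)
            qp->sq.wrid = __vmalloc(qp->sq.wqe_cnt * sizeof(u64),
                                    gfp, PAGE_KERNEL);

    /* after: kvmalloc_array() tries kmalloc() first without invoking the
     * OOM killer and falls back to vmalloc() on failure; the buffer is
     * freed with kvfree() either way */
    qp->sq.wrid = kvmalloc_array(qp->sq.wqe_cnt, sizeof(u64), gfp);

The same conversion applies to qp->rq.wrid.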
Comment by Cliff White (Inactive) [ 19/Oct/17 ]

Switched to the lustre-master-ib build, using MOFED instead of the in-kernel drivers. Still having multiple page allocation failures on multiple nodes.

Comment by John Hammond [ 19/Oct/17 ]

Hi Cliff, do you have a crash dump from a MOFED run?

Comment by Sarah Liu [ 19/Oct/17 ]

Hi John,
I cannot find a core dump; the following is from the soak-17 syslog.

Oct 19 00:15:39 soak-17 systemd-logind: Removed session 246.
Oct 19 00:15:39 soak-17 systemd: Removed slice User Slice of root.
Oct 19 00:15:39 soak-17 systemd: Stopping User Slice of root.
Oct 19 00:15:54 soak-17 kernel: kworker/u480:3: page allocation failure: order:8, mode:0x80d0
Oct 19 00:15:54 soak-17 kernel: CPU: 5 PID: 19810 Comm: kworker/u480:3 Tainted: G           OE  ------------   3.10.0-693.2.2.el7.x86_64 #1
Oct 19 00:15:54 soak-17 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Oct 19 00:15:54 soak-17 kernel: Workqueue: rdma_cm cma_work_handler [rdma_cm]
Oct 19 00:15:54 soak-17 kernel: 00000000000080d0 0000000014a032b0 ffff8806ec793868 ffffffff816a3db1
Oct 19 00:15:54 soak-17 kernel: ffff8806ec7938f8 ffffffff81188810 0000000000000000 ffff88043ffdb000
Oct 19 00:15:54 soak-17 kernel: 0000000000000008 00000000000080d0 ffff8806ec7938f8 0000000014a032b0
Oct 19 00:15:54 soak-17 kernel: Call Trace:
Oct 19 00:15:54 soak-17 kernel: [<ffffffff816a3db1>] dump_stack+0x19/0x1b
Oct 19 00:15:54 soak-17 kernel: [<ffffffff81188810>] warn_alloc_failed+0x110/0x180
Oct 19 00:15:54 soak-17 kernel: [<ffffffff8169fd8a>] __alloc_pages_slowpath+0x6b6/0x724
Oct 19 00:15:54 soak-17 kernel: [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
Oct 19 00:15:54 soak-17 kernel: [<ffffffff81030f8f>] dma_generic_alloc_coherent+0x8f/0x140
Oct 19 00:15:54 soak-17 kernel: [<ffffffff81064341>] x86_swiotlb_alloc_coherent+0x21/0x50
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc071b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc071b73b>] mlx4_buf_alloc+0x1bb/0x250 [mlx4_core]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc0552425>] create_qp_common+0x645/0x10a0 [mlx4_ib]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc0723c7b>] ? mlx4_cq_alloc+0x4ab/0x580 [mlx4_core]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc0553157>] mlx4_ib_create_qp+0x2a7/0x4d0 [mlx4_ib]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc055dc40>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc04e42aa>] ib_create_qp+0x7a/0x2f0 [ib_core]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc05eb614>] rdma_create_qp+0x34/0xb0 [rdma_cm]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc0bcb5c9>] kiblnd_create_conn+0xbf9/0x1960 [ko2iblnd]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc0bdb8a9>] kiblnd_cm_callback+0x1429/0x22d0 [ko2iblnd]
Oct 19 00:15:54 soak-17 kernel: [<ffffffffc05ef22c>] cma_work_handler+0x6c/0xa0 [rdma_cm]
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810a881a>] process_one_work+0x17a/0x440
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 19 00:15:54 soak-17 kernel: [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
Oct 19 00:15:54 soak-17 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
Oct 19 00:15:54 soak-17 kernel: Mem-Info:
Oct 19 00:15:54 soak-17 kernel: active_anon:3001 inactive_anon:26225 isolated_anon:0#012 active_file:3506921 inactive_file:61152 isolated_file:10#012 unevictable:0 dirty:4 writeback:0 unstable:0#012 slab_reclaimable:30003 slab_unreclaimable:3610646#012 mapped:6288 shmem:4251 pagetables:2713 bounce:0#012 free:650891 free_pcp:3700 free_cma:0
Oct 19 00:15:54 soak-17 kernel: Node 0 DMA free:15848kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Oct 19 00:15:54 soak-17 kernel: lowmem_reserve[]: 0 2580 15620 15620
Oct 19 00:15:54 soak-17 kernel: Node 0 DMA32 free:147604kB min:7320kB low:9148kB high:10980kB active_anon:2004kB inactive_anon:5980kB active_file:980924kB inactive_file:28204kB unevictable:0kB isolated(anon):0kB isolated(file):40kB present:3051628kB managed:2643828kB mlocked:0kB dirty:0kB writeback:0kB mapped:364kB shmem:124kB slab_reclaimable:15856kB slab_unreclaimable:1360408kB kernel_stack:1440kB pagetables:1256kB unstable:0kB bounce:0kB free_pcp:4128kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 00:15:54 soak-17 kernel: lowmem_reserve[]: 0 0 13040 13040
Oct 19 00:15:54 soak-17 kernel: Node 0 Normal free:699476kB min:37012kB low:46264kB high:55516kB active_anon:2092kB inactive_anon:38928kB active_file:5497500kB inactive_file:91372kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13353036kB mlocked:0kB dirty:4kB writeback:0kB mapped:1296kB shmem:68kB slab_reclaimable:53260kB slab_unreclaimable:6393880kB kernel_stack:6560kB pagetables:5460kB unstable:0kB bounce:0kB free_pcp:4588kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 00:15:54 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
Oct 19 00:15:54 soak-17 kernel: Node 1 Normal free:1739896kB min:45728kB low:57160kB high:68592kB active_anon:7908kB inactive_anon:59992kB active_file:7549260kB inactive_file:125032kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16498508kB mlocked:0kB dirty:12kB writeback:0kB mapped:23492kB shmem:16812kB slab_reclaimable:50896kB slab_unreclaimable:6688296kB kernel_stack:5632kB pagetables:4136kB unstable:0kB bounce:0kB free_pcp:6760kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 00:15:54 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
Oct 19 00:15:54 soak-17 kernel: Node 0 DMA: 0*4kB 1*8kB (U) 0*16kB 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15848kB
Oct 19 00:15:54 soak-17 kernel: Node 0 DMA32: 3105*4kB (UEM) 3479*8kB (UEM) 2316*16kB (UEM) 1767*32kB (UEM) 209*64kB (UM) 9*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 148380kB
Oct 19 00:15:54 soak-17 kernel: Node 0 Normal: 31182*4kB (UEM) 20577*8kB (UEM) 10519*16kB (UEM) 6073*32kB (UEM) 710*64kB (UEM) 18*128kB (UM) 0*256kB 0*512kB 1*1024kB (E) 0*2048kB 0*4096kB = 700752kB
Oct 19 00:15:54 soak-17 kernel: Node 1 Normal: 19696*4kB (UEM) 23200*8kB (UEM) 22433*16kB (UEM) 20069*32kB (UEM) 7102*64kB (UEM) 162*128kB (UEM) 1*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1741040kB
Oct 19 00:15:54 soak-17 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 19 00:15:54 soak-17 kernel: Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 19 00:15:54 soak-17 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Oct 19 00:15:54 soak-17 kernel: Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Oct 19 00:15:54 soak-17 kernel: 3571770 total pagecache pages
Oct 19 00:15:54 soak-17 kernel: 35 pages in swap cache
Oct 19 00:15:54 soak-17 kernel: Swap cache stats: add 1185, delete 1150, find 7/13
Oct 19 00:15:54 soak-17 kernel: Free swap  = 16314956kB
Oct 19 00:15:54 soak-17 kernel: Total swap = 16319484kB
Oct 19 00:15:54 soak-17 kernel: 8369066 pages RAM
Oct 19 00:15:54 soak-17 kernel: 0 pages HighMem/MovableOnly
Oct 19 00:15:54 soak-17 kernel: 241261 pages reserved
Oct 19 00:15:54 soak-17 kernel: kworker/u480:3: page allocation failure: order:8, mode:0x80d0
Oct 19 00:15:54 soak-17 kernel: CPU: 21 PID: 19810 Comm: kworker/u480:3 Tainted: G           OE  ------------   3.10.0-693.2.2.el7.x86_64 #1
Oct 19 00:15:54 soak-17 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013

and the ib module info

[root@soak-17 syslog]# modinfo ib_core
filename:       /lib/modules/3.10.0-693.2.2.el7.x86_64/extra/mlnx-ofa_kernel/drivers/infiniband/core/ib_core.ko
license:        Dual BSD/GPL
description:    core kernel InfiniBand API
author:         Roland Dreier
rhelversion:    7.4
srcversion:     88498DC1AE00B29161E536C
depends:        mlx_compat
vermagic:       3.10.0-693.2.2.el7.x86_64 SMP mod_unload modversions 
parm:           send_queue_size:Size of send queue in number of work requests (int)
parm:           recv_queue_size:Size of receive queue in number of work requests (int)
parm:           roce_v1_noncompat_gid:Default GID auto configuration (Default: yes) (bool)
parm:           force_mr:Force usage of MRs for RDMA READ/WRITE operations (bool)
[root@soak-17 syslog]# 
Comment by John Hammond [ 20/Oct/17 ]

Can we set panic_on_oom and see if we can get a crash dump?

Comment by Cliff White (Inactive) [ 23/Oct/17 ]

I will set up to do this. So far we have a lot of allocation failures but very few OOMs, so I may trigger a dump if we don't get one otherwise.

Comment by Oleg Drokin [ 08/Nov/17 ]

I filed this in the Red Hat Bugzilla to be backported, so I guess all interested parties should ask for it too (the ticket is probably visible only to Intel, at least for now): https://bugzilla.redhat.com/show_bug.cgi?id=1511159

Comment by Gerrit Updater [ 18/Nov/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/30164
Subject: LU-10133 o2iblnd: fall back to vmalloc for mlx4/mlx5
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 4e066b940c61096c876e672d04434cf4a879f415

Comment by Andreas Dilger [ 18/Nov/17 ]

Bob, it seems I don't have the latest kernel sources on my dev system.

Could you please update the patches as appropriate for the various kernels we are building?

Comment by Bob Glossman (Inactive) [ 19/Nov/17 ]

Adding kernel patches as is done in https://review.whamcloud.com/30164 isn't a solution. Client builds use unpatched kernels, and we now also offer unpatched server builds as an option. This change provides no fix in those cases.

Comment by Andreas Dilger [ 19/Nov/17 ]

Bob, I understand it isn't a solution for unpatched clients and servers, but for Lustre 2.7 and RHEL6 we don't have unpatched servers at all, and it is also possible for users to install the patched kernel and client if they're having this problem.

Comment by Bob Glossman (Inactive) [ 27/Nov/17 ]

Patches refreshed for the currently supported versions.
The el6 version in particular could be shortened quite a bit, since the mlx4 code in the current el6.9 kernel already has this fixed.

Comment by Rick Mohr [ 30/Nov/17 ]

Not sure if this is relevant, but I am seeing something very similar on my Lustre servers (Lustre 2.9, CentOS Linux release 7.3.1611, in-kernel IB support, mlx5 driver).

Comment by Rick Mohr [ 30/Nov/17 ]
Nov 13 04:23:19 haven-oss1 kernel: warn_alloc_failed: 240 callbacks suppressed
 Nov 13 04:23:19 haven-oss1 kernel: kworker/u32:1: page allocation failure: order:9, mode:0x80d0
 Nov 13 04:23:19 haven-oss1 kernel: CPU: 13 PID: 9120 Comm: kworker/u32:1 Tainted: G OE ------------ 3.10.0-514.el7_lustre.x86_64 #1
 Nov 13 04:23:19 haven-oss1 kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
 Nov 13 04:23:19 haven-oss1 kernel: Workqueue: rdma_cm cma_work_handler [rdma_cm]
 Nov 13 04:23:19 haven-oss1 kernel: 00000000000080d0 0000000074e4e302 ffff881183c6b810 ffffffff816860f8
 Nov 13 04:23:19 haven-oss1 kernel: ffff881183c6b8a0 ffffffff811869a0 0000000000000000 ffff8816bebd9000
 Nov 13 04:23:19 haven-oss1 kernel: 0000000000000009 00000000000080d0 ffff881183c6b8a0 0000000074e4e302
 Nov 13 04:23:19 haven-oss1 kernel: Call Trace:
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff816860f8>] dump_stack+0x19/0x1b
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff811869a0>] warn_alloc_failed+0x110/0x180
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff81681cb0>] __alloc_pages_slowpath+0x6b7/0x725
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff8118af55>] __alloc_pages_nodemask+0x405/0x420
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff81030fcf>] dma_generic_alloc_coherent+0x8f/0x140
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff81061ed1>] x86_swiotlb_alloc_coherent+0x21/0x50
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa0214bfd>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa0214f7d>] mlx5_buf_alloc_node+0x4d/0xc0 [mlx5_core]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa0215004>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa044d062>] create_kernel_qp.isra.42+0x292/0x7d0 [mlx5_ib]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa044e1ee>] create_qp_common+0xc4e/0xe00 [mlx5_ib]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff8119f25a>] ? kvfree+0x2a/0x40
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff8119f25a>] ? kvfree+0x2a/0x40
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa044e68b>] mlx5_ib_create_qp+0x10b/0x4c0 [mlx5_ib]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa0410a1f>] ib_create_qp+0x3f/0x250 [ib_core]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa03aa584>] rdma_create_qp+0x34/0xb0 [rdma_cm]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa05a3437>] kiblnd_create_conn+0xad7/0x1870 [ko2iblnd]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa05b35f9>] kiblnd_cm_callback+0x1429/0x2290 [ko2iblnd]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffffa03ae3ac>] cma_work_handler+0x6c/0xa0 [rdma_cm]
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810a8d76>] worker_thread+0x126/0x410
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810b052f>] kthread+0xcf/0xe0
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810bf8d6>] ? finish_task_switch+0x56/0x180
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff81696658>] ret_from_fork+0x58/0x90
 Nov 13 04:23:19 haven-oss1 kernel: [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
Comment by Andreas Dilger [ 01/Dec/17 ]

Rick, the https://review.whamcloud.com/30164 patch is definitely for you then. That fixes mlx5 in the same way as mlx4 was fixed for RHEL 7.0-7.4 and RHEL6.x. My original version of the patch also fixed mlx4 for RHEL 6.8 and earlier, but that is already fixed in RHEL 6.9.

Comment by Andreas Dilger [ 06/Dec/17 ]

Unfortunately, my patch https://review.whamcloud.com/30164 doesn't fix all of the allocation problems here. It also seems that fixes that were added to mlx4_buf_alloc() have not all been added to mlx5_buf_alloc(), which means we may need several other commits to reduce allocation size for mlx5, and at least one small improvement for mlx4.

mlx4:

  • add "gfp | __GFP_NOWARN" to the first mlx4_buf_alloc() call in mlx4/qp.c::create_qp_common() so that it doesn't dump a stack on the large-order allocation failure, since there is the fallback to PAGE_SIZE * 2 allocations that should always succeed (a rough sketch of this pattern follows the mlx5 list below)

mlx5:

  • add a "max_direct" argument to mlx5_buf_alloc() to allow specifying a chunk size of PAGE_SIZE * 2 and allocating an array of chunks
  • 40f2287bd "IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO" equivalent to add gfp argument to mlx5_buf_alloc()
  • 73898db04 "net/mlx4: Avoid wrong virtual mappings" equivalent to move mlx5_buf_alloc_node() to mlx5_buf_direct_alloc_node(), and then add the fall back to allocating an array of PAGE_SIZE * 2 pages in mlx5/qp.c::create_qp_common() like mlx4 does
  • add "gfp | __GFP_NOWARN" to the first mlx5_buf_alloc() call so that it doesn't dump a stack on the large-order allocation failure
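
A rough sketch of the "__GFP_NOWARN plus two-page-chunk fallback" pattern referred to in the mlx4 item above, as it would look in create_qp_common(). This is illustrative only; treat the exact argument lists of the driver functions as assumptions:

    /* first try a single large physically contiguous buffer; __GFP_NOWARN
     * suppresses the "page allocation failure" stack dump if this
     * high-order allocation fails on a fragmented system */
    err = mlx4_buf_alloc(dev->dev, qp->buf_size, qp->buf_size,
                         &qp->buf, gfp | __GFP_NOWARN);
    if (err) {
            /* fall back to an array of PAGE_SIZE * 2 chunks, which
             * should always succeed even under heavy fragmentation */
            err = mlx4_buf_alloc(dev->dev, qp->buf_size, PAGE_SIZE * 2,
                                 &qp->buf, gfp);
            if (err)
                    goto err_out;
    }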

As a workaround, Amir suggests that adding options ko2iblnd map_on_demand=32 in /etc/modprobe.d/ko2iblnd.conf will reduce the size of the QP allocations and will reduce the frequency/severity of this problem.

Comment by Cliff White (Inactive) [ 06/Dec/17 ]

Unfortunately, we have been running map_on_demand=32 for quite a while now, at least a year.
So we can't try that, or maybe better to say we've already tried it and it hasn't helped.
Current options in ko2iblnd.conf:

options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
Comment by Andreas Dilger [ 06/Dec/17 ]

Cliff, the ko2iblnd-opa options apply only to OPA cards, but the problem affects mlx4 and mlx5 cards, so a separate "options ko2iblnd map_on_demand=32" line needs to be added for the Mellanox cards; otherwise, the default of 256 is used.
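
For example, an /etc/modprobe.d/ko2iblnd.conf along the following lines would cover both card types (illustrative only; the ko2iblnd-opa values are copied from the existing soak settings and should be tuned per site):

    # Mellanox mlx4/mlx5 HCAs (the ko2iblnd default for map_on_demand is 256)
    options ko2iblnd map_on_demand=32
    # Intel OPA HFIs only
    options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1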

Comment by Amir Shehata (Inactive) [ 06/Dec/17 ]

One thing we should do before making this the default is some performance testing to see how setting map_on_demand to 32 will impact mlx4 and mlx5. It will reduce memory usage per QP, but we need to double-check any performance impact.

Comment by Alexey Lyashkov [ 09/Dec/17 ]

My tests with map_on_demand=256 show a 1%-2% performance drop for this case. It's not a big change, I think.

Comment by Chris Hunter (Inactive) [ 11/Dec/17 ]

We were informed these patches are in Mellanox OFED 4.2 GA.

Similar kvzalloc patches have also been applied to the mlx5 Ethernet driver.

1) mm: introduce kv[mz]alloc helpers
https://lwn.net/Articles/708739/
https://patchwork.kernel.org/patch/9493657/

Upstream mlx5 patches were committed in May 2017:
2) {net, IB}/mlx5: Replace mlx5_vzalloc with kvzalloc
https://github.com/torvalds/linux/commit/1b9a07ee25049724ab7f7c32282fbf5452530cea#diff-3c967034ac4fb744a569c1a4d3a115d3

and Aug 2017:
3) IB/mlx5: use kvmalloc_array for mlx5_ib_wq
https://github.com/torvalds/linux/commit/b588300801f3502a7de5ca897af68019fbb3bc79#diff-06ae82013eb36f3b0e0eeb9c37040f37
https://www.spinics.net/lists/linux-rdma/msg53756.html

There is also an upstream patch for mlx4:
4) IB/mlx4: use kvmalloc_array to allocate wrid
https://github.com/torvalds/linux/commit/e9105cdefbf64cd7aea300f934c92051e7cb7cff#diff-66b8f4939fabacf90437a794c44b9081
https://www.spinics.net/lists/linux-rdma/msg53441.html

 

Comment by Andreas Dilger [ 13/Dec/17 ]

Alexey, did you run any tests with "map_on_demand=32"? I think the default value is 256, but reducing this is important for reducing memory usage.

Comment by Andreas Dilger [ 13/Dec/17 ]

Chris, I believe the problem has been fixed in the upstream kernel; the problem is that users are hitting this regularly on RHEL6/RHEL7 kernels (client and server) with the in-kernel OFED, so it would be good to get a fix for those systems as well.

Comment by James A Simmons [ 04/Jan/18 ]

Which OFED/MOFED version does this fix appear in? For those who want to avoid patched kernels on the server side at all costs.

Comment by Mahmoud Hanafi [ 09/Jan/18 ]

We are running with MOFED 4.1 and CentOS 7.4 servers.

options ko2iblnd timeout=150 retry_count=7 peer_timeout=0 map_on_demand=32 peer_credits=63 concurrent_sends=63

Seeing this issue.

Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313194] kworker/u48:3: page allocation failure: order:5, mode:0x8010
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313196] CPU: 20 PID: 57793 Comm: kworker/u48:3 Tainted: G           OE  ------------   3.10.0-693.2.2.el7.20170918.x86_64.lustre2101 #1
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313196] Hardware name: SGI.COM CH-C2112-GP2/X10DRU-i+, BIOS 1.0b 05/08/2015
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313200] Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313201]  0000000000008010 0000000022ff91e8 ffff8810a141f7e0 ffffffff81684ac1
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313202]  ffff8810a141f870 ffffffff811841c0 0000000000000000 ffff88207ffd8000
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313203]  0000000000000005 0000000000008010 ffff8810a141f870 0000000022ff91e8
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313204] Call Trace:
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313205]  [<ffffffff81684ac1>] dump_stack+0x19/0x1b
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313207]  [<ffffffff811841c0>] warn_alloc_failed+0x110/0x180
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313209]  [<ffffffff81188984>] __alloc_pages_nodemask+0x9b4/0xba0
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313211]  [<ffffffff811cc688>] alloc_pages_current+0x98/0x110
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313216]  [<ffffffff8118300e>] __get_free_pages+0xe/0x50
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313217]  [<ffffffff8133d41e>] swiotlb_alloc_coherent+0x5e/0x150
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313221]  [<ffffffff810622c1>] x86_swiotlb_alloc_coherent+0x41/0x50
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313224]  [<ffffffffa05aa4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313228]  [<ffffffffa05aa73b>] mlx4_buf_alloc+0x1bb/0x250 [mlx4_core]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313233]  [<ffffffffa07b8435>] create_qp_common+0x645/0x1090 [mlx4_ib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313237]  [<ffffffffa07b9104>] ? mlx4_ib_create_qp+0x254/0x4d0 [mlx4_ib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313240]  [<ffffffffa07b9157>] mlx4_ib_create_qp+0x2a7/0x4d0 [mlx4_ib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313244]  [<ffffffffa07c3c40>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313248]  [<ffffffffa04d02aa>] ib_create_qp+0x7a/0x2f0 [ib_core]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313253]  [<ffffffffa055b2fc>] ipoib_cm_create_tx_qp_rss+0xcc/0x110 [ib_ipoib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313257]  [<ffffffffa055b9f9>] ipoib_cm_tx_init+0x89/0x2f0 [ib_ipoib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313260]  [<ffffffffa055d6b8>] ipoib_cm_tx_start+0x248/0x3c0 [ib_ipoib]
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313263]  [<ffffffff810a587a>] process_one_work+0x17a/0x440
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313265]  [<ffffffff810a6546>] worker_thread+0x126/0x3c0
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313266]  [<ffffffff810a6420>] ? manage_workers.isra.24+0x2a0/0x2a0
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313268]  [<ffffffff810ad9ef>] kthread+0xcf/0xe0
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313269]  [<ffffffff810ad920>] ? insert_kthread_work+0x40/0x40
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313270]  [<ffffffff81695ad8>] ret_from_fork+0x58/0x90
Jan  9 08:37:52 nbp1-oss6 kernel: [1189787.313272]  [<ffffffff810ad920>] ? insert_kthread_work+0x40/0x40

So is this issue fixed in MOFED 4.2?

 

Comment by Cliff White (Inactive) [ 17/Jan/18 ]

We are seeing what may be this issue on the 2.10.3-RC1 tag
Jan 17 11:20:40 soak-17 kernel: kworker/u480:2: page allocation failure: order:8, mode:0x80d0
Jan 17 11:20:40 soak-17 kernel: CPU: 5 PID: 119497 Comm: kworker/u480:2 Tainted: G OE ------------ 3.10.0-693.11.6.el7.x86_64 #1
Jan 17 11:20:40 soak-17 kernel: Hardware name: Intel Corporation S2600GZ ........../S2600GZ, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
Jan 17 11:20:40 soak-17 kernel: Workqueue: rdma_cm cma_work_handler [rdma_cm]
Jan 17 11:20:40 soak-17 kernel: Call Trace:
Jan 17 11:20:40 soak-17 kernel: [<ffffffff816a5ea1>] dump_stack+0x19/0x1b
Jan 17 11:20:40 soak-17 kernel: [<ffffffff8118a510>] warn_alloc_failed+0x110/0x180
Jan 17 11:20:40 soak-17 kernel: [<ffffffff816a1e7a>] __alloc_pages_slowpath+0x6b6/0x724
Jan 17 11:20:41 soak-17 kernel: [<ffffffff8118eaa5>] __alloc_pages_nodemask+0x405/0x420
Jan 17 11:20:41 soak-17 kernel: [<ffffffff81030e8f>] dma_generic_alloc_coherent+0x8f/0x140
Jan 17 11:20:41 soak-17 kernel: [<ffffffff810645d1>] x86_swiotlb_alloc_coherent+0x21/0x50
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc02dd4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc02dd73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc01db4a6>] create_qp_common+0x536/0x1000 [mlx4_ib]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc01dc3d1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc01c7bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc01e7f30>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc00d952a>] ib_create_qp+0x7a/0x2f0 [ib_core]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc05a65d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
Jan 17 11:20:41 soak-17 kernel: [<ffffffffc0bf45c9>] kiblnd_create_conn+0xbf9/0x1960 [ko2iblnd]
Jan 17 11:20:42 soak-17 kernel: [<ffffffff816af3c6>] ? common_interrupt+0x106/0x232
Jan 17 11:20:42 soak-17 kernel: [<ffffffffc0c047af>] kiblnd_cm_callback+0x145f/0x2380 [ko2iblnd]
Jan 17 11:20:42 soak-17 kernel: [<ffffffffc05aa11c>] cma_work_handler+0x6c/0xa0 [rdma_cm]
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810aa3ba>] process_one_work+0x17a/0x440
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810ab086>] worker_thread+0x126/0x3c0
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810aaf60>] ? manage_workers.isra.24+0x2a0/0x2a0
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810b252f>] kthread+0xcf/0xe0
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
Jan 17 11:20:42 soak-17 kernel: [<ffffffff816b8798>] ret_from_fork+0x58/0x90
Jan 17 11:20:42 soak-17 kernel: [<ffffffff810b2460>] ? insert_kthread_work+0x40/0x40
Jan 17 11:20:42 soak-17 kernel: Mem-Info:
Jan 17 11:20:42 soak-17 kernel: active_anon:38088 inactive_anon:40507 isolated_anon:0#012 active_file:2789491 inactive_file:301913 isolated_file:10#012 unevictable:0 dirty:20 writeback:0 unstable:0#012 slab_reclaimable:31599 slab_unreclaimable:4366903#012 mapped:9817 shmem:26652 pagetables:2238 bounce:0#012 free:316930 free_pcp:3684 free_cma:0
Jan 17 11:20:43 soak-17 kernel: Node 0 DMA free:15848kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15932kB managed:15848kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jan 17 11:20:43 soak-17 kernel: lowmem_reserve[]: 0 2580 15619 15619
Jan 17 11:20:43 soak-17 kernel: Node 0 DMA32 free:111620kB min:7320kB low:9148kB high:10980kB active_anon:4340kB inactive_anon:16340kB active_file:806572kB inactive_file:91160kB unevictable:0kB isolated(anon):0kB isolated(file):40kB present:3051628kB managed:2643792kB mlocked:0kB dirty:4kB writeback:0kB mapped:4972kB shmem:10624kB slab_reclaimable:10900kB slab_unreclaimable:1503576kB kernel_stack:1680kB pagetables:836kB unstable:0kB bounce:0kB free_pcp:4204kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 17 11:20:44 soak-17 kernel: lowmem_reserve[]: 0 0 13039 13039
Jan 17 11:20:44 soak-17 kernel: Node 0 Normal free:479720kB min:37012kB low:46264kB high:55516kB active_anon:80388kB inactive_anon:90464kB active_file:4307316kB inactive_file:454368kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13352076kB mlocked:0kB dirty:40kB writeback:0kB mapped:23080kB shmem:63164kB slab_reclaimable:59148kB slab_unreclaimable:7310632kB kernel_stack:7168kB pagetables:5300kB unstable:0kB bounce:0kB free_pcp:6380kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 17 11:20:45 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
Jan 17 11:20:45 soak-17 kernel: Node 1 Normal free:668892kB min:45728kB low:57160kB high:68592kB active_anon:67496kB inactive_anon:55352kB active_file:6041560kB inactive_file:661408kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16777216kB managed:16497548kB mlocked:0kB dirty:20kB writeback:0kB mapped:11260kB shmem:32820kB slab_reclaimable:56348kB slab_unreclaimable:8652444kB kernel_stack:4368kB pagetables:2820kB unstable:0kB bounce:0kB free_pcp:4820kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jan 17 11:20:45 soak-17 kernel: lowmem_reserve[]: 0 0 0 0
We are running OFED 4.2, I believe.

Comment by Jay Lan (Inactive) [ 28/Feb/18 ]

The patch https://review.whamcloud.com/30164 would change two kmalloc() calls in create_qp_common() so that a __vmalloc() call would be made in case kmalloc() fails.

However, both Mahmoud and Cliff White reported a failure at a different location: the mlx4_buf_alloc() call inside the create_qp_common() routine. The fix from #30164 would have no effect on our problem.

[558213.837942] [<ffffffff81686d81>] dump_stack+0x19/0x1b
[558213.837946] [<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[558213.837949] [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[558213.837951] [<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[558213.837954] [<ffffffff81184fae>] __get_free_pages+0xe/0x50
[558213.837956] [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[558213.837959] [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[558213.837968] [<ffffffffa04704c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[558213.837975] [<ffffffffa047073b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[558213.837980] [<ffffffffa05ff496>] create_qp_common+0x536/0x1000 [mlx4_ib]

Comment by Andreas Dilger [ 23/Nov/18 ]

This issue is fixed in the MOFED 4.4 release.

Generated at Sat Feb 10 02:32:21 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.