Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
Lustre 2.10.1, Lustre 2.10.3
-
None
-
mofed4.1 and mofed.4.2.1
-
2
Description
This may be a duplicate of LU-10133, but we wanted to open a new case for our own tracking. LU-10133 hasn't been updated in some time.
We have been seeing these page allocation failures since moving to Lustre 2.10.x, CentOS 7.x, and MOFED 4.x:
[ 6174.345922] kworker/20:1: page allocation failure: order:8, mode:0x80d0
[ 6174.345924] CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------ 3.10.0-693.17.1.el7.20180206.x86_64.lustre2103 #1
[ 6174.345925] Hardware name: SGI.COM SUMMIT/S2600GZ, BIOS SE5C600.86B.02.03.0003.041920141333 04/19/2014
[ 6174.345932] Workqueue: ib_cm cm_work_handler [ib_cm]
[ 6174.345933] Call Trace:
[ 6174.345934] [<ffffffff81686d81>] dump_stack+0x19/0x1b
[ 6174.345937] [<ffffffff81186160>] warn_alloc_failed+0x110/0x180
[ 6174.345940] [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
[ 6174.345942] [<ffffffff811ce868>] alloc_pages_current+0x98/0x110
[ 6174.345945] [<ffffffff81184fae>] __get_free_pages+0xe/0x50
[ 6174.345947] [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
[ 6174.345950] [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
[ 6174.345959] [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
[ 6174.345967] [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
[ 6174.345971] [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
[ 6174.345976] [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
[ 6174.345979] [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
[ 6174.345982] [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
[ 6174.345985] [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
[ 6174.345989] [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
[ 6174.345995] [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
[ 6174.345997] [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
[ 6174.346003] [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
[ 6174.346014] [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
[ 6174.346021] [<ffffffffa061f58c>] ? _cma_attach_to_dev+0x5c/0x70 [rdma_cm]
[ 6174.346023] [<ffffffffa0836a95>] kiblnd_cm_callback+0x755/0x2380 [ko2iblnd]
[ 6174.346028] [<ffffffffa0624ce6>] cma_req_handler+0x1c6/0x490 [rdma_cm]
[ 6174.346030] [<ffffffffa04f4327>] cm_process_work+0x27/0x120 [ib_cm]
[ 6174.346033] [<ffffffffa04f516b>] cm_req_handler+0xb0b/0xe30 [ib_cm]
[ 6174.346035] [<ffffffffa04f5e55>] cm_work_handler+0x395/0x1306 [ib_cm]
[ 6174.346036] [<ffffffff8168beff>] ? __schedule+0x41f/0x9a0
[ 6174.346038] [<ffffffff810a76ca>] process_one_work+0x17a/0x440
[ 6174.346040] [<ffffffff810a8396>] worker_thread+0x126/0x3c0
[ 6174.346042] [<ffffffff810a8270>] ? manage_workers.isra.24+0x2a0/0x2a0
[ 6174.346043] [<ffffffff810af83f>] kthread+0xcf/0xe0
[ 6174.346044] [<ffffffff810af770>] ? insert_kthread_work+0x40/0x40
[ 6174.346045] [<ffffffff81699718>] ret_from_fork+0x58/0x90
[ 6174.346048] [<ffffffff810af770>] ? insert_kthread_work+0x40/0x40
[ 6174.346049] Mem-Info:
[ 6174.346054] active_anon:11554 inactive_anon:6856 isolated_anon:0
[ 6174.346054] active_file:565346 inactive_file:12784204 isolated_file:0
[ 6174.346054] unevictable:4142 dirty:54 writeback:0 unstable:0
[ 6174.346054] slab_reclaimable:259526 slab_unreclaimable:357063
[ 6174.346054] mapped:8121 shmem:7283 pagetables:911 bounce:0
[ 6174.346054] free:63457 free_pcp:746 free_cma:0
[ 6174.346055] Node 0 DMA free:15772kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15980kB managed:15804kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[ 6174.346058] lowmem_reserve[]: 0 2641 31809 31809
[ 6174.346060] Node 0 DMA32 free:123636kB min:2704kB low:3380kB high:4056kB active_anon:2132kB inactive_anon:1916kB active_file:99500kB inactive_file:1867404kB unevictable:376kB isolated(anon):0kB isolated(file):0kB present:3049136kB managed:2707148kB mlocked:376kB dirty:4kB writeback:0kB mapped:3368kB shmem:2036kB slab_reclaimable:48796kB slab_unreclaimable:69468kB kernel_stack:192kB pagetables:248kB unstable:0kB bounce:0kB free_pcp:188kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6174.346063] lowmem_reserve[]: 0 0 29167 29167
[ 6174.346064] Node 0 Normal free:54308kB min:29836kB low:37292kB high:44752kB active_anon:21816kB inactive_anon:19336kB active_file:971960kB inactive_file:23507144kB unevictable:7280kB isolated(anon):0kB isolated(file):0kB present:30408704kB managed:29867144kB mlocked:7280kB dirty:100kB writeback:0kB mapped:28800kB shmem:19912kB slab_reclaimable:503968kB slab_unreclaimable:637584kB kernel_stack:7136kB pagetables:1428kB unstable:0kB bounce:0kB free_pcp:1536kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6174.346067] lowmem_reserve[]: 0 0 0 0
[ 6174.346069] Node 1 Normal free:60112kB min:32976kB low:41220kB high:49464kB active_anon:22268kB inactive_anon:6172kB active_file:1189924kB inactive_file:25762476kB unevictable:8912kB isolated(anon):0kB isolated(file):0kB present:33554432kB managed:33012624kB mlocked:8912kB dirty:112kB writeback:0kB mapped:316kB shmem:7184kB slab_reclaimable:485340kB slab_unreclaimable:721200kB kernel_stack:12208kB pagetables:1968kB unstable:0kB bounce:0kB free_pcp:1216kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[ 6174.346071] lowmem_reserve[]: 0 0 0 0
[ 6174.346073] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15772kB
[ 6174.346078] Node 0 DMA32: 2456*4kB (UEM) 1686*8kB (UEM) 542*16kB (UEM) 82*32kB (UEM) 156*64kB (UEM) 227*128kB (UEM) 106*256kB (UE) 43*512kB (UE) 1*1024kB (U) 0*2048kB 0*4096kB = 123824kB
[ 6174.346084] Node 0 Normal: 262*4kB (EM) 876*8kB (UEM) 952*16kB (UEM) 296*32kB (UEM) 154*64kB (UM) 95*128kB (UM) 2*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 55800kB
[ 6174.346089] Node 1 Normal: 219*4kB (EM) 2848*8kB (UEM) 366*16kB (UEM) 46*32kB (UM) 474*64kB (UM) 2*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 61580kB
[ 6174.346100] 13359368 total pagecache pages
[ 6174.346100] 0 pages in swap cache
[ 6174.346101] Swap cache stats: add 0, delete 0, find 0/0
[ 6174.346101] Free swap = 0kB
[ 6174.346101] Total swap = 0kB
[ 6174.346102] 16757063 pages RAM
[ 6174.346102] 0 pages HighMem/MovableOnly
[ 6174.346102] 356383 pages reserved
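As a reading aid for the trace above: an order:8 request asks for 2^8 physically contiguous pages, which is 1024kB assuming the usual 4kB x86_64 page size. The per-zone free lists in the log can be checked for blocks of at least that size. The sketch below does this with two hypothetical helpers (order_to_kb and free_blocks_at_or_above are illustrative names, not part of any kernel or Lustre tooling). It only counts free blocks and does not model watermarks, lowmem_reserve, or migrate types, so a nonzero count does not by itself mean the allocation would have succeeded.

```python
import re

PAGE_KB = 4  # assumption: 4kB base pages (x86_64 default)

def order_to_kb(order):
    """Size in kB of one contiguous block of the given buddy order."""
    return PAGE_KB << order

def free_blocks_at_or_above(buddy_line, order):
    """Count free blocks in a 'N*SIZEkB ...' line that can satisfy the order."""
    need_kb = order_to_kb(order)
    total = 0
    # Each entry looks like '262*4kB'; the trailing '= 55800kB' has no '*'.
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", buddy_line):
        if int(size_kb) >= need_kb:
            total += int(count)
    return total

# Free-list lines copied from the log above.
ZONES = {
    "Node 0 DMA": "1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) "
                  "2*128kB (U) 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) "
                  "3*4096kB (M) = 15772kB",
    "Node 0 DMA32": "2456*4kB (UEM) 1686*8kB (UEM) 542*16kB (UEM) "
                    "82*32kB (UEM) 156*64kB (UEM) 227*128kB (UEM) "
                    "106*256kB (UE) 43*512kB (UE) 1*1024kB (U) 0*2048kB "
                    "0*4096kB = 123824kB",
    "Node 0 Normal": "262*4kB (EM) 876*8kB (UEM) 952*16kB (UEM) "
                     "296*32kB (UEM) 154*64kB (UM) 95*128kB (UM) "
                     "2*256kB (M) 1*512kB (M) 0*1024kB 0*2048kB "
                     "0*4096kB = 55800kB",
    "Node 1 Normal": "219*4kB (EM) 2848*8kB (UEM) 366*16kB (UEM) "
                     "46*32kB (UM) 474*64kB (UM) 2*128kB (UM) 0*256kB "
                     "0*512kB 0*1024kB 0*2048kB 0*4096kB = 61580kB",
}

if __name__ == "__main__":
    order = 8  # from 'page allocation failure: order:8'
    print(f"order {order} needs {order_to_kb(order)}kB contiguous")
    for zone, line in ZONES.items():
        print(f"{zone}: {free_blocks_at_or_above(line, order)} usable blocks")
```

Both Normal zones report no free block of 1024kB or larger, which is consistent with the failure being caused by fragmentation rather than by a shortage of free memory overall.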
Issue Links
- is related to
-
LU-10133 Multi-page allocation failures in mlx4/mlx5
- Resolved