[LU-7401] OOM after LNet initialization with non-default peer_credits on mlx5 Created: 06/Nov/15  Updated: 21/Dec/15  Resolved: 09/Nov/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Dmitry Eremin (Inactive) Assignee: WC Triage
Resolution: Duplicate Votes: 0
Labels: None
Environment:

Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64
MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64


Issue Links:
Duplicate
duplicates LU-3322 ko2iblnd support for different map_on... Resolved
Related
is related to LU-5718 RDMA too fragmented with router Resolved
Severity: 3

 Description   
# ibv_devinfo -v
hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         12.12.1100
        node_guid:                      e41d:2d03:005f:48e6
        sys_image_guid:                 e41d:2d03:005f:48e6
        vendor_id:                      0x02c9
        vendor_part_id:                 4115
        hw_ver:                         0x0
        board_id:                       MT_2180110032
        phys_port_cnt:                  1
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffff000
        max_qp:                         262144
        max_qp_wr:                      32768
        device_cap_flags:               0x40509c36
                                        BAD_PKEY_CNTR
                                        BAD_QKEY_CNTR
                                        AUTO_PATH_MIG
                                        CHANGE_PHY_PORT
                                        PORT_ACTIVE_EVENT
                                        SYS_IMAGE_GUID
                                        RC_RNR_NAK_GEN
                                        XRC
                                        Unknown flags: 0x40408000
        device_cap_exp_flags:           0x5060007100000000
                                        EXP_DC_TRANSPORT
                                        EXP_MEM_MGT_EXTENSIONS
                                        EXP_CROSS_CHANNEL
                                        EXP_MR_ALLOCATE
                                        EXT_ATOMICS
                                        EXT_SEND NOP
                                        EXP_UMR
                                        EXP_DC_INFO
        max_sge:                        30
        max_sge_rd:                     0
        max_cq:                         16777216
        max_cqe:                        4194303
        max_mr:                         16777216
        max_pd:                         16777216
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                4194304
        max_qp_init_rd_atom:            16
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA_REPLY_BE (64)
        log atomic arg sizes (mask)             3c
        max fetch and add bit boundary  64
        log max atomic inline           5
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         0
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                0
        max_mcast_grp:                  2097152
        max_mcast_qp_attach:            48
        max_total_mcast_qp_attach:      100663296
        max_ah:                         2147483647
        max_fmr:                        0
        max_srq:                        8388608
        max_srq_wr:                     32767
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             16
        hca_core_clock:                 0
        max_klm_list_size:              65536
        max_send_wqe_inline_klms:       20
        max_umr_recursion_depth:        4
        max_umr_stride_dimension:       1
        general_odp_caps:
        rc_odp_caps:
                                        NO SUPPORT
        uc_odp_caps:
                                        NO SUPPORT
        ud_odp_caps:
                                        NO SUPPORT
        dc_odp_caps:
                                        NO SUPPORT
        xrc_odp_caps:
                                        NO SUPPORT
        raw_eth_odp_caps:
                                        NO SUPPORT
        max_dct:                        262144
        max_device_ctx:                 1020
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 19
                        port_lid:               48
                        port_lmc:               0x00
                        link_layer:             InfiniBand
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x2651e848
                        max_vl_num:             4 (3)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           128
                        gid_tbl_len:            8
                        subnet_timeout:         18
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           25.0 Gbps (32)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:e41d:2d03:005f:48e6

The following LND (ko2iblnd) tunables are used:

options ko2iblnd credits=2560 ntx=5120 concurrent_sends=63 peer_credits=16
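The error `Can't create QP: -12, send_wr: 8224, recv_wr: 34` in the log that follows can be reproduced from these tunables. The Python sketch below is illustrative only. The sizing rules it encodes are assumptions recalled from the 2.7-era o2iblnd code and are not stated in this ticket:

- `concurrent_sends` is clamped to 2 x `peer_credits`, so 63 becomes 32.
- Each concurrent send reserves 256 RDMA fragments + 1 send work requests.
- The receive queue holds 2 x `peer_credits` + 2 buffers.

All constant and function names are hypothetical. The sketch also converts the failing `order:8` request into bytes and counts free blocks of that size in a buddy-allocator line taken from the log.

```python
import re

PAGE_SIZE = 4096              # x86_64 base page size in bytes
ASSUMED_MAX_RDMA_FRAGS = 256  # assumption: LNET_MAX_IOV with 4 KiB pages
ASSUMED_OOB_MSGS = 2          # assumption: extra receive buffers for out-of-band messages


def effective_concurrent_sends(concurrent_sends: int, peer_credits: int) -> int:
    """Assumption: o2iblnd clamps concurrent_sends to 2 * peer_credits."""
    return min(concurrent_sends, 2 * peer_credits)


def send_wrs(concurrent_sends: int) -> int:
    """Assumption: (RDMA fragments + 1) send work requests per concurrent send."""
    return (ASSUMED_MAX_RDMA_FRAGS + 1) * concurrent_sends


def recv_wrs(peer_credits: int) -> int:
    """Assumption: two receive buffers per peer credit plus the OOB buffers."""
    return 2 * peer_credits + ASSUMED_OOB_MSGS


def order_bytes(order: int) -> int:
    """Size in bytes of one buddy-allocator block of the given order."""
    return PAGE_SIZE << order


def _buddy_entries(buddy_line: str):
    """Yield (count, size_kb) pairs from a 'Node N Zone: count*sizekB ...' line."""
    for count, size in re.findall(r"(\d+)\*(\d+)kB", buddy_line):
        yield int(count), int(size)


def free_kb(buddy_line: str) -> int:
    """Total free memory (kB) described by a buddy line."""
    return sum(count * size for count, size in _buddy_entries(buddy_line))


def blocks_at_least(buddy_line: str, min_kb: int) -> int:
    """Number of free blocks of at least min_kb in a buddy line."""
    return sum(count for count, size in _buddy_entries(buddy_line) if size >= min_kb)


if __name__ == "__main__":
    # Tunables from: options ko2iblnd ... concurrent_sends=63 peer_credits=16
    cs = effective_concurrent_sends(63, 16)
    print("effective concurrent_sends:", cs)
    print("send_wr:", send_wrs(cs))
    print("recv_wr:", recv_wrs(16))
    print("order:8 block bytes:", order_bytes(8))

    # Buddy line for Node 0 Normal from the first failure in the log
    node0_normal = ("Node 0 Normal: 1413*4kB 764*8kB 348*16kB 190*32kB 98*64kB "
                    "33*128kB 19*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 41844kB")
    print("free kB:", free_kb(node0_normal))
    print("blocks >= 1024 kB:", blocks_at_least(node0_normal, 1024))
```

Run as a script, it prints:

- 32 (effective `concurrent_sends`)
- 8224 (`send_wr`)
- 34 (`recv_wr`)
- 1048576 (one physically contiguous MiB for the `order:8` request)
- 41844 kB free for `Node 0 Normal`
- a single free block of 1024 kB or more on that node

The WR counts match the LNetError line, which supports, but does not prove, the assumed formulas.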

Shortly after LNet initialization (the LNI is added at 19:01:43 and the first failure appears at 19:03:01), an OOM occurs with the following messages:

Nov  5 19:01:43 oss-3-0 kernel: Lustre: Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64
Nov  5 19:01:43 oss-3-0 kernel: LNet: Added LNI 192.168.3.104@o2ib [16/2560/0/180]
Nov  5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
Nov  5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
Nov  5 19:03:01 oss-3-0 kernel: Call Trace:
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov  5 19:03:01 oss-3-0 kernel: Mem-Info:
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd: 135
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:  34
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: active_anon:4820 inactive_anon:4483 isolated_anon:0
Nov  5 19:03:01 oss-3-0 kernel: active_file:5926 inactive_file:8941 isolated_file:0
Nov  5 19:03:01 oss-3-0 kernel: unevictable:3806 dirty:153 writeback:0 unstable:0
Nov  5 19:03:01 oss-3-0 kernel: free:55741 slab_reclaimable:49595 slab_unreclaimable:4005232
Nov  5 19:03:01 oss-3-0 kernel: mapped:4159 shmem:59 pagetables:991 bounce:0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121244kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18560kB slab_unreclaimable:2218348kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal free:40852kB min:40888kB low:51108kB high:61332kB active_anon:1740kB inactive_anon:1028kB active_file:7732kB inactive_file:10852kB unevictable:13992kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:640kB writeback:0kB mapped:7480kB shmem:116kB slab_reclaimable:135120kB slab_unreclaimable:7843632kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal free:45180kB min:45120kB low:56400kB high:67680kB active_anon:17540kB inactive_anon:16904kB active_file:15972kB inactive_file:24912kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:0kB writeback:0kB mapped:9156kB shmem:120kB slab_reclaimable:44700kB slab_unreclaimable:5958948kB kernel_stack:2608kB pagetables:3196kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 10*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121244kB
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1413*4kB 764*8kB 348*16kB 190*32kB 98*64kB 33*128kB 19*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 41844kB
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal: 2525*4kB 1493*8kB 597*16kB 209*32kB 47*64kB 13*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 45772kB
Nov  5 19:03:01 oss-3-0 kernel: 15551 total pagecache pages
Nov  5 19:03:01 oss-3-0 kernel: 0 pages in swap cache
Nov  5 19:03:01 oss-3-0 kernel: Swap cache stats: add 0, delete 0, find 0/0
Nov  5 19:03:01 oss-3-0 kernel: Free swap  = 33005564kB
Nov  5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB
Nov  5 19:03:01 oss-3-0 kernel: 16777215 pages RAM
Nov  5 19:03:01 oss-3-0 kernel: 310798 pages reserved
Nov  5 19:03:01 oss-3-0 kernel: 13266 pages shared
Nov  5 19:03:01 oss-3-0 kernel: 12503778 pages non-shared
Nov  5 19:03:01 oss-3-0 kernel: LNetError: 2957:0:(o2iblnd.c:870:kiblnd_create_conn()) Can't create QP: -12, send_wr: 8224, recv_wr: 34
Nov  5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
Nov  5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
Nov  5 19:03:01 oss-3-0 kernel: Call Trace:
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov  5 19:03:01 oss-3-0 kernel: Mem-Info:
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:  23
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu:
Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:01 oss-3-0 kernel: active_anon:4149 inactive_anon:4131 isolated_anon:0
Nov  5 19:03:01 oss-3-0 kernel: active_file:2501 inactive_file:2188 isolated_file:0
Nov  5 19:03:01 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:0 unstable:0
Nov  5 19:03:01 oss-3-0 kernel: free:56462 slab_reclaimable:48868 slab_unreclaimable:4008137
Nov  5 19:03:01 oss-3-0 kernel: mapped:3400 shmem:29 pagetables:931 bounce:0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal free:41476kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844724kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal free:47408kB min:45120kB low:56400kB high:67680kB active_anon:16596kB inactive_anon:16524kB active_file:4868kB inactive_file:4860kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:48kB writeback:0kB mapped:6156kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5969464kB kernel_stack:2624kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB
Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1238*4kB 864*8kB 496*16kB 217*32kB 95*64kB 28*128kB 12*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 42040kB
Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal: 1780*4kB 1147*8kB 637*16kB 299*32kB 112*64kB 27*128kB 3*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 48472kB
Nov  5 19:03:01 oss-3-0 kernel: 5791 total pagecache pages
Nov  5 19:03:01 oss-3-0 kernel: 38 pages in swap cache
Nov  5 19:03:01 oss-3-0 kernel: Swap cache stats: add 1104, delete 1066, find 0/1
Nov  5 19:03:01 oss-3-0 kernel: Free swap  = 33001180kB
Nov  5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB
Nov  5 19:03:02 oss-3-0 kernel: 16777215 pages RAM
Nov  5 19:03:02 oss-3-0 kernel: 310798 pages reserved
Nov  5 19:03:02 oss-3-0 kernel: 11573 pages shared
Nov  5 19:03:02 oss-3-0 kernel: 12507708 pages non-shared
Nov  5 19:03:02 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
Nov  5 19:03:02 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
Nov  5 19:03:02 oss-3-0 kernel: Call Trace:
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
Nov  5 19:03:02 oss-3-0 kernel: Mem-Info:
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA per-cpu:
Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32 per-cpu:
Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal per-cpu:
Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal per-cpu:
Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
Nov  5 19:03:02 oss-3-0 kernel: active_anon:3235 inactive_anon:3163 isolated_anon:0
Nov  5 19:03:02 oss-3-0 kernel: active_file:2501 inactive_file:2191 isolated_file:0
Nov  5 19:03:02 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:2632 unstable:0
Nov  5 19:03:02 oss-3-0 kernel: free:57038 slab_reclaimable:48868 slab_unreclaimable:4009033
Nov  5 19:03:02 oss-3-0 kernel: mapped:3401 shmem:29 pagetables:931 bounce:0
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal free:40836kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844728kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:240 all_unreclaimable? no
Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal free:50352kB min:45120kB low:56400kB high:67680kB active_anon:12940kB inactive_anon:12652kB active_file:4868kB inactive_file:4872kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:48kB writeback:10528kB mapped:6160kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5973044kB kernel_stack:2608kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB
Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal: 1244*4kB 853*8kB 492*16kB 216*32kB 95*64kB 27*128kB 9*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 40472kB
Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal: 1542*4kB 1019*8kB 595*16kB 334*32kB 145*64kB 37*128kB 7*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50336kB
Nov  5 19:03:02 oss-3-0 kernel: 8057 total pagecache pages
Nov  5 19:03:02 oss-3-0 kernel: 2542 pages in swap cache
Nov  5 19:03:02 oss-3-0 kernel: Swap cache stats: add 5785, delete 3243, find 20/26
Nov  5 19:03:02 oss-3-0 kernel: Free swap  = 32982620kB
Nov  5 19:03:02 oss-3-0 kernel: Total swap = 33005564kB
Nov  5 19:03:02 oss-3-0 kernel: 16777215 pages RAM
Nov  5 19:03:02 oss-3-0 kernel: 310798 pages reserved
Nov  5 19:03:02 oss-3-0 kernel: 11578 pages shared
Nov  5 19:03:02 oss-3-0 kernel: 12507859 pages non-shared


 Comments   
Comment by Doug Oucharek (Inactive) [ 06/Nov/15 ]

Is this on a client, server, or LNet router? How many connections are being made to that node?

Comment by Dmitry Eremin (Inactive) [ 09/Nov/15 ]

This happened just after LNet initialization for selftest use. There is no Lustre usage at all; at most there may be a spontaneous ping. The rdma_cm process begins to allocate memory, and its usage grows constantly until OOM.

Comment by Dmitry Eremin (Inactive) [ 09/Nov/15 ]

It looks like this is an issue of an incomplete patch for LU-3322. With patch http://review.whamcloud.com/#/c/17074/2 this OOM no longer happens.

Comment by Peter Jones [ 09/Nov/15 ]

ok so it makes sense to duplicate this into LU-3322 then

Comment by Jeremy Filizetti [ 09/Nov/15 ]

If LU-3322 is affecting this, then I'm guessing that you are stuck in a tight loop of kiblnd_reject on the server side and kiblnd_rejected->kiblnd_reconnect on the client side. The values for ibc_queue_depth and ibc_max_frags were not being preserved, so every connection attempt was rejected for the same reason and then retried. It might make sense to put a counter in kib_peer_t to break or throttle the reconnect loop, even though this particular symptom has been fixed.

Looks like http://review.whamcloud.com/14600 handles this for the conn race condition (LU-5718)

Comment by Doug Oucharek (Inactive) [ 10/Nov/15 ]

I can see how http://review.whamcloud.com/14600 addresses the reconnect issue for the clients, but I am not convinced that this patch will help the passive side, the server, out here. If you have 10,000 clients competing to reconnect, the server can still be hammered into OOM even when the clients clean up zombies before reattempting.

Generated at Sat Feb 10 02:08:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.