Lustre / LU-7401

OOM after LNet initialization with non-default peer_credits on mlx5

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Minor
    • Environment: Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64
      MLNX_OFED_LINUX-3.1-1.0.3-rhel6.7-x86_64
    • Severity: 3

    Description

      # ibv_devinfo -v
      hca_id: mlx5_0
              transport:                      InfiniBand (0)
              fw_ver:                         12.12.1100
              node_guid:                      e41d:2d03:005f:48e6
              sys_image_guid:                 e41d:2d03:005f:48e6
              vendor_id:                      0x02c9
              vendor_part_id:                 4115
              hw_ver:                         0x0
              board_id:                       MT_2180110032
              phys_port_cnt:                  1
              max_mr_size:                    0xffffffffffffffff
              page_size_cap:                  0xfffff000
              max_qp:                         262144
              max_qp_wr:                      32768
              device_cap_flags:               0x40509c36
                                              BAD_PKEY_CNTR
                                              BAD_QKEY_CNTR
                                              AUTO_PATH_MIG
                                              CHANGE_PHY_PORT
                                              PORT_ACTIVE_EVENT
                                              SYS_IMAGE_GUID
                                              RC_RNR_NAK_GEN
                                              XRC
                                              Unknown flags: 0x40408000
              device_cap_exp_flags:           0x5060007100000000
                                              EXP_DC_TRANSPORT
                                              EXP_MEM_MGT_EXTENSIONS
                                              EXP_CROSS_CHANNEL
                                              EXP_MR_ALLOCATE
                                              EXT_ATOMICS
                                              EXT_SEND NOP
                                              EXP_UMR
                                              EXP_DC_INFO
              max_sge:                        30
              max_sge_rd:                     0
              max_cq:                         16777216
              max_cqe:                        4194303
              max_mr:                         16777216
              max_pd:                         16777216
              max_qp_rd_atom:                 16
              max_ee_rd_atom:                 0
              max_res_rd_atom:                4194304
              max_qp_init_rd_atom:            16
              max_ee_init_rd_atom:            0
              atomic_cap:                     ATOMIC_HCA_REPLY_BE (64)
              log atomic arg sizes (mask)             3c
              max fetch and add bit boundary  64
              log max atomic inline           5
              max_ee:                         0
              max_rdd:                        0
              max_mw:                         0
              max_raw_ipv6_qp:                0
              max_raw_ethy_qp:                0
              max_mcast_grp:                  2097152
              max_mcast_qp_attach:            48
              max_total_mcast_qp_attach:      100663296
              max_ah:                         2147483647
              max_fmr:                        0
              max_srq:                        8388608
              max_srq_wr:                     32767
              max_srq_sge:                    31
              max_pkeys:                      128
              local_ca_ack_delay:             16
              hca_core_clock:                 0
              max_klm_list_size:              65536
              max_send_wqe_inline_klms:       20
              max_umr_recursion_depth:        4
              max_umr_stride_dimension:       1
              general_odp_caps:
              rc_odp_caps:
                                              NO SUPPORT
              uc_odp_caps:
                                              NO SUPPORT
              ud_odp_caps:
                                              NO SUPPORT
              dc_odp_caps:
                                              NO SUPPORT
              xrc_odp_caps:
                                              NO SUPPORT
              raw_eth_odp_caps:
                                              NO SUPPORT
              max_dct:                        262144
              max_device_ctx:                 1020
                      port:   1
                              state:                  PORT_ACTIVE (4)
                              max_mtu:                4096 (5)
                              active_mtu:             4096 (5)
                              sm_lid:                 19
                              port_lid:               48
                              port_lmc:               0x00
                              link_layer:             InfiniBand
                              max_msg_sz:             0x40000000
                              port_cap_flags:         0x2651e848
                              max_vl_num:             4 (3)
                              bad_pkey_cntr:          0x0
                              qkey_viol_cntr:         0x0
                              sm_sl:                  0
                              pkey_tbl_len:           128
                              gid_tbl_len:            8
                              subnet_timeout:         18
                              init_type_reply:        0
                              active_width:           4X (2)
                              active_speed:           25.0 Gbps (32)
                              phys_state:             LINK_UP (5)
                              GID[  0]:               fe80:0000:0000:0000:e41d:2d03:005f:48e6
      

      The following ko2iblnd tunables are used:

      options ko2iblnd credits=2560 ntx=5120 concurrent_sends=63 peer_credits=16

      A few minutes after LNet initialization, an OOM occurs with the following messages:

      Nov  5 19:01:43 oss-3-0 kernel: Lustre: Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64
      Nov  5 19:01:43 oss-3-0 kernel: LNet: Added LNI 192.168.3.104@o2ib [16/2560/0/180]
      Nov  5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
      Nov  5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
      Nov  5 19:03:01 oss-3-0 kernel: Call Trace:
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Nov  5 19:03:01 oss-3-0 kernel: Mem-Info:
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd: 135
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:  34
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: active_anon:4820 inactive_anon:4483 isolated_anon:0
      Nov  5 19:03:01 oss-3-0 kernel: active_file:5926 inactive_file:8941 isolated_file:0
      Nov  5 19:03:01 oss-3-0 kernel: unevictable:3806 dirty:153 writeback:0 unstable:0
      Nov  5 19:03:01 oss-3-0 kernel: free:55741 slab_reclaimable:49595 slab_unreclaimable:4005232
      Nov  5 19:03:01 oss-3-0 kernel: mapped:4159 shmem:59 pagetables:991 bounce:0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121244kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18560kB slab_unreclaimable:2218348kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal free:40852kB min:40888kB low:51108kB high:61332kB active_anon:1740kB inactive_anon:1028kB active_file:7732kB inactive_file:10852kB unevictable:13992kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:640kB writeback:0kB mapped:7480kB shmem:116kB slab_reclaimable:135120kB slab_unreclaimable:7843632kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal free:45180kB min:45120kB low:56400kB high:67680kB active_anon:17540kB inactive_anon:16904kB active_file:15972kB inactive_file:24912kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:0kB writeback:0kB mapped:9156kB shmem:120kB slab_reclaimable:44700kB slab_unreclaimable:5958948kB kernel_stack:2608kB pagetables:3196kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 10*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121244kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1413*4kB 764*8kB 348*16kB 190*32kB 98*64kB 33*128kB 19*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 41844kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal: 2525*4kB 1493*8kB 597*16kB 209*32kB 47*64kB 13*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 45772kB
      Nov  5 19:03:01 oss-3-0 kernel: 15551 total pagecache pages
      Nov  5 19:03:01 oss-3-0 kernel: 0 pages in swap cache
      Nov  5 19:03:01 oss-3-0 kernel: Swap cache stats: add 0, delete 0, find 0/0
      Nov  5 19:03:01 oss-3-0 kernel: Free swap  = 33005564kB
      Nov  5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB
      Nov  5 19:03:01 oss-3-0 kernel: 16777215 pages RAM
      Nov  5 19:03:01 oss-3-0 kernel: 310798 pages reserved
      Nov  5 19:03:01 oss-3-0 kernel: 13266 pages shared
      Nov  5 19:03:01 oss-3-0 kernel: 12503778 pages non-shared
      Nov  5 19:03:01 oss-3-0 kernel: LNetError: 2957:0:(o2iblnd.c:870:kiblnd_create_conn()) Can't create QP: -12, send_wr: 8224, recv_wr: 34
      Nov  5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
      Nov  5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
      Nov  5 19:03:01 oss-3-0 kernel: Call Trace:
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
      Nov  5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Nov  5 19:03:01 oss-3-0 kernel: Mem-Info:
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:  23
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu:
      Nov  5 19:03:01 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:01 oss-3-0 kernel: active_anon:4149 inactive_anon:4131 isolated_anon:0
      Nov  5 19:03:01 oss-3-0 kernel: active_file:2501 inactive_file:2188 isolated_file:0
      Nov  5 19:03:01 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:0 unstable:0
      Nov  5 19:03:01 oss-3-0 kernel: free:56462 slab_reclaimable:48868 slab_unreclaimable:4008137
      Nov  5 19:03:01 oss-3-0 kernel: mapped:3400 shmem:29 pagetables:931 bounce:0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal free:41476kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844724kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal free:47408kB min:45120kB low:56400kB high:67680kB active_anon:16596kB inactive_anon:16524kB active_file:4868kB inactive_file:4860kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:48kB writeback:0kB mapped:6156kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5969464kB kernel_stack:2624kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1238*4kB 864*8kB 496*16kB 217*32kB 95*64kB 28*128kB 12*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 42040kB
      Nov  5 19:03:01 oss-3-0 kernel: Node 1 Normal: 1780*4kB 1147*8kB 637*16kB 299*32kB 112*64kB 27*128kB 3*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 48472kB
      Nov  5 19:03:01 oss-3-0 kernel: 5791 total pagecache pages
      Nov  5 19:03:01 oss-3-0 kernel: 38 pages in swap cache
      Nov  5 19:03:01 oss-3-0 kernel: Swap cache stats: add 1104, delete 1066, find 0/1
      Nov  5 19:03:01 oss-3-0 kernel: Free swap  = 33001180kB
      Nov  5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB
      Nov  5 19:03:02 oss-3-0 kernel: 16777215 pages RAM
      Nov  5 19:03:02 oss-3-0 kernel: 310798 pages reserved
      Nov  5 19:03:02 oss-3-0 kernel: 11573 pages shared
      Nov  5 19:03:02 oss-3-0 kernel: 12507708 pages non-shared
      Nov  5 19:03:02 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0
      Nov  5 19:03:02 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1
      Nov  5 19:03:02 oss-3-0 kernel: Call Trace:
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm]
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0
      Nov  5 19:03:02 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20
      Nov  5 19:03:02 oss-3-0 kernel: Mem-Info:
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA per-cpu:
      Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:    0, btch:   1 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32 per-cpu:
      Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal per-cpu:
      Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal per-cpu:
      Nov  5 19:03:02 oss-3-0 kernel: CPU    0: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    1: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    2: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    3: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    4: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    5: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    6: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    7: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    8: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU    9: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   10: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   11: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   12: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   13: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   14: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: CPU   15: hi:  186, btch:  31 usd:   0
      Nov  5 19:03:02 oss-3-0 kernel: active_anon:3235 inactive_anon:3163 isolated_anon:0
      Nov  5 19:03:02 oss-3-0 kernel: active_file:2501 inactive_file:2191 isolated_file:0
      Nov  5 19:03:02 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:2632 unstable:0
      Nov  5 19:03:02 oss-3-0 kernel: free:57038 slab_reclaimable:48868 slab_unreclaimable:4009033
      Nov  5 19:03:02 oss-3-0 kernel: mapped:3401 shmem:29 pagetables:931 bounce:0
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal free:40836kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:29992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844728kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:240 all_unreclaimable? no
      Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal free:50352kB min:45120kB low:56400kB high:67680kB active_anon:12940kB inactive_anon:12652kB active_file:4868kB inactive_file:4872kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:48kB writeback:10528kB mapped:6160kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5973044kB kernel_stack:2608kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
      Nov  5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB
      Nov  5 19:03:02 oss-3-0 kernel: Node 0 Normal: 1244*4kB 853*8kB 492*16kB 216*32kB 95*64kB 27*128kB 9*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 40472kB
      Nov  5 19:03:02 oss-3-0 kernel: Node 1 Normal: 1542*4kB 1019*8kB 595*16kB 334*32kB 145*64kB 37*128kB 7*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50336kB
      Nov  5 19:03:02 oss-3-0 kernel: 8057 total pagecache pages
      Nov  5 19:03:02 oss-3-0 kernel: 2542 pages in swap cache
      Nov  5 19:03:02 oss-3-0 kernel: Swap cache stats: add 5785, delete 3243, find 20/26
      Nov  5 19:03:02 oss-3-0 kernel: Free swap  = 32982620kB
      Nov  5 19:03:02 oss-3-0 kernel: Total swap = 33005564kB
      Nov  5 19:03:02 oss-3-0 kernel: 16777215 pages RAM
      Nov  5 19:03:02 oss-3-0 kernel: 310798 pages reserved
      Nov  5 19:03:02 oss-3-0 kernel: 11578 pages shared
      Nov  5 19:03:02 oss-3-0 kernel: 12507859 pages non-shared
      

          Activity


            doug Doug Oucharek (Inactive) added a comment -

            I can see how http://review.whamcloud.com/14600 addresses the reconnect issue for the clients, but I am not convinced that this patch will help the passive side (the server) here. If 10,000 clients are competing to reconnect, the server can still be hammered into OOM even when the clients clean up zombie connections before reattempting.
            jfilizetti Jeremy Filizetti added a comment - - edited

            If LU-3322 is affecting this, then I'm guessing that you are stuck in a tight loop of kiblnd_reject on the server side and kiblnd_rejected->kiblnd_reconnect on the client side. The values for ibc_queue_depth and ibc_max_frags were not being preserved, so every connection attempt was rejected for the same reason and then retried. It might make sense to put a counter in kib_peer_t to break or throttle the reconnect loop, even though this particular symptom has been fixed.

            It looks like http://review.whamcloud.com/14600 handles this for the connection race condition (LU-5718).

            pjones Peter Jones added a comment -

            OK, so it makes sense to close this as a duplicate of LU-3322 then.

            dmiter Dmitry Eremin (Inactive) added a comment -

            It looks like this is caused by an incomplete patch for LU-3322. With patch http://review.whamcloud.com/#/c/17074/2 applied, this OOM no longer happens.

            dmiter Dmitry Eremin (Inactive) added a comment -

            This happened just after LNet initialization for LNet selftest use. There was no Lustre usage at all; at most there may have been a spontaneous ping. The rdma_cm process began allocating memory, and its usage grew constantly until the OOM.

            doug Doug Oucharek (Inactive) added a comment -

            Is this on a client, server, or LNet router? How many connections are being made to that node?

            People

              wc-triage WC Triage
              dmiter Dmitry Eremin (Inactive)
              Votes: 0
              Watchers: 10
