[LU-7401] OOM after LNet initialization with non-default peer_credits on mlx5 Created: 06/Nov/15 Updated: 21/Dec/15 Resolved: 09/Nov/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Dmitry Eremin (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Environment: |
Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
# ibv_devinfo -v
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 12.12.1100
node_guid: e41d:2d03:005f:48e6
sys_image_guid: e41d:2d03:005f:48e6
vendor_id: 0x02c9
vendor_part_id: 4115
hw_ver: 0x0
board_id: MT_2180110032
phys_port_cnt: 1
max_mr_size: 0xffffffffffffffff
page_size_cap: 0xfffff000
max_qp: 262144
max_qp_wr: 32768
device_cap_flags: 0x40509c36
BAD_PKEY_CNTR
BAD_QKEY_CNTR
AUTO_PATH_MIG
CHANGE_PHY_PORT
PORT_ACTIVE_EVENT
SYS_IMAGE_GUID
RC_RNR_NAK_GEN
XRC
Unknown flags: 0x40408000
device_cap_exp_flags: 0x5060007100000000
EXP_DC_TRANSPORT
EXP_MEM_MGT_EXTENSIONS
EXP_CROSS_CHANNEL
EXP_MR_ALLOCATE
EXT_ATOMICS
EXT_SEND NOP
EXP_UMR
EXP_DC_INFO
max_sge: 30
max_sge_rd: 0
max_cq: 16777216
max_cqe: 4194303
max_mr: 16777216
max_pd: 16777216
max_qp_rd_atom: 16
max_ee_rd_atom: 0
max_res_rd_atom: 4194304
max_qp_init_rd_atom: 16
max_ee_init_rd_atom: 0
atomic_cap: ATOMIC_HCA_REPLY_BE (64)
log atomic arg sizes (mask) 3c
max fetch and add bit boundary 64
log max atomic inline 5
max_ee: 0
max_rdd: 0
max_mw: 0
max_raw_ipv6_qp: 0
max_raw_ethy_qp: 0
max_mcast_grp: 2097152
max_mcast_qp_attach: 48
max_total_mcast_qp_attach: 100663296
max_ah: 2147483647
max_fmr: 0
max_srq: 8388608
max_srq_wr: 32767
max_srq_sge: 31
max_pkeys: 128
local_ca_ack_delay: 16
hca_core_clock: 0
max_klm_list_size: 65536
max_send_wqe_inline_klms: 20
max_umr_recursion_depth: 4
max_umr_stride_dimension: 1
general_odp_caps:
rc_odp_caps:
NO SUPPORT
uc_odp_caps:
NO SUPPORT
ud_odp_caps:
NO SUPPORT
dc_odp_caps:
NO SUPPORT
xrc_odp_caps:
NO SUPPORT
raw_eth_odp_caps:
NO SUPPORT
max_dct: 262144
max_device_ctx: 1020
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 19
port_lid: 48
port_lmc: 0x00
link_layer: InfiniBand
max_msg_sz: 0x40000000
port_cap_flags: 0x2651e848
max_vl_num: 4 (3)
bad_pkey_cntr: 0x0
qkey_viol_cntr: 0x0
sm_sl: 0
pkey_tbl_len: 128
gid_tbl_len: 8
subnet_timeout: 18
init_type_reply: 0
active_width: 4X (2)
active_speed: 25.0 Gbps (32)
phys_state: LINK_UP (5)
GID[ 0]: fe80:0000:0000:0000:e41d:2d03:005f:48e6
The following lnd tunables are used:
A few minutes after LNet initialization, an OOM occurs with the following messages:
Nov 5 19:01:43 oss-3-0 kernel: Lustre: Lustre: Build Version: 2.7.62-g8248c89-CHANGED-2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 Nov 5 19:01:43 oss-3-0 kernel: LNet: Added LNI 192.168.3.104@o2ib [16/2560/0/180] Nov 5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0 Nov 5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1 Nov 5 19:03:01 oss-3-0 kernel: Call Trace: Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ?
kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20 Nov 5 19:03:01 oss-3-0 kernel: Mem-Info: Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu: Nov 5 19:03:01 oss-3-0 kernel: 
CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 135 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 
19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 34 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: active_anon:4820 inactive_anon:4483 isolated_anon:0 Nov 5 19:03:01 oss-3-0 kernel: active_file:5926 inactive_file:8941 isolated_file:0 Nov 5 19:03:01 oss-3-0 kernel: unevictable:3806 dirty:153 writeback:0 unstable:0 Nov 5 19:03:01 oss-3-0 kernel: free:55741 slab_reclaimable:49595 slab_unreclaimable:4005232 Nov 5 19:03:01 oss-3-0 kernel: mapped:4159 shmem:59 pagetables:991 bounce:0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dir ty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121244kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mloc ked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18560kB slab_unreclaimable:2218348kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290 Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal free:40852kB min:40888kB low:51108kB high:61332kB active_anon:1740kB inactive_anon:1028kB active_file:7732kB inactive_file:10852kB unevictable:13992kB isolated(anon):0kB isolated(file):0kB pr esent:29992960kB mlocked:5820kB dirty:640kB writeback:0kB mapped:7480kB shmem:116kB slab_reclaimable:135120kB slab_unreclaimable:7843632kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all _unreclaimable? no Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal free:45180kB min:45120kB low:56400kB high:67680kB active_anon:17540kB inactive_anon:16904kB active_file:15972kB inactive_file:24912kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB present:33095680kB mlocked:1232kB dirty:0kB writeback:0kB mapped:9156kB shmem:120kB slab_reclaimable:44700kB slab_unreclaimable:5958948kB kernel_stack:2608kB pagetables:3196kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all _unreclaimable? 
no Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 10*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121244kB Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1413*4kB 764*8kB 348*16kB 190*32kB 98*64kB 33*128kB 19*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 41844kB Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal: 2525*4kB 1493*8kB 597*16kB 209*32kB 47*64kB 13*128kB 3*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 45772kB Nov 5 19:03:01 oss-3-0 kernel: 15551 total pagecache pages Nov 5 19:03:01 oss-3-0 kernel: 0 pages in swap cache Nov 5 19:03:01 oss-3-0 kernel: Swap cache stats: add 0, delete 0, find 0/0 Nov 5 19:03:01 oss-3-0 kernel: Free swap = 33005564kB Nov 5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB Nov 5 19:03:01 oss-3-0 kernel: 16777215 pages RAM Nov 5 19:03:01 oss-3-0 kernel: 310798 pages reserved Nov 5 19:03:01 oss-3-0 kernel: 13266 pages shared Nov 5 19:03:01 oss-3-0 kernel: 12503778 pages non-shared Nov 5 19:03:01 oss-3-0 kernel: LNetError: 2957:0:(o2iblnd.c:870:kiblnd_create_conn()) Can't create QP: -12, send_wr: 8224, recv_wr: 34 Nov 5 19:03:01 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0 Nov 5 19:03:01 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1 Nov 5 19:03:01 oss-3-0 kernel: Call Trace: Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81177c19>] ? 
____cache_alloc_node+0x99/0x160 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm] Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0 Nov 5 19:03:01 oss-3-0 kernel: [<ffffffff8100c280>] ? 
child_rip+0x0/0x20 Nov 5 19:03:01 oss-3-0 kernel: Mem-Info: Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 0, btch: 1 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32 per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, 
btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 23 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal per-cpu: Nov 5 19:03:01 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 11: 
hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:01 oss-3-0 kernel: active_anon:4149 inactive_anon:4131 isolated_anon:0 Nov 5 19:03:01 oss-3-0 kernel: active_file:2501 inactive_file:2188 isolated_file:0 Nov 5 19:03:01 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:0 unstable:0 Nov 5 19:03:01 oss-3-0 kernel: free:56462 slab_reclaimable:48868 slab_unreclaimable:4008137 Nov 5 19:03:01 oss-3-0 kernel: mapped:3400 shmem:29 pagetables:931 bounce:0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dir ty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mloc ked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290 Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal free:41476kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:2 9992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844724kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclai mable? no Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal free:47408kB min:45120kB low:56400kB high:67680kB active_anon:16596kB inactive_anon:16524kB active_file:4868kB inactive_file:4860kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB pr esent:33095680kB mlocked:1232kB dirty:48kB writeback:0kB mapped:6156kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5969464kB kernel_stack:2624kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_ unreclaimable? 
no Nov 5 19:03:01 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB Nov 5 19:03:01 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB Nov 5 19:03:01 oss-3-0 kernel: Node 0 Normal: 1238*4kB 864*8kB 496*16kB 217*32kB 95*64kB 28*128kB 12*256kB 1*512kB 2*1024kB 0*2048kB 0*4096kB = 42040kB Nov 5 19:03:01 oss-3-0 kernel: Node 1 Normal: 1780*4kB 1147*8kB 637*16kB 299*32kB 112*64kB 27*128kB 3*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 48472kB Nov 5 19:03:01 oss-3-0 kernel: 5791 total pagecache pages Nov 5 19:03:01 oss-3-0 kernel: 38 pages in swap cache Nov 5 19:03:01 oss-3-0 kernel: Swap cache stats: add 1104, delete 1066, find 0/1 Nov 5 19:03:01 oss-3-0 kernel: Free swap = 33001180kB Nov 5 19:03:01 oss-3-0 kernel: Total swap = 33005564kB Nov 5 19:03:02 oss-3-0 kernel: 16777215 pages RAM Nov 5 19:03:02 oss-3-0 kernel: 310798 pages reserved Nov 5 19:03:02 oss-3-0 kernel: 11573 pages shared Nov 5 19:03:02 oss-3-0 kernel: 12507708 pages non-shared Nov 5 19:03:02 oss-3-0 kernel: rdma_cm: page allocation failure. order:8, mode:0xd0 Nov 5 19:03:02 oss-3-0 kernel: Pid: 2957, comm: rdma_cm Not tainted 2.6.32-573.7.1.el6_lustre.g95557d5.x86_64 #1 Nov 5 19:03:02 oss-3-0 kernel: Call Trace: Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81137adc>] ? __alloc_pages_nodemask+0x7dc/0x950 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81160cea>] ? alloc_vmap_area+0x27a/0x390 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81177282>] ? kmem_getpages+0x62/0x170 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81177e9a>] ? fallback_alloc+0x1ba/0x270 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff811778ef>] ? cache_grow+0x2cf/0x320 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81177c19>] ? ____cache_alloc_node+0x99/0x160 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? 
create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81178869>] ? __kmalloc+0x199/0x230 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa02936f6>] ? create_kernel_qp+0x5a6/0x8b0 [mlx5_ib] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0299432>] ? create_qp_common+0xb52/0x1240 [mlx5_ib] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0181ac8>] ? mlx5_debug_cq_add+0x48/0x60 [mlx5_core] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0299c1f>] ? __create_qp+0xff/0x4a0 [mlx5_ib] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff81178593>] ? kmem_cache_alloc_trace+0x1b3/0x1c0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa029a0a4>] ? mlx5_ib_create_qp+0xd4/0x180 [mlx5_ib] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa025b400>] ? ib_create_qp+0x60/0x310 [ib_core] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa04307e8>] ? rdma_create_qp+0x48/0xc0 [rdma_cm] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0a2e10f>] ? kiblnd_create_conn+0xa2f/0x15e0 [ko2iblnd] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0a3c262>] ? kiblnd_cm_callback+0x1272/0x20f0 [ko2iblnd] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa043399c>] ? cma_work_handler+0x7c/0xb0 [rdma_cm] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffffa0433920>] ? cma_work_handler+0x0/0xb0 [rdma_cm] Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff8109a780>] ? worker_thread+0x170/0x2a0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff810a1460>] ? autoremove_wake_function+0x0/0x40 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff8109a610>] ? worker_thread+0x0/0x2a0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff810a0fce>] ? kthread+0x9e/0xc0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff810a0f30>] ? kthread+0x0/0xc0 Nov 5 19:03:02 oss-3-0 kernel: [<ffffffff8100c280>] ? 
child_rip+0x0/0x20 Nov 5 19:03:02 oss-3-0 kernel: Mem-Info: Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA per-cpu: Nov 5 19:03:02 oss-3-0 kernel: CPU 0: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 1: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 2: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 3: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 4: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 5: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 6: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 7: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 8: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 9: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 10: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 11: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 12: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 13: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 14: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 15: hi: 0, btch: 1 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA32 per-cpu: Nov 5 19:03:02 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 13: hi: 186, 
btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: Node 0 Normal per-cpu: Nov 5 19:03:02 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 11: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: Node 1 Normal per-cpu: Nov 5 19:03:02 oss-3-0 kernel: CPU 0: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 1: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 2: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 3: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 4: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 5: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 6: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 7: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 8: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 9: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 10: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 11: 
hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 12: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 13: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 14: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: CPU 15: hi: 186, btch: 31 usd: 0 Nov 5 19:03:02 oss-3-0 kernel: active_anon:3235 inactive_anon:3163 isolated_anon:0 Nov 5 19:03:02 oss-3-0 kernel: active_file:2501 inactive_file:2191 isolated_file:0 Nov 5 19:03:02 oss-3-0 kernel: unevictable:3807 dirty:90 writeback:2632 unstable:0 Nov 5 19:03:02 oss-3-0 kernel: free:57038 slab_reclaimable:48868 slab_unreclaimable:4009033 Nov 5 19:03:02 oss-3-0 kernel: mapped:3401 shmem:29 pagetables:931 bounce:0 Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA free:15688kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15244kB mlocked:0kB dir ty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Nov 5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 2921 32211 32211 Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA32 free:121276kB min:4076kB low:5092kB high:6112kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2991928kB mloc ked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:18540kB slab_unreclaimable:2218360kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes Nov 5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 29290 29290 Nov 5 19:03:02 oss-3-0 kernel: Node 0 Normal free:40836kB min:40888kB low:51108kB high:61332kB active_anon:0kB inactive_anon:0kB active_file:5136kB inactive_file:3892kB unevictable:13996kB isolated(anon):0kB isolated(file):0kB present:2 9992960kB mlocked:5820kB dirty:312kB writeback:0kB mapped:7444kB shmem:0kB slab_reclaimable:134716kB slab_unreclaimable:7844728kB kernel_stack:6192kB pagetables:768kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:240 all_unrecl aimable? no Nov 5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:02 oss-3-0 kernel: Node 1 Normal free:50352kB min:45120kB low:56400kB high:67680kB active_anon:12940kB inactive_anon:12652kB active_file:4868kB inactive_file:4872kB unevictable:1232kB isolated(anon):0kB isolated(file):0kB pr esent:33095680kB mlocked:1232kB dirty:48kB writeback:10528kB mapped:6160kB shmem:116kB slab_reclaimable:42216kB slab_unreclaimable:5973044kB kernel_stack:2608kB pagetables:2956kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no Nov 5 19:03:02 oss-3-0 kernel: lowmem_reserve[]: 0 0 0 0 Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA: 0*4kB 1*8kB 0*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15688kB Nov 5 19:03:02 oss-3-0 kernel: Node 0 DMA32: 17*4kB 133*8kB 11*16kB 11*32kB 13*64kB 6*128kB 7*256kB 9*512kB 11*1024kB 7*2048kB 21*4096kB = 121276kB Nov 5 19:03:02 oss-3-0 kernel: Node 0 Normal: 1244*4kB 853*8kB 492*16kB 216*32kB 95*64kB 27*128kB 9*256kB 2*512kB 1*1024kB 0*2048kB 0*4096kB = 40472kB Nov 5 19:03:02 oss-3-0 kernel: Node 1 Normal: 1542*4kB 1019*8kB 595*16kB 334*32kB 145*64kB 37*128kB 7*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 50336kB Nov 5 19:03:02 oss-3-0 kernel: 8057 total pagecache pages Nov 5 19:03:02 oss-3-0 kernel: 2542 pages in swap cache Nov 5 19:03:02 oss-3-0 kernel: Swap cache stats: add 5785, delete 3243, find 20/26 Nov 5 19:03:02 oss-3-0 kernel: Free swap = 32982620kB Nov 5 19:03:02 oss-3-0 kernel: Total swap = 33005564kB Nov 5 19:03:02 oss-3-0 kernel: 16777215 pages RAM Nov 5 19:03:02 oss-3-0 kernel: 310798 pages reserved Nov 5 19:03:02 oss-3-0 kernel: 11578 pages shared Nov 5 19:03:02 oss-3-0 kernel: 12507859 pages non-shared |
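A few figures in the log above can be decoded with simple arithmetic. The following is a minimal sketch, not part of the original report, assuming 4 kB base pages (as the `*4kB` buddy entries suggest); the helper names are hypothetical. It converts the failing `order:8` request into a size, converts the `slab_unreclaimable` and `pages RAM` counts from the first Mem-Info dump into GiB, and counts how many free blocks of at least the requested size the `Node 0 Normal` buddy line lists.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as the "*4kB" buddy entries suggest


def order_to_kb(order: int) -> int:
    """Size in kB of one physically contiguous allocation of this buddy order."""
    return (1 << order) * PAGE_KB


def pages_to_gib(pages: int) -> float:
    """Convert a page count from a Mem-Info dump into GiB."""
    return pages * PAGE_KB / (1024 * 1024)


def free_blocks_at_least(buddy_line: str, min_kb: int) -> int:
    """Count free blocks of at least min_kb in a 'N*SIZEkB ...' buddy line."""
    pairs = re.findall(r"(\d+)\*(\d+)kB", buddy_line)
    return sum(int(count) for count, size in pairs if int(size) >= min_kb)


# Values copied from the first Mem-Info dump in the description.
node0_normal = ("1413*4kB 764*8kB 348*16kB 190*32kB 98*64kB 33*128kB "
                "19*256kB 4*512kB 1*1024kB 0*2048kB 0*4096kB = 41844kB")

request_kb = order_to_kb(8)
print("failing request (kB):", request_kb)
print("slab_unreclaimable (GiB):", round(pages_to_gib(4005232), 2))
print("fraction of RAM in unreclaimable slab:", round(4005232 / 16777215, 3))
print("free blocks >= request on Node 0 Normal:",
      free_blocks_at_least(node0_normal, request_kb))
```

Under these assumptions the request is 1024 kB of contiguous memory, roughly 15.28 GiB (about 24% of RAM) is held in unreclaimable slab, and only a single free block of that size is listed for Node 0 Normal.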
| Comments |
| Comment by Doug Oucharek (Inactive) [ 06/Nov/15 ] |
|
Is this on a client, server, or LNet router? How many connections are being made to that node? |
| Comment by Dmitry Eremin (Inactive) [ 09/Nov/15 ] |
|
This happened just after LNet initialization for selftest usage. There is no Lustre usage at all. Maybe there is just a spontaneous ping, nothing more. The rdma_cm process begins to allocate memory, and its usage grows constantly until OOM. |
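One way to observe that kind of growth is to sample the unreclaimable slab counter over time. This is a minimal sketch, not taken from the ticket, assuming the node's `/proc/meminfo` exposes a `SUnreclaim` field; the function names and the polling interval are hypothetical.

```python
import re
import time


def sunreclaim_kb(meminfo_text: str) -> int:
    """Extract the SUnreclaim value (kB) from /proc/meminfo-style text."""
    m = re.search(r"^SUnreclaim:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if m is None:
        raise ValueError("SUnreclaim field not found")
    return int(m.group(1))


def growth_kb_per_s(samples):
    """Average growth rate between the first and last (seconds, kB) sample."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def poll(count=12, interval_s=5.0, path="/proc/meminfo"):
    """Collect `count` samples of SUnreclaim, `interval_s` seconds apart."""
    samples = []
    for _ in range(count):
        with open(path) as f:
            samples.append((time.monotonic(), sunreclaim_kb(f.read())))
        time.sleep(interval_s)
    return samples
```

Calling `growth_kb_per_s(poll())` on the affected node right after LNet starts would give a rough growth rate; a steadily positive value with no Lustre traffic would be consistent with the behaviour described here.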
| Comment by Dmitry Eremin (Inactive) [ 09/Nov/15 ] |
|
It looks like this is an issue of an incomplete patch for
| Comment by Peter Jones [ 09/Nov/15 ] |
|
ok so it makes sense to duplicate this into |
| Comment by Jeremy Filizetti [ 09/Nov/15 ] |
|
It looks like http://review.whamcloud.com/14600 handles this for the conn race condition (
| Comment by Doug Oucharek (Inactive) [ 10/Nov/15 ] |
|
I can see how http://review.whamcloud.com/14600 addresses the reconnect issue for the clients, but I am not convinced that this patch will help the passive side, the server, out here. If you have 10,000 clients competing to reconnect, the server can still be hammered into OOM even when the clients clean up zombies before reattempting. |
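To give a sense of scale for the 10,000-client scenario, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every lingering connection on the server holds one contiguous allocation the size of the failing `order:8` request seen in the description; the real per-QP footprint on mlx5 is not stated in this ticket, and the function name is hypothetical.

```python
def pinned_gib(connections: int, order: int = 8, page_kb: int = 4) -> float:
    """GiB held if each connection pins one allocation of the given buddy order.

    Illustrative assumption only: one order-`order` allocation per connection,
    with 4 kB base pages.
    """
    per_conn_kb = (1 << order) * page_kb
    return connections * per_conn_kb / (1024 * 1024)


print(pinned_gib(10_000))
```

Under that assumption, 10,000 connections would pin about 9.77 GiB on a node whose log reports roughly 64 GiB of RAM, and any additional per-connection buffers would add to that.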