Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.8.0
-
RHEL7.2 derivative: 3.10.0-327.13.1.3chaos.ch6.x86_64 #1 SMP Wed May 11 18:38:20 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux
lustre-2.8.0_0.0.llnlpreview.13-1.ch6.x86_64
router has two interfaces, omnipath on compute side:
05:00.0 Infiniband controller: Mellanox Technologies MT27700 Family [ConnectX-4]
81:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 10)
-
3
Description
On a router node with both Omni-Path and Mellanox interfaces, I see the following in the output of journalctl -xe:
-- Unit lnet.service has begun starting up.
kernel: LNet: Added LNI 192.168.128.187@o2ib18 [128/8192/0/180]
kernel: fmr_pool: Device mlx5_0 does not support FMRs
kernel: LNetError: 7963:0:(o2iblnd.c:1459:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
kernel: LNetError: 7963:0:(o2iblnd.c:2096:kiblnd_net_init_pools()) Can't initialize FMR pool for CPT 0: -38
kernel: LNetError: 7963:0:(o2iblnd.c:2895:kiblnd_startup()) Failed to initialize NI pools: -38
kernel: LNetError: 105-4: Error -100 starting up LNI o2ib
kernel: LNetError: 801:0:(o2iblnd_cb.c:2297:kiblnd_passive_connect()) Can't accept conn from 192.168.128.37@o2ib
kernel: LNetError: 801:0:(o2iblnd_cb.c:2297:kiblnd_passive_connect()) Skipped 20 previous similar messages
kernel: LNet: Removed LNI 192.168.128.187@o2ib18
lnet[7960]: LNET configure error 100: Network is down
systemd[1]: lnet.service: control process exited, code=exited status=1
systemd[1]: Failed to start SYSV: Part of the lustre file system..
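For anyone triaging similar logs, the numeric codes above are negated Linux errno values: 38 is ENOSYS ("Function not implemented") and 100 is ENETDOWN ("Network is down", as the lnet line also reports). The following is a minimal sketch for pulling the failing function and errno name out of an LNetError line. The names decode_lnet_error, LINUX_ERRNO and LNET_ERROR are hypothetical helpers written for this illustration, not part of Lustre, and the regular expression is only assumed to fit lines shaped like the ones in this log.

```python
import re

# Linux errno values that appear in the log above. They are hardcoded so the
# sketch gives the same answer on any platform (errno numbers differ by OS).
LINUX_ERRNO = {
    38: ("ENOSYS", "Function not implemented"),
    100: ("ENETDOWN", "Network is down"),
}

# Assumed line shape: "(file.c:line:function()) message: -N" at end of line.
LNET_ERROR = re.compile(r"\((\w+\.c):(\d+):(\w+)\(\)\)\s+(.*?):\s+-(\d+)\s*$")


def decode_lnet_error(line):
    """Return (function, message, errno_name), or None if the line does not match."""
    match = LNET_ERROR.search(line)
    if match is None:
        return None
    _source_file, _source_line, function, message, code = match.groups()
    name, _description = LINUX_ERRNO.get(int(code), ("UNKNOWN", ""))
    return function, message, name


if __name__ == "__main__":
    sample = ("kernel: LNetError: 7963:0:(o2iblnd.c:1459:kiblnd_create_fmr_pool()) "
              "Failed to create FMR pool: -38")
    print(decode_lnet_error(sample))
```

Read this way, the three -38 lines show ENOSYS propagating from kiblnd_create_fmr_pool() up through kiblnd_net_init_pools() to kiblnd_startup(), which is consistent with the preceding message that mlx5_0 does not support FMRs.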
I do not encounter this on the compute nodes, which have only Omni-Path, nor on the Lustre servers, which have only Mellanox.
Lustre 2.8 ships with /etc/modprobe.d/ko2iblnd.conf, which contains:
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
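To make it easier to see which of these settings touch FMR handling, here is a small sketch that parses the options line into a dictionary. The function parse_modprobe_options and the CONF string are hypothetical names for this illustration; the parser handles only simple "options <module> key=value ..." lines and is not a full modprobe.d parser. Whether these options are actually the ones applied on the router is an assumption that would need checking on the node.

```python
def parse_modprobe_options(text):
    """Parse 'options <module> key=value ...' lines into {module: {key: value}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not an options line (e.g. alias).
        if len(fields) < 2 or fields[0] != "options":
            continue
        params = result.setdefault(fields[1], {})
        for item in fields[2:]:
            key, _, value = item.partition("=")
            params[key] = value
    return result


# Contents of /etc/modprobe.d/ko2iblnd.conf as quoted in this ticket.
CONF = """alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
"""

if __name__ == "__main__":
    options = parse_modprobe_options(CONF)["ko2iblnd-opa"]
    # Show map_on_demand and the fmr_* settings, which may be relevant here.
    for key in sorted(options):
        if key == "map_on_demand" or key.startswith("fmr_"):
            print(key, "=", options[key])
```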