[LU-8225] router node: Failed to create FMR pool: -38 Created: 01/Jun/16 Updated: 01/Jun/16 Resolved: 01/Jun/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.8.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Olaf Faaland | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | llnl | ||
| Environment: |
RHEL7.2 derivative: 3.10.0-327.13.1.3chaos.ch6.x86_64 #1 SMP Wed May 11 18:38:20 PDT 2016 x86_64 x86_64 x86_64 GNU/Linux |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
On a router node with both omnipath and mellanox interfaces, I see the following in the output of journalctl -xe:

-- Unit lnet.service has begun starting up.
kernel: LNet: Added LNI 192.168.128.187@o2ib18 [128/8192/0/180]
kernel: fmr_pool: Device mlx5_0 does not support FMRs
kernel: LNetError: 7963:0:(o2iblnd.c:1459:kiblnd_create_fmr_pool()) Failed to create FMR pool: -38
kernel: LNetError: 7963:0:(o2iblnd.c:2096:kiblnd_net_init_pools()) Can't initialize FMR pool for CPT 0: -38
kernel: LNetError: 7963:0:(o2iblnd.c:2895:kiblnd_startup()) Failed to initialize NI pools: -38
kernel: LNetError: 105-4: Error -100 starting up LNI o2ib
kernel: LNetError: 801:0:(o2iblnd_cb.c:2297:kiblnd_passive_connect()) Can't accept conn from 192.168.128.37@o2ib
kernel: LNetError: 801:0:(o2iblnd_cb.c:2297:kiblnd_passive_connect()) Skipped 20 previous similar messages
kernel: LNet: Removed LNI 192.168.128.187@o2ib18
lnet[7960]: LNET configure error 100: Network is down
systemd[1]: lnet.service: control process exited, code=exited status=1
systemd[1]: Failed to start SYSV: Part of the lustre file system..

I do not encounter this on the compute nodes, which have only omnipath, nor on the lustre servers, which have only mellanox.

Lustre 2.8 ships with /etc/modprobe.d/ko2iblnd.conf, which contains:

alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1 |
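The LNetError lines in the journal output above follow a regular shape (pid, CPT, source file, line, function, message, optional negative return code). As a rough aid for reading them, here is a minimal Python sketch that pulls those fields apart and names the return code. The helper `parse_lnet_error` and the table `LINUX_ERRNO_NAMES` are hypothetical names made up for this illustration and are not part of Lustre. The errno names are the standard Linux values: 38 is ENOSYS (function not implemented) and 100 is ENETDOWN, the latter matching the "Network is down" text in the log.

```python
import re

# Linux errno values seen in the log above (assumes a Linux host, as in the report).
LINUX_ERRNO_NAMES = {38: "ENOSYS", 100: "ENETDOWN"}

# Matches e.g.
# "LNetError: 7963:0:(o2iblnd.c:1459:kiblnd_create_fmr_pool()) Failed ...: -38"
LNET_ERROR_RE = re.compile(
    r"LNetError: \d+:\d+:\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*?)(?:: (?P<code>-\d+))?$"
)


def parse_lnet_error(line):
    """Split one LNetError journal line into its fields, or return None."""
    m = LNET_ERROR_RE.search(line.strip())
    if m is None:
        return None
    code = int(m.group("code")) if m.group("code") else None
    return {
        "file": m.group("file"),
        "line": int(m.group("line")),
        "func": m.group("func"),
        "msg": m.group("msg"),
        "code": code,
        # Kernel functions return negated errno values, hence -code.
        "errno_name": LINUX_ERRNO_NAMES.get(-code) if code is not None else None,
    }


sample = ("kernel: LNetError: 7963:0:(o2iblnd.c:1459:kiblnd_create_fmr_pool()) "
          "Failed to create FMR pool: -38")
parsed = parse_lnet_error(sample)
print(parsed["func"], parsed["code"], parsed["errno_name"])
```

Lines that do not carry the `pid:cpt:(file:line:func())` prefix, such as the `105-4: Error -100` console message, are deliberately left unparsed and yield `None`.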
| Comments |
| Comment by Olaf Faaland [ 01/Jun/16 ] |
|
This did not come up with lustre 2.5; it's new with lustre 2.8. |
| Comment by Olaf Faaland [ 01/Jun/16 ] |
|
Note this occurs when attempting to start lnet. Lnet fails to start as a result. |
| Comment by Olaf Faaland [ 01/Jun/16 ] |
|
Removing /etc/modprobe.d/ko2iblnd.conf allows lnet to start successfully. lctl pings from client->server and server->client (through the router) then work as expected. |
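To see at a glance which settings the removed file had been applying, the options line quoted in the description can be split into key/value pairs. The following is only a sketch: `parse_modprobe_options` is a hypothetical helper written for this illustration, and it handles just the simple `options <module> key=value ...` form shown in this ticket. It lists the FMR-named options alongside `map_on_demand` without asserting which of them is responsible for the failure.

```python
def parse_modprobe_options(text):
    """Return {module_name: {option: value}} for each 'options' line."""
    result = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        fields = line.split()
        if len(fields) < 2 or fields[0] != "options":
            continue  # skip 'alias' and any other directives
        opts = result.setdefault(fields[1], {})
        for item in fields[2:]:
            key, sep, value = item.partition("=")
            if sep:
                opts[key] = value
    return result


# File content as quoted in the description of this ticket.
CONF = """\
alias ko2iblnd-opa ko2iblnd
options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 concurrent_sends=256 ntx=2048 map_on_demand=32 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1
"""

opts = parse_modprobe_options(CONF)["ko2iblnd-opa"]
fmr_related = {k: v for k, v in opts.items() if k.startswith("fmr_")}
print("map_on_demand =", opts["map_on_demand"])
print("FMR options:", fmr_related)
```

Values are kept as strings, since modprobe passes them to the module uninterpreted.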
| Comment by Peter Jones [ 01/Jun/16 ] |
|
Olaf,

It seems that this is a duplicate of

Dmitry, please can you provide any further advice LLNL need on this topic?

Thanks,
Peter |
| Comment by Olaf Faaland [ 01/Jun/16 ] |
|
Peter, |