[LU-7101] Lnet: Support per NI map-on-demand Created: 04/Sep/15 Updated: 30/Jan/17 Resolved: 11/Apr/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
MLX5 does not support FMR. With the removal of PMR ( To do so the following solution is devised. FMR is enabled by setting map-on-demand to a value in the range 0 < value <= 256. This is a problem for nodes where OPA and MLX5 coexist. OPA performance is greatly enhanced by setting map-on-demand, but if map-on-demand is set for MLX5, MLX5 will not work. We need to support a per-NI map-on-demand value, so that when an OPA net is configured it can use the optimal map-on-demand value, and when an MLX5 net is configured map-on-demand can be set to 0, which disables it. This raises another issue, namely different map-on-demand values across fabrics. LNet currently doesn't support this. However The proposed solution consists of three patches. A future patch will add support for setting map-on-demand dynamically, but that is a future feature and is not required to address the immediate need. |
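The value ranges in the description can be sketched as a small check. This is only an illustration in Python, not LNet or ko2iblnd code: the function name, the net names and the OPA value of 32 are hypothetical, and the only facts taken from the description are that 0 disables FMR and that 0 < value <= 256 enables it.

```python
# Illustrative sketch only; not LNet or ko2iblnd source code.
MAP_ON_DEMAND_MAX = 256  # upper bound quoted in the description


def fmr_enabled(map_on_demand: int) -> bool:
    """Return True if this map-on-demand value enables FMR.

    0 disables FMR; 1..256 enables it; anything else is rejected.
    """
    if not 0 <= map_on_demand <= MAP_ON_DEMAND_MAX:
        raise ValueError(f"map-on-demand out of range: {map_on_demand}")
    return map_on_demand > 0


# Hypothetical per-NI settings on a node with both fabrics.
# The net names and the OPA value (32) are made up for illustration.
per_ni_map_on_demand = {
    "o2ib0 (OPA)": 32,
    "o2ib1 (MLX5)": 0,
}

for ni, value in per_ni_map_on_demand.items():
    state = "FMR enabled" if fmr_enabled(value) else "FMR disabled"
    print(f"{ni}: map-on-demand={value} -> {state}")
```

With a single node-wide value, one of the two interfaces in this example would necessarily get the wrong setting, which is the coexistence problem the per-NI value is meant to solve.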
| Comments |
| Comment by Gerrit Updater [ 11/Sep/15 ] |
|
Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/16367 |
| Comment by James A Simmons [ 11/Sep/15 ] |
|
I just tested this patch set and got it to work. For this set of tests I didn't use map_on_demand at all. For the mlx5 driver map_on_demand will not work. It always gives the following error no matter what value I set map_on_demand to: LNetError: 8048:0:(o2iblnd.c:2242:kiblnd_net_init_pools()) Can't set fmr pool size (512) < ntx / 4(1280) My config string is: options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=63 concurrent_sends=63 What I did find to work with the mlx5 driver is: options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=16 concurrent_sends=16 On the server side I'm still using the config string: options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=63 concurrent_sends=63 I haven't tried map_on_demand on the server side yet. Any suggestions for bumping up peer_credits? |
| Comment by Jeremy Filizetti [ 24/Sep/15 ] |
|
In Lustre's current o2iblnd LND, map_on_demand != 0 is equivalent to enabling FMR. Since mlx5 doesn't support FMR, it will fail. The error you are seeing is because fmr_pool_size defaults to 512. Even if you change it to a larger value, it should then fail in kiblnd_create_fmr_pool on the call to ib_create_fmr_pool. |
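The arithmetic behind the reported error can be reproduced with a short sketch. It is modeled only on the text of the error message ("fmr pool size (512) < ntx / 4(1280)"), not on the o2iblnd source; the function name is hypothetical and the use of integer division is an assumption.

```python
# Illustrative sketch only; modeled on the error message, not on o2iblnd.c.
def fmr_pool_size_ok(fmr_pool_size: int, ntx: int) -> bool:
    """Return True if the FMR pool size is at least ntx / 4.

    Assumption: the division is an integer division.
    """
    return fmr_pool_size >= ntx // 4


# Values from the comments above: ntx=5120, default fmr_pool_size=512.
print(fmr_pool_size_ok(512, 5120))   # False: 512 < 5120 / 4 = 1280
print(fmr_pool_size_ok(1280, 5120))  # True: the size check itself would pass
```

Passing this size check is not sufficient on mlx5: as noted in the comment above, FMR pool creation is then expected to fail because the driver does not support FMR.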
| Comment by James A Simmons [ 06/Nov/15 ] |
|
Latest patch update has a new look for lnetctl net show -v net:
|
| Comment by James A Simmons [ 16/Dec/15 ] |
|
Updated the patch to support setting FMR pool parameters as well. The patch is flexible enough to allow different settings on different IB ports on the same node. See the output of lnetctl net show -v net:
|
| Comment by Gerrit Updater [ 07/Apr/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16367/ |
| Comment by Joseph Gmitter (Inactive) [ 11/Apr/16 ] |
|
Landed to master for 2.9.0 |