Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0

    Description

      MLX5 does not support FMR. With the removal of PMR (LU-6850) we need a way to support MLX5.

      The following solution was devised. FMR is enabled by setting map-on-demand to a value in the range 0 < value <= 256.

      This is a problem for nodes where OPA and MLX5 coexist. OPA performance is greatly enhanced by setting map-on-demand; however, if map-on-demand is set for MLX5, MLX5 will not work.

      We need to support a per-NI map-on-demand value, so that an OPA net can be configured with its optimal map-on-demand value while an MLX5 net can set map-on-demand to 0, disabling it.
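
      As a rough illustration of the per-NI rule described above, the following Python sketch chooses a value for each NI on a mixed node. It is not taken from the Lustre source: the function names, the HCA labels, the net names, and the default of 256 for the non-MLX5 case are all assumptions made for the example.

```python
# Hypothetical sketch: function names and HCA labels are illustrative only
# and do not come from the Lustre source.

MAX_MAP_ON_DEMAND = 256  # upper bound given in the description


def fmr_enabled(map_on_demand: int) -> bool:
    """Return True when the value enables FMR (0 < value <= 256)."""
    if not 0 <= map_on_demand <= MAX_MAP_ON_DEMAND:
        raise ValueError(f"map_on_demand out of range: {map_on_demand}")
    return map_on_demand > 0


def pick_map_on_demand(hca: str, preferred: int = 256) -> int:
    """Choose a per-NI value: 0 for MLX5 (no FMR), else the preferred value."""
    return 0 if hca == "mlx5" else preferred


# One node, two NIs on different hardware (net names are placeholders).
ni_values = {
    "o2ib1": pick_map_on_demand("opa"),
    "o2ib2": pick_map_on_demand("mlx5"),
}
for net, value in ni_values.items():
    state = "enabled" if fmr_enabled(value) else "disabled"
    print(f"{net}: map_on_demand={value} (FMR {state})")
```

      The point of the sketch is only that the decision is made per NI rather than once per node; the actual optimal value for an OPA net is a tuning question the description leaves open.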

      This raises another issue: different map-on-demand values across fabrics. LNet currently doesn't support this, but LU-3322 adds support for this scenario. The problem is relevant both for OPA support and for clusters that interconnect MLX5 and MLX4.

      The proposed solution consists of three patches:
      1. The LU-6850 patch, which removed support for PMR.
      2. LU-3322, which adds support for different map-on-demand and peertxcredits values across the network.
      3. The patch for this LU, which adds support for per-NI map-on-demand.

      A future patch will add support for setting map-on-demand dynamically, but that is not required to address the immediate need.

          Activity

            [LU-7101] Lnet: Support per NI map-on-demand

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed to master for 2.9.0

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16367/
            Subject: LU-7101 lnet: per NI map-on-demand value
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 94d1ce46562bd36c8959a9458cacabb7f6df681f

            simmonsja James A Simmons added a comment - - edited

            Updated the patch to support setting FMR pool parameters as well. The patch is flexible enough to allow different settings on different IB ports on the same node. See the output of lnetctl net show -v below.
            This is a big step forward in that we can now support different types of IB hardware in the same node and configure each one independently.

            net:
                - net: lo
                  nid: 0.0.0.0@lo
                  status: up
                  tunables:
                      peer_timeout: 0
                      peer_credits: 0
                      peer_buffer_credits: 0
                      credits: 0
                - net: o2ib1
                  nid: pike11-ib0@o2ib1
                  status: up
                  interfaces:
                      0: ib0
                  tunables:
                      peer_timeout: 100
                      peer_credits: 16
                      peer_buffer_credits: 0
                      credits: 2560
                      CPT: "[0,0]"
                  LND tunables:
                      peercredits_hiw: 8
                      map_on_demand: 256
                      concurrent_sends: 32
                      fmr_pool_size: 1280
                      fmr_flush_trigger: 1024
                      fmr_cache: 1
                - net: o2ib2
                  nid: pike12-ib1@o2ib2
                  status: up
                  interfaces:
                      0: ib1
                  tunables:
                      peer_timeout: 180
                      peer_credits: 16
                      peer_buffer_credits: 0
                      credits: 2560
                      CPT: "[0,0,0,0]"
                  LND tunables:
                      peercredits_hiw: 8
                      map_on_demand: 0
                      concurrent_sends: 16
                      fmr_pool_size: 512
                      fmr_flush_trigger: 384
                      fmr_cache: 1

            simmonsja James A Simmons added a comment -

            Latest patch update has a new look for lnetctl net show -v

            net:
                - net: lo
                  nid: 0@lo
                  status: up
                  tunables:
                      peer_timeout: 0
                      peer_credits: 0
                      peer_buffer_credits: 0
                      credits: 0
                      CPT: "[0,0]"
                - net: o2ib1
                  nid: 10.37.248.19@o2ib1
                  status: up
                  interfaces:
                      0: ib0
                  tunables:
                      peer_timeout: 180
                      peer_buffer_credits: 0
                      peer_credits: 63
                      credits: 2560
                      CPT: "[0,0]"
                  LND tunables:
                      use_privileged_port: 1
                      require_privileged_port: 0
                      dev_failover: 0
                      fmr_cache: 1
                      fmr_flush_trigger: 1024
                      fmr_pool_size: 1280
                      map_on_demand: 256
                      concurrent_sends: 63
                      ib_mtu: 0
                      keepalive: 100
                      rnr_retry_count: 6
                      retry_count: 5
                      ipif_name: ib0
                      peer_credits_hiw: 31
                      ntx: 5120
                      nscheds: 0
                      timeout: 100
                      cksum: 0
                      service: 987

            jfilizetti Jeremy Filizetti added a comment - In Lustre's current o2iblnd LND, map_on_demand != 0 is equivalent to enabling FMR. Since mlx5 doesn't support FMR, it will fail. The error you are seeing occurs because fmr_pool_size defaults to 512; even if you change it to a larger value, it should then fail in kiblnd_create_fmr_pool on the call to ib_create_fmr_pool.

            simmonsja James A Simmons added a comment -

            I just tested this patch set and got it to work. For this set of tests I didn't use map_on_demand at all. With the mlx5 driver, map_on_demand will not work. It always gives the following error, no matter what value I set map_on_demand to:

            LNetError: 8048:0:(o2iblnd.c:2242:kiblnd_net_init_pools()) Can't set fmr pool size (512) < ntx / 4(1280)
            [ 3766.868871] LNetError: 8048:0:(o2iblnd.c:3110:kiblnd_startup()) Failed to initialize NI pools: -22

            My config string is:

            options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=63 concurrent_sends=63

            What I did find to work with the mlx5 driver is:

            options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=16 concurrent_sends=16

            On the server side I'm still using the config string:

            options ko2iblnd timeout=100 credits=2560 ntx=5120 peer_credits=63 concurrent_sends=63

            I haven't tried map_on_demand on the server side yet. Any suggestions for bumping up peer_credits?
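
            The pool-size error in the log above can be checked by hand: with ntx=5120, ntx / 4 is 1280, which exceeds the fmr_pool_size of 512 reported in the message, so initialization is rejected with -22 (EINVAL). The Python sketch below reproduces only that arithmetic. The function name check_fmr_pool_size is hypothetical, and the comparison is inferred from the wording of the log message rather than copied from o2iblnd.c.

```python
import errno


def check_fmr_pool_size(fmr_pool_size: int, ntx: int) -> int:
    """Return 0 if the pool is large enough, else -EINVAL.

    Assumed rule, inferred from the log text:
    reject when fmr_pool_size < ntx / 4.
    """
    if fmr_pool_size < ntx // 4:
        return -errno.EINVAL
    return 0


# Values from the failing configuration: pool size 512, ntx=5120.
failing = check_fmr_pool_size(512, 5120)

# Pool size equal to ntx / 4 passes this particular check.
passing = check_fmr_pool_size(1280, 5120)

print(failing, passing)
```

            Passing this check does not mean FMR will work on mlx5: as Jeremy Filizetti's comment notes, a larger fmr_pool_size should then fail later, in kiblnd_create_fmr_pool on the call to ib_create_fmr_pool.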

            gerrit Gerrit Updater added a comment -

            Amir Shehata (amir.shehata@intel.com) uploaded a new patch: http://review.whamcloud.com/16367
            Subject: LU-7101 lnet: per NI map-on-demand value
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e77ed3e90197a749ba4d301ae5414731f861b5cd

            People

              ashehata Amir Shehata (Inactive)
              ashehata Amir Shehata (Inactive)