LU-3322: ko2iblnd support for different map_on_demand and peer_credits between systems

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
    • 3
    • 20,543
    • 8528

    Description

      ko2iblnd currently doesn't support different values of peer_credits or map_on_demand between systems.

      Once I finish some testing, I will upload a patch to Gerrit in the next couple of days.

          Activity

            jgmitter Joseph Gmitter (Inactive) added a comment:
            http://review.whamcloud.com/17074/ has landed for 2.8.
            Resolving the ticket as noted in the commentary above.

            gerrit Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17074/
            Subject: LU-3322 lnet: make connect parameters persistent
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4c689a573fafcfa1ca7474a275f958e00b1deddc

            doug Doug Oucharek (Inactive) added a comment:
            This ticket seems to have expanded to be a catch-all for anything related to map_on_demand/peer_credit settings. I'd rather see this ticket be used for its original purpose, what Jeremy describes above. Anything new should become a new ticket (or set of tickets) so we don't get confused and link a bunch of new tickets to this one believing these patches will solve all issues in this area.

            Once patch http://review.whamcloud.com/#/c/17074/ has landed, I'd like this ticket to be closed. If there are any more problems with optimized settings for specific hardware setups, please open separate tickets so they can be prioritized and addressed accordingly.

            jfilizetti Jeremy Filizetti added a comment:
            I don't know anything about ko2iblnd-opa; I've always used a custom module parameter file for Lustre. Now that you've included the link, I see that it is now shipped with Lustre, which I wasn't aware of before. From what I can see, you should be OK to use those parameters with those adapters, but I'm not sure they are the "ideal" settings.

            chunteraa Chris Hunter (Inactive) added a comment:
            Hi Jeremy,
            Thanks for the explanation; at least it shows the challenges involved. From your comments, the default ko2iblnd-opa parameters (i.e. LU-6735) should work for ConnectX[123] and Truescale adapters.
            On our hardware, the max work requests per QP (max_qp_wr) values are 16351, 16383 or 16384.

            jfilizetti Jeremy Filizetti added a comment:
            I only used stock CentOS 6 kernels for the initial work. Only after noting that MLX5 memory registration does not support FMR did I start looking at and testing Mellanox OFED. I'm not aware of any issues with this patch and map_on_demand settings, even though they seem to be getting reported here as such. The problem is that configuring ko2iblnd requires too much low-level understanding of the driver, and ko2iblnd does not abstract the differing hardware well enough at this point. My goal with this patch was to allow interop between systems configured for IB WAN performance and those that may come from a vendor solution with different parameters. Given the current upstream memory registration changes and the lack of flexibility, ko2iblnd really needs some additional work to make things more robust and support multiple configurations. This patch only serves as a stop-gap for that larger necessary work. The best that really can be done here is to make recommendations to people based on their needs.

            chunteraa Chris Hunter (Inactive) added a comment:
            There are different bundles of the OFED/InfiniBand kernel drivers available from different sources, e.g. openfabrics.org OFED, Mellanox OFED (MOFED), Truescale OFED+ and RHEL rdma. All these packages seem to tweak the kernel drivers (e.g. different versions, additional code, etc.). To further confuse the list, there are multiple versions of the packages (e.g. MOFED 2.x, 3.x).

            Which IB driver packaging are you using for testing the ko2iblnd patches? Stock OFED? MOFED?

            thanks,
            chris hunter

            dmiter Dmitry Eremin (Inactive) added a comment:
            The following parameters are compatible with mlx5 and work fine for OPA:

            options ko2iblnd-opa peer_credits=62 peer_credits_hiw=64 credits=1024 concurrent_sends=62 ntx=2048 map_on_demand=256 fmr_pool_size=2048 fmr_flush_trigger=512 fmr_cache=1

            olaf Olaf Weber (Inactive) added a comment:
            FWIW 18446744073709551503 = -113 assuming 64-bit 2's complement. On x86 Linux, I think that's
            also -EHOSTUNREACH.
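The conversion in Olaf Weber's comment can be checked with a few lines of Python. This is only a sketch: the helper name `as_signed64` is made up for illustration, and it assumes the logged value is a signed 64-bit return code that was printed as unsigned.

```python
import errno


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed two's-complement value."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


rc = as_signed64(18446744073709551503)
print(rc)  # -113

# errno numbering is platform-specific, so look the name up on the
# running system instead of hard-coding it.
print(errno.errorcode.get(-rc, "unknown errno"))
```

Looking the name up through `errno.errorcode` avoids relying on a number-to-name mapping that can differ between platforms.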

            jfilizetti Jeremy Filizetti added a comment:
            If you are still using your previous configuration, you could modify things to make everything work.

            1. The server with mlx5 doesn't support FMR, so map_on_demand should remain 0.
            2. The router can support FMR with the mlx4 (not sure about hfi), so you could use map_on_demand=256. I believe you will need to drop concurrent_sends to <=62 so that QP creation does not fail on the mlx4.
            3. For the client with hfi, I don't know whether it supports FMR. If not, you could use the defaults (map_on_demand=0).

            With all that, I think your configuration will be able to connect to everything, at least from what I quickly glanced over with respect to the map_on_demand settings.
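The three suggestions in Jeremy Filizetti's comment can be written down as per-node module options. The sketch below only renders the values named in that comment; the role labels, the `modprobe_line` helper, and the use of the plain `ko2iblnd` module name are assumptions for illustration, and the values carry the same uncertainty as the comment itself ("I believe", "not sure about hfi").

```python
# Settings transcribed from the comment above; nothing here is verified
# against real hardware.
ROLE_OPTIONS = {
    "server (mlx5)": {"map_on_demand": 0},
    "router (mlx4)": {"map_on_demand": 256, "concurrent_sends": 62},
    "client (hfi)": {"map_on_demand": 0},
}


def modprobe_line(options: dict, module: str = "ko2iblnd") -> str:
    """Render one modprobe.d-style 'options' line for the given parameters."""
    params = " ".join(f"{name}={value}" for name, value in options.items())
    return f"options {module} {params}"


for role, options in ROLE_OPTIONS.items():
    print(f"# {role}")
    print(modprobe_line(options))
```

Each node would carry only the line for its own role. The point of the patch discussed in this ticket is that peers configured with different values can still connect.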

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: jfilizetti Jeremy Filizetti