[LU-3322] ko2iblnd support for different map_on_demand and peer_credits between systems

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.8.0
    • Lustre 2.7.0, Lustre 2.8.0
    • 3
    • 20,543
    • 8528

    Description

      ko2iblnd currently doesn't support connections between systems configured with different values of peer_credits or map_on_demand.

      After I finish some testing, I will upload a patch to Gerrit in the next couple of days.
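To make the problem concrete, here is a minimal Python sketch of one way two peers with different settings could still agree on a connection: the passive side rejects a request that exceeds its own limits and sends those limits back, and the active side retries with the smaller values. The ConnParams, passive_accept and connect names are hypothetical, and this is a simplified model, not ko2iblnd's code; the real module has additional conditions that it omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnParams:
    """Hypothetical connect parameters one peer advertises to another."""
    queue_depth: int  # stands in for the peer_credits setting
    max_frags: int    # stands in for the RDMA fragment limit map_on_demand affects


def passive_accept(req: ConnParams, local: ConnParams) -> Optional[ConnParams]:
    """Return None to accept, or the local limits to send back with a reject."""
    if req.queue_depth > local.queue_depth or req.max_frags > local.max_frags:
        return local
    return None


def connect(active: ConnParams, passive: ConnParams, max_attempts: int = 3) -> ConnParams:
    """Retry with the limits carried in each reject until the passive side accepts."""
    proposal = active
    for _ in range(max_attempts):
        limits = passive_accept(proposal, passive)
        if limits is None:
            return proposal
        # Lower each value to what the passive side said it can handle.
        proposal = ConnParams(
            queue_depth=min(proposal.queue_depth, limits.queue_depth),
            max_frags=min(proposal.max_frags, limits.max_frags),
        )
    raise ConnectionError("peers could not agree on connect parameters")


if __name__ == "__main__":
    a = ConnParams(queue_depth=128, max_frags=32)
    b = ConnParams(queue_depth=8, max_frags=256)
    print(connect(a, b))  # ConnParams(queue_depth=8, max_frags=32)
    print(connect(b, a))  # ConnParams(queue_depth=8, max_frags=32)
```

In this model the result is the same whichever side initiates, which is the property mixed configurations need.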

          Activity

            Joseph Gmitter (Inactive) added a comment:
            http://review.whamcloud.com/17074/ has landed for 2.8.
            Resolving the ticket as noted in the commentary above.

            Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17074/
            Subject: LU-3322 lnet: make connect parameters persistent
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4c689a573fafcfa1ca7474a275f958e00b1deddc
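The patch subject, "make connect parameters persistent", suggests that values negotiated with a peer are kept and reused when the connection is re-established, instead of starting over from the local module defaults. The Python sketch below is one hedged reading of that idea; PeerRecord, PeerTable, LOCAL_DEFAULTS and the NID string are hypothetical and do not mirror ko2iblnd's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Illustrative local defaults (queue_depth, max_frags); not real ko2iblnd values.
LOCAL_DEFAULTS: Tuple[int, int] = (128, 256)


@dataclass
class PeerRecord:
    """Hypothetical per-peer state kept across connection attempts."""
    nid: str
    queue_depth: int = LOCAL_DEFAULTS[0]
    max_frags: int = LOCAL_DEFAULTS[1]

    def on_reject(self, remote_queue_depth: int, remote_max_frags: int) -> None:
        # Store the remote side's limits so the next attempt starts from them.
        self.queue_depth = min(self.queue_depth, remote_queue_depth)
        self.max_frags = min(self.max_frags, remote_max_frags)

    def next_proposal(self) -> Tuple[int, int]:
        # Values the active side would advertise on its next connect request.
        return (self.queue_depth, self.max_frags)


class PeerTable:
    """Hypothetical peer table keyed by NID."""

    def __init__(self) -> None:
        self._peers: Dict[str, PeerRecord] = {}

    def lookup(self, nid: str) -> PeerRecord:
        # Reusing the existing record is what keeps negotiated values after a
        # disconnect; a fresh record would start again from the defaults.
        if nid not in self._peers:
            self._peers[nid] = PeerRecord(nid)
        return self._peers[nid]


if __name__ == "__main__":
    table = PeerTable()
    peer = table.lookup("192.168.1.10@o2ib")  # hypothetical NID
    peer.on_reject(remote_queue_depth=8, remote_max_frags=32)
    # The connection drops; a later reconnect looks the peer up again.
    print(table.lookup("192.168.1.10@o2ib").next_proposal())  # (8, 32)
```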

            Doug Oucharek (Inactive) added a comment:
            This ticket seems to have expanded to be a catch-all for anything related to map_on_demand/peer_credit settings. I'd rather see this ticket be used for its original purpose, what Jeremy describes above. Anything new should become a new ticket (or set of tickets) so we don't get confused and link a bunch of new tickets to this one believing these patches will solve all issues in this area.

            Once patch http://review.whamcloud.com/#/c/17074/ has landed, I'd like this ticket to be closed. If there are any more problems with optimized settings for specific hardware setups, please open separate tickets so they can be prioritized and addressed accordingly.

            Jeremy Filizetti added a comment:
            I don't know anything about ko2iblnd-opa; I've always used a custom module parameter file for Lustre. Now that you've included the link, I see it is being shipped with Lustre, which I wasn't aware of before. From what I can see, you should be OK using those parameters with those adapters, but I'm not sure they are the "ideal" settings.

            Chris Hunter (Inactive) added a comment:
            Hi Jeremy,
            Thanks for the explanation; at least it shows the challenges involved. From your comments, the default ko2iblnd-opa parameters (i.e. LU-6735) should work for ConnectX[123] and TrueScale adapters.
            On our hardware, the maximum number of work requests per QP (max_qp_wr) is 16351, 16383 or 16384.
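The max_qp_wr values above matter because a connection's queue pair has to fit within the adapter's per-QP work request limit. The Python sketch below checks example settings against the three reported values. The sizing rule in send_wrs_needed is an assumption chosen for illustration, not necessarily the exact rule ko2iblnd applies, and max_frags and concurrent_sends are used only as stand-ins for the effect of map_on_demand and the concurrent_sends module parameter.

```python
from typing import Iterable

# max_qp_wr values reported for the hardware in this ticket.
REPORTED_MAX_QP_WR = (16351, 16383, 16384)


def send_wrs_needed(max_frags: int, concurrent_sends: int) -> int:
    # ASSUMPTION for illustration: one work request per RDMA fragment plus one
    # for the message itself, for every send that may be in flight.
    return (max_frags + 1) * concurrent_sends


def fits_all(max_frags: int, concurrent_sends: int, limits: Iterable[int]) -> bool:
    # A setting is only safe across mixed hardware if it fits the smallest limit.
    return send_wrs_needed(max_frags, concurrent_sends) <= min(limits)


def max_concurrent_sends(max_frags: int, max_qp_wr: int) -> int:
    # Largest concurrent_sends that stays within max_qp_wr under the same assumption.
    return max_qp_wr // (max_frags + 1)


if __name__ == "__main__":
    for max_frags, sends in ((32, 256), (256, 256)):
        print(max_frags, sends, send_wrs_needed(max_frags, sends),
              fits_all(max_frags, sends, REPORTED_MAX_QP_WR))
    print(max_concurrent_sends(256, min(REPORTED_MAX_QP_WR)))  # 63
```

Under this assumption, a small fragment limit leaves plenty of headroom, while a large one forces concurrent_sends down sharply; the smallest reported limit (16351) is the one that governs.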

            Jeremy Filizetti added a comment:
            I only used stock CentOS 6 kernels for the initial work. Only after noting that MLX5 memory registration does not support FMR did I start looking at and testing Mellanox OFED. I'm not aware of any issues with this patch and map_on_demand settings, even though they seem to be getting reported here as such. The problem is that configuring ko2iblnd requires too much low-level understanding of the driver, and ko2iblnd does not yet abstract the differing hardware well enough. My goal with this patch was to allow interoperation between systems configured for IB WAN performance and those that may come from a vendor solution with different parameters. Given the current upstream memory registration changes and the lack of flexibility, ko2iblnd really needs additional work to become more robust and support multiple configurations. This patch only serves as a stop-gap for that larger work. The best that can really be done here is to make recommendations to people based on their needs.

            People

              Assignee: Amir Shehata (Inactive)
              Reporter: Jeremy Filizetti
              Votes: 0
              Watchers: 31
