Details

    • Improvement
    • Resolution: Unresolved
    • Critical

    Description

      The solution to LU-3322 has made changes to how we can configure map_on_demand and peer_credits for o2iblnd-based connections. In that ticket, Jeremy has outlined some good information which should be captured in our documentation to help users better understand how to work with three tuning parameters: map_on_demand, peer_credits, and concurrent_sends.

      We should also document good presets for the different node types: MDS/OSS, LNet routers, and clients.

          Activity

            [LUDOC-286] Document the effects of LU-3322

            The attached spreadsheet lists the 4th case as "OPN = Map-on-demand->on", but it likely should read "OPN = Map-on-demand->off".

            chunteraa Chris Hunter (Inactive) added the comment above.

            This patch has landed to master, but as yet there is no documentation patch for the manual.

            Comments from Jeremy in LU-3322:

            Summary:
            Normally the Lustre ko2iblnd can only operate with identical peer_credits and map_on_demand values on both systems. The patch affects the active (initiator) connections in IB for clients/routers and the passive (responder) connections in IB for servers/routers. Passive connections will automatically negotiate down if the parameters permit, and reject if the values in the remote request are too high. Active connections will send their defaults initially and, if rejected, attempt to use the lower values from the reject message. If the values in the reject message are higher, the active connection won't retry because that is not supported.
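
The negotiation described in the summary can be sketched as a toy model. This is not the ko2iblnd code: the function names, the `negotiates` and `remote_negotiates` flags (standing in for "peer has the patch"), and the reduction of the tunables to a single integer are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def passive_reply(local: int, requested: int, negotiates: bool = True) -> Tuple[bool, int]:
    """Responder side of the toy model.

    Accepts an identical value, or a lower one when negotiation is enabled
    (hypothetical flag standing in for a patched peer). On reject, the reply
    carries the responder's own value.
    """
    if requested == local or (negotiates and requested < local):
        return True, requested
    return False, local


def active_connect(local: int, remote: int, remote_negotiates: bool = True) -> Optional[int]:
    """Initiator side: offer the local default, retry once with a lower value.

    Returns the agreed value, or None when no connection is established.
    """
    accepted, value = passive_reply(remote, local, remote_negotiates)
    if accepted:
        return value
    if value < local:
        # The reject message carried a lower value, so retry with it.
        accepted, value = passive_reply(remote, value, remote_negotiates)
        return value if accepted else None
    # A higher value in the reject message is not retried.
    return None
```

In this model a client at the default of 8 connects to a negotiating server at 124 and ends up with 8, while the same client is refused by a server that insists on identical values.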

            There are three ko2iblnd parameters of interest here: peer_credits, map_on_demand, and concurrent_sends. The default settings for ko2iblnd are peer_credits=8, concurrent_sends=8, and map_on_demand=0 (disabled).

            peer_credits determines how many messages you can receive from a single connection (queue pair).
            map_on_demand determines the number of DMA segments per credit that are sent. Each segment is usually (always?) a page and is sent as a separate work request, so a message typically goes across the wire as 256 4k pages.
            concurrent_sends determines how many send messages you can queue at a time to a single connection. concurrent_sends can't be less than half of peer_credits, and it needs to be <= 62 so as not to exceed the maximum number of work requests per queue pair in the standard Mellanox ConnectX[123] HCAs.
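
The two constraints on concurrent_sends stated above can be checked with a small helper. This is a minimal sketch: the function name and the message strings are hypothetical, and the limit of 62 is simply the figure quoted in the comment for ConnectX[123] HCAs.

```python
from typing import List

# Upper bound quoted in the comment for Mellanox ConnectX[123] HCAs.
MAX_CONCURRENT_SENDS = 62


def check_tunables(peer_credits: int, concurrent_sends: int) -> List[str]:
    """Return a list of violated constraints (empty when the pair is acceptable)."""
    problems = []
    # concurrent_sends can't be less than half of peer_credits.
    if concurrent_sends * 2 < peer_credits:
        problems.append("concurrent_sends is less than half of peer_credits")
    # concurrent_sends must stay at or below the quoted limit.
    if concurrent_sends > MAX_CONCURRENT_SENDS:
        problems.append("concurrent_sends exceeds %d" % MAX_CONCURRENT_SENDS)
    return problems
```

Both the defaults (`check_tunables(8, 8)`) and the recommended server values (`check_tunables(124, 62)`) pass with no problems reported.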

            The relation between those values is: work_requests_allocated = (map_on_demand + 1) * concurrent_sends

            You can see your max work requests per queue pair with:

                ibv_devinfo -v | grep max_qp_wr
                max_qp_wr: 16384
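
Putting the formula and the `max_qp_wr` value together, a rough check might look like the sketch below. The helper names are hypothetical, and the parser assumes only that the `ibv_devinfo -v` output contains lines of the form `max_qp_wr: <number>` as shown above.

```python
import re


def work_requests_allocated(map_on_demand: int, concurrent_sends: int) -> int:
    """Apply the relation quoted above: (map_on_demand + 1) * concurrent_sends."""
    return (map_on_demand + 1) * concurrent_sends


def parse_max_qp_wr(devinfo_text: str) -> int:
    """Return the smallest max_qp_wr found in `ibv_devinfo -v` output."""
    values = [int(v) for v in re.findall(r"max_qp_wr:\s*(\d+)", devinfo_text)]
    if not values:
        raise ValueError("no max_qp_wr line found")
    return min(values)


# Sample line taken from the output shown above.
sample_output = "max_qp_wr: 16384"
needed = work_requests_allocated(map_on_demand=256, concurrent_sends=62)
limit = parse_max_qp_wr(sample_output)
print(needed, limit, needed <= limit)  # 15934 16384 True
```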
            

            For maximum compatibility, I recommend the following.
            Lustre MDS/OSS servers and routers: use the patch and run with:

            peer_credits=124 concurrent_sends=62 map_on_demand=256
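
Kernel module parameters such as these are commonly set through an `options` line in a file under /etc/modprobe.d/. The sketch below only formats such a line; the helper name is hypothetical, and which file the line belongs in on a given system is left as an assumption.

```python
def modprobe_options(module: str, **params: int) -> str:
    """Format a modprobe.d 'options' line from keyword arguments."""
    settings = " ".join("%s=%d" % (name, value) for name, value in params.items())
    return "options %s %s" % (module, settings)


line = modprobe_options(
    "ko2iblnd", peer_credits=124, concurrent_sends=62, map_on_demand=256
)
print(line)  # options ko2iblnd peer_credits=124 concurrent_sends=62 map_on_demand=256
```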
            

            Existing Lustre clients:
            Apply the patch, or lower the current values of peer_credits and concurrent_sends to match the OSS/MDS settings.

            Do we need to include some or all of Amir's compatibility matrix (https://jira.hpdd.intel.com/secure/attachment/18893/compatibility%20matrix.xlsx) in the manual to tell users what options exist for configuring with mixed-Lustre-version networks?

            The manual section related to this should be marked conditional for Lustre 2.8 using the <para condition="l28"> tag.

            adilger Andreas Dilger added the comment above.

            People

              LM-Triage Lustre Manual Triage
              doug Doug Oucharek (Inactive)