
Refactor lnet_select_pathway and lnet_peer_ni ref counting

Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      lnet_select_pathway() contains a fair amount of duplicated code. I took a stab at refactoring this function and I believe I have a functionally equivalent implementation with about half as much code.

      During that process I noted that the lnet_nid2peerni_locked() function takes a ref on the lnet_peer_ni object that it returns. However, most callers of this function do not need this reference because they hold a net lock when referencing the lnet_peer_ni, so some additional code removal is possible here as well.
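The ref counting point can be illustrated with a small userspace sketch. Everything in it is a hypothetical stand-in (struct peer_ni, net_lock(), lookup_with_ref(), lookup_no_ref(), put_ref()); none of it is the real LNet code, and the "lock" is only a flag that models "the caller holds the net lock". It shows why a caller that only uses the object while the lock is held gains nothing from the extra get/put pair.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real LNet definitions. */
struct peer_ni {
	unsigned long nid;
	int refcount;
};

/* Models "the caller holds the net lock"; a flag, not a real lock. */
static int net_lock_held;

/* The table itself owns one reference on each entry. */
static struct peer_ni table[] = { { 42UL, 1 } };

static void net_lock(void)   { net_lock_held = 1; }
static void net_unlock(void) { net_lock_held = 0; }

/* Lookup without touching the refcount. The returned pointer is only
 * safe to use while the (modelled) lock stays held. */
static struct peer_ni *lookup_no_ref(unsigned long nid)
{
	size_t i;

	assert(net_lock_held);
	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].nid == nid)
			return &table[i];
	return NULL;
}

/* Lookup that hands back a referenced object, so every caller must
 * pair it with put_ref(), even if it never drops the lock. */
static struct peer_ni *lookup_with_ref(unsigned long nid)
{
	struct peer_ni *p = lookup_no_ref(nid);

	if (p != NULL)
		p->refcount++;
	return p;
}

/* Drop a reference taken by lookup_with_ref(). */
static void put_ref(struct peer_ni *p)
{
	assert(net_lock_held);
	assert(p->refcount > 1);	/* the table's own ref must remain */
	p->refcount--;
}
```

In this sketch, a caller that does net_lock(), lookup_with_ref(), some work, put_ref(), net_unlock() leaves the count exactly where it started, so the same caller could use lookup_no_ref() and skip the put. A caller that needs the object after dropping the lock would still have to take its own reference.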

        Activity

          [LU-12756] Refactor lnet_select_pathway and lnet_peer_ni ref counting

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36537/
          Subject: LU-12756 lnet: Refactor lnet_set_non_mr_pref_nid
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: ce442253a719a065bf85e0202546a3afd4a38524


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36621/
          Subject: LU-12756 lnet: Refactor lnet_compare_routes
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: e02287b4ef6a30e9e23a84012d8a133621ea454e


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36620/
          Subject: LU-12756 lnet: Remove unused vars in lnet_find_route_locked
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: b129f7b1f76a716c656627beb271de58e6af473c


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36536/
          Subject: LU-12756 lnet: Avoid extra lnet_remotenet lookup
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 3812c54b9ca3cc08be947f893bdf55a41aa876ed


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36535/
          Subject: LU-12756 lnet: Avoid comparing route to itself
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 2b8d9d12d182fc91d671558434cc0b652c1ade21


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36534/
          Subject: LU-12756 lnet: Refactor lnet_find_best_lpni_on_net
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 80edb2ad72baf9b096e03f2929b8b018b0a630d2


          shadow Alexey Lyashkov added a comment

          @Chris - I vote that we add some performance testing for this series.
          As far as I can see, it touches fairly hot code, so I think we need a performance evaluation similar to the one we did for MR routing testing.
          The workload is the same for all tests: mdtest + IOR with a small record size (4k-8k).
          1) new code on the server
          2) new code on the routers as well
          3) all nodes run the same code
          hornc Chris Horn added a comment (edited)

          FYI, I plan the following test activities for this patch series.

          Port patch series to Cray's Lustre 2.12. All patches apply cleanly because our 2.12 LNet is basically identical to master.

          Deploy the code to a dev system with a ClusterStor server and a small XC system (40 compute nodes, 4 LNet routers).

          Place load on system (IOR, mdtest, etc.).

          Perform LNet router failure testing over 24 hours. This involves: NMI a router, wait 15 minutes, reboot router, wait 15 minutes, repeat.

          Expectation is that there are no client evictions and no test failures.

          I will perform the above testing for the following configurations:
          1. Test code on servers, routers and clients with default settings (Edit: Passed)
          2. Test code on servers, routers and clients with discovery disabled on clients and routers (Edit: Passed)
          3. Test code on servers, routers and clients with discovery disabled on servers (Edit: Test fails due to https://jira.whamcloud.com/browse/LU-12955)
          4. Test code on servers, routers and clients with discovery disabled on clients, servers and routers (Edit: Passed)
          5. Test code on servers, clients and multi-rail routers with default settings


          Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36624
          Subject: LU-12756 lnet: Take peer NI ref in lnet_discover_peer_locked
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 148184da0238d4c533c4d3e9a6ecaebaa8efa635


          Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36627
          Subject: LU-12756 lnet: Refactor lnet_handle_lo_send
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 30a4416bffcf7830203344d6938e1efab16d92fe


          Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36626
          Subject: LU-12756 lnet: Switch lnet_handle_lo_send to void function
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 788b4bd153e2fc7b67a27ca1b2fd51f5a8136651


          People

            Assignee: hornc Chris Horn
            Reporter: hornc Chris Horn