[LU-12756] Refactor lnet_select_pathway and lnet_peer_ni ref counting Created: 12/Sep/19  Updated: 06/Jun/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Unresolved Votes: 0
Labels: None

 Description   

lnet_select_pathway() contains a fair amount of duplicated code. I took a stab at refactoring this function and I believe I have a functionally equivalent implementation with about half as much code.

During that process I noted that lnet_nid2peerni_locked() takes a ref on the lnet_peer_ni object that it returns. However, most callers of this function do not need this reference because they hold a net lock while referencing the lnet_peer_ni, so some additional code removal is possible here as well.
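
To illustrate the ref counting point, here is a minimal sketch in C. It uses hypothetical stand-in types and helpers (struct peer_table, peer_find_locked(), peer_get_locked(), peer_put()) rather than the real LNet structures or functions, and it models "holding the net lock" with a simple flag. It only shows why a get/put pair is redundant for a caller that stays under the lock for the whole access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real LNet structures. */
struct peer_ni {
	int nid;
	int refcount;	/* starts at 1: the reference owned by the table */
};

struct peer_table {
	bool locked;	/* models "the net lock is held" */
	struct peer_ni entries[4];
	size_t count;
};

static void table_lock(struct peer_table *t)
{
	assert(!t->locked);
	t->locked = true;
}

static void table_unlock(struct peer_table *t)
{
	assert(t->locked);
	t->locked = false;
}

/* Borrowing lookup: no ref taken; the pointer is valid only while locked. */
static struct peer_ni *peer_find_locked(struct peer_table *t, int nid)
{
	assert(t->locked);
	for (size_t i = 0; i < t->count; i++)
		if (t->entries[i].nid == nid)
			return &t->entries[i];
	return NULL;
}

/* Ref-taking lookup: the caller must balance it with peer_put(). */
static struct peer_ni *peer_get_locked(struct peer_table *t, int nid)
{
	struct peer_ni *p = peer_find_locked(t, nid);

	if (p != NULL)
		p->refcount++;
	return p;
}

static void peer_put(struct peer_ni *p)
{
	assert(p->refcount > 1);	/* never drop the table's own ref here */
	p->refcount--;
}

/* Caller written against the ref-taking lookup: extra get/put pair. */
static bool peer_known_with_ref(struct peer_table *t, int nid)
{
	bool found = false;
	struct peer_ni *p;

	table_lock(t);
	p = peer_get_locked(t, nid);
	if (p != NULL) {
		found = true;
		peer_put(p);
	}
	table_unlock(t);
	return found;
}

/* Same caller against the borrowing lookup: the lock alone protects p. */
static bool peer_known_borrowed(struct peer_table *t, int nid)
{
	bool found;

	table_lock(t);
	found = peer_find_locked(t, nid) != NULL;
	table_unlock(t);
	return found;
}
```

Both callers leave the refcount unchanged, but the second does so without the get/put pair. In this sketch, a caller that keeps the pointer after dropping the lock would still need to take its own reference before unlocking.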



 Comments   
Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36532
Subject: LU-12756 lnet: Drop ref on peer_ni in lnet_nid2peerni_locked()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ca309687426db6f79464378470a1ae04a0431195

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36543
Subject: LU-12756 lnet: Refactor lnet_select_pathway
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e50177e2eecfcc55072b1a5f645776280f4d655c

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36542
Subject: LU-12756 lnet: Remove dst_nid arg from lnet_select_pathway
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: c794838727bbb7904e9b7ab28bdd13c54c62c1d9

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36540
Subject: LU-12756 lnet: Remove unnecessary rtr_nid argument
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a5731b4c5f226bce7c3471a702f4ce257070bccb

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36533
Subject: LU-12756 lnet: Drop addref in lnet_get_peer_ni_locked
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 394ffa7e3ee6a8c708057f007266db96e57b17c6

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36534
Subject: LU-12756 lnet: Refactor lnet_find_best_lpni_on_net
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8b5308d20b03bce2b3e5a7b95f733101379a80c3

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36536
Subject: LU-12756 lnet: Avoid extra lnet_remotenet lookup
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fb13cc6a34a4546ba11b214ce1050f0f130e0bcd

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36535
Subject: LU-12756 lnet: Avoid comparing route to itself
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bee08d4a17bcd79f52fe8b9d0f444391a24c1a7b

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36539
Subject: LU-12756 lnet: Introduce lnet_msg_is_response
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: fb5ff0f336eaf5c3632f50825f52dd577e13c23f

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36538
Subject: LU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7ccee40584c7f9e913811659a684189de2c60867

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36537
Subject: LU-12756 lnet: Refactor lnet_set_non_mr_pref_nid
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b3e15d49e268e4b98e59d8cc7f371cd7106d4507

Comment by Gerrit Updater [ 22/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36541
Subject: LU-12756 lnet: Use info cached in lnet_msg on resend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0aa2e94b8ea0a2aba75b5aa701183dd248895660

Comment by Gerrit Updater [ 29/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36608
Subject: LU-12756 lnet: Commit for testing
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a9c12029976e06cafbae5b31f415711338ef3394

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36618
Subject: LU-12756 lnet: Fix src spec route selection
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b948db7a062ae76e7cdb82bf326c2bc941e50518

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36621
Subject: LU-12756 lnet: Refactor lnet_compare_routes
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e75253aad064e857d4d1d97290d4896b9e189680

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36623
Subject: LU-12756 lnet: lnet_find_peer_ni_locked called without lock
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b47e3c45fbed17a32c08faf6ffed2a5b2a8c323d

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36620
Subject: LU-12756 lnet: Remove unused vars in lnet_find_route_locked
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 699d3fdf1c1004e4fc895562b9ef9261516b0901

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36628
Subject: LU-12756 lnet: Restrict lnet_select_pathway to path selection
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7680ada26a581590ef1a6a690761af86a9df8f8f

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36625
Subject: LU-12756 lnet: Refactor lnet_find_route_locked
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2f15148b938d5a98a94ac37dce17b57ba74be7eb

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36626
Subject: LU-12756 lnet: Switch lnet_handle_lo_send to void function
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 788b4bd153e2fc7b67a27ca1b2fd51f5a8136651

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36627
Subject: LU-12756 lnet: Refactor lnet_handle_lo_send
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 30a4416bffcf7830203344d6938e1efab16d92fe

Comment by Gerrit Updater [ 31/Oct/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/36624
Subject: LU-12756 lnet: Take peer NI ref in lnet_discover_peer_locked
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 148184da0238d4c533c4d3e9a6ecaebaa8efa635

Comment by Chris Horn [ 01/Nov/19 ]

FYI, I plan the following test activities for this patch series.

Port patch series to Cray's Lustre 2.12. All patches apply cleanly because our 2.12 LNet is basically identical to master.

Deploy the code to a dev system with a ClusterStor server and a small XC system (40 compute nodes, 4 LNet routers).

Place load on system (IOR, mdtest, etc.).

Perform LNet router failure testing over 24 hours. This involves: NMI a router, wait 15 minutes, reboot router, wait 15 minutes, repeat.

Expectation is that there are no client evictions and no test failures.

I will perform the above testing for the following configurations:
1. Test code on servers, routers and clients with default settings (Edit: Passed)
2. Test code on servers, routers and clients with discovery disabled on clients and routers (Edit: Passed)
3. Test code on servers, routers and clients with discovery disabled on servers (Edit: Test fails due to https://jira.whamcloud.com/browse/LU-12955)
4. Test code on servers, routers and clients with discovery disabled on clients, servers and routers (Edit: Passed)
5. Test code on servers, clients and multi-rail routers with default settings.

Comment by Alexey Lyashkov [ 26/Dec/19 ]

@Chris - I vote to add some performance testing for this series.
As far as I can see, it touches fairly hot code, so I think we need a performance evaluation similar to the one we did for MR routing testing.
The workload is the same for all tests: mdtest + IOR with a small record size (4k-8k).
1) new code on the servers
2) new code on the routers as well
3) all nodes have the same code.

Comment by Gerrit Updater [ 10/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36534/
Subject: LU-12756 lnet: Refactor lnet_find_best_lpni_on_net
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 80edb2ad72baf9b096e03f2929b8b018b0a630d2

Comment by Gerrit Updater [ 10/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36535/
Subject: LU-12756 lnet: Avoid comparing route to itself
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 2b8d9d12d182fc91d671558434cc0b652c1ade21

Comment by Gerrit Updater [ 18/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36536/
Subject: LU-12756 lnet: Avoid extra lnet_remotenet lookup
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3812c54b9ca3cc08be947f893bdf55a41aa876ed

Comment by Gerrit Updater [ 18/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36620/
Subject: LU-12756 lnet: Remove unused vars in lnet_find_route_locked
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b129f7b1f76a716c656627beb271de58e6af473c

Comment by Gerrit Updater [ 18/Jan/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36621/
Subject: LU-12756 lnet: Refactor lnet_compare_routes
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e02287b4ef6a30e9e23a84012d8a133621ea454e

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36537/
Subject: LU-12756 lnet: Refactor lnet_set_non_mr_pref_nid
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ce442253a719a065bf85e0202546a3afd4a38524

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36538/
Subject: LU-12756 lnet: Refactor lnet_find_existing_preferred_best_ni
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ef7c4021c03c6b2b300b9075bf60d2be7d66784a

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36539/
Subject: LU-12756 lnet: Introduce lnet_msg_is_response
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ef20f5e84e5457397da5b0e086c42ce6b79e2574

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36540/
Subject: LU-12756 lnet: Remove unnecessary rtr_nid argument
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 01286ba81ce93e05e5e61c8346c80c74deb29d32

Comment by Gerrit Updater [ 14/Feb/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36540/
Subject: LU-12756 lnet: Remove unnecessary rtr_nid argument
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 357fa4a84f4bb0a0b54f176ab8e2d4f59be80bb6

Comment by Gerrit Updater [ 11/Mar/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36541/
Subject: LU-12756 lnet: Use info cached in lnet_msg on resend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 67fafd215587141c92b358f8800c05ef1f8088a4

Comment by Gerrit Updater [ 06/Jun/22 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36623/
Subject: LU-12756 lnet: Avoid redundant peer NI lookups
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: b00ac5f7038434a339b235445c260f439a409b49

Generated at Sat Feb 10 02:55:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.