socklnd needs improved interface selection and configuration (LU-14064)

[LU-13641] socklnd: remove use_tcp_bonding option in favor of LNet Multi-Rail
Created: 05/Jun/20  Updated: 17/Jun/23  Resolved: 13/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.12.4, Lustre 2.15.0
Fix Version/s: Lustre 2.16.0, Lustre 2.15.0

Type: Technical task
Priority: Minor
Reporter: Amir Shehata (Inactive)
Assignee: Serguei Smirnov
Resolution: Fixed
Votes: 0
Labels: lnet

Issue Links:
Related

 Description   

TCP bonding in socklnd over-complicates the code and there is no evidence it's being used anywhere. With LNet Multi-Rail, the use_tcp_bonding option has become obsolete. Add a deprecation message for earlier releases. Remove it in the 2.15 release.

The Multi-Rail feature doesn't need to be explicitly enabled. To use MR instead of the use_tcp_bonding configuration option, group the interfaces on the same network using the lnetctl utility:

lnetctl net add --net tcp --if eth0,eth1

or via the /etc/modprobe.d/lnet.conf or /etc/modprobe.d/lustre.conf configuration file:

options lnet networks="tcp(eth0,eth1)"

and make sure dynamic discovery is enabled:

lnetctl set discovery 1

MR will aggregate the throughput of all available networks/interfaces shared between peer nodes.
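
The two configuration forms above carry the same information: a network name plus a comma-separated interface list. As a minimal sketch, the helper below prints (it does not run) the lnetctl commands and the matching module option for a given network and set of interfaces. The function name mr_setup_commands is hypothetical and not part of Lustre; the interface names are the ones used in this description.

```shell
#!/bin/sh
# Sketch only: print the LNet Multi-Rail setup commands for one network.
# mr_setup_commands is a hypothetical helper, not a Lustre utility.
# Nothing is executed against LNet; the output is meant for review.
mr_setup_commands() {
    net=$1
    shift
    # Join the remaining arguments (interface names) with commas.
    ifaces=$(IFS=,; printf '%s' "$*")
    # Runtime configuration through lnetctl.
    printf 'lnetctl net add --net %s --if %s\n' "$net" "$ifaces"
    printf 'lnetctl set discovery 1\n'
    # Equivalent module option for /etc/modprobe.d/lnet.conf, as a comment.
    printf '# options lnet networks="%s(%s)"\n' "$net" "$ifaces"
}

mr_setup_commands tcp eth0 eth1
```

After the printed commands have been reviewed and applied, lnetctl net show --net tcp and lnetctl global show can be used to check that both interfaces are listed on the network and that discovery is enabled.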

See LNet Software Multi-Rail Configuration in the Lustre Operations Manual for more details.



 Comments   
Comment by Gerrit Updater [ 23/Sep/20 ]

Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40000
Subject: LU-13641 socklnd: remove tcp bonding
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 27ef0eee12f873f4c2a92868342d6c1b75df4c86

Comment by Gerrit Updater [ 27/Nov/20 ]

Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40774
Subject: LU-13641 socklnd: replace route construct
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d386df6d565cf0e60cae879f3348e61aeed9c0f8

Comment by Gerrit Updater [ 24/Dec/20 ]

Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41088
Subject: LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7d11a622e55a4a73a75513e8310d747696c7b422

Comment by Gerrit Updater [ 29/Dec/20 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41102
Subject: LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 193ed27d4dc9aad69a02e9e18f2c647e5b4942e0

Comment by Cory Spitz [ 04/Jan/21 ]

> TCP bonding in socklnd over-complicates the code and there is no evidence it's being used anywhere

It may not be widely used, but doesn't native TCP bonding outperform MR in various RAS situations? I suspect that there are real-world tests that show that TCP bonding seamlessly rides through failures whereas MR would need to re-try/re-transmit. Is this a wrong assumption? Is it proven that MR is better than bonding in any & all scenarios? If not, do you still want to deprecate bonding?

Comment by Serguei Smirnov [ 04/Jan/21 ]

Hi Cory,

The wording is a bit confusing, so to clarify: this ticket deals only with a socklnd feature, so one would still be able to use native TCP bonding in Linux with MR. It is the "socklnd bonding" that's being deprecated. The introduction of "socklnd bonding" allowed multiple interfaces to be treated as one in the socklnd layer; the introduction of MR brought the same concept to the LNet layer. I don't believe there's a difference in performance. ashehata can correct me if my understanding is wrong.

Comment by Cory Spitz [ 04/Jan/21 ]

Ah, thanks for the clarification. That makes sense.

Comment by Gerrit Updater [ 05/Jan/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41088/
Subject: LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1a2bf911b9793648de3abbc88b9f77eb2237dc50

Comment by Peter Jones [ 05/Jan/21 ]

The deprecation warning has landed in 2.14. The removal itself is deferred to 2.15.

Comment by Gerrit Updater [ 08/Jan/21 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41179
Subject: LU-13641 socklnd: remove tcp bonding
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 0124602e645e87e1e51e5b218a823f817e457547

Comment by Gerrit Updater [ 08/Jan/21 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/41180
Subject: LU-13641 socklnd: replace route construct
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 5364ae3875337160e5928631dd7484703789ea47

Comment by Gerrit Updater [ 04/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41102/
Subject: LU-13641 socklnd: announce deprecation of 'use_tcp_bonding'
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: ede08af7d30b2dc4c41b89db224ab1a3bdb2f30c

Comment by Gerrit Updater [ 30/Mar/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40000/
Subject: LU-13641 socklnd: remove tcp bonding
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d123c47a18adbf5665ed63d99c53117b84db9ec8

Comment by Gerrit Updater [ 10/Apr/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40774/
Subject: LU-13641 socklnd: replace route construct
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7766f01e891c378d1bf099e475f128ea612488f0

Comment by Gerrit Updater [ 16/Sep/22 ]

"Neil Brown <neilb@suse.de>" uploaded a new patch: https://review.whamcloud.com/48568
Subject: LU-13641 socklnd: remove remnants of tcp bonding
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1f3fd1fd3dc9dc565fd7691316324b287d161148

Comment by Gerrit Updater [ 10/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48568/
Subject: LU-13641 socklnd: remove remnants of tcp bonding
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3630e1eaf9db562a1de707762cd649db815459c8
