[LU-12452] lnet: allow setting ToS in ko2iblnd driver Created: 19/Jun/19  Updated: 09/Jan/24

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.8
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Lukasz Flis Assignee: Cyril Bordage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
Rank (Obsolete): 9223372036854775807

 Description   

At Cyfronet we use RDMA over Ethernet to access a Lustre filesystem located 14 km away in a secondary data center.

RoCEv2 is an RDMA over Ethernet implementation that can be used over a lossy network. To minimize the effect of frame drops caused by network congestion, RoCEv2 uses the ECN mechanism.

For RoCEv2 ECN congestion control to work properly, congestion marking has to be enabled on every device along the path. Traffic subject to ECN marking on the network side must be properly tagged by the HCA.
For this purpose the DSCP field (part of the TOS field) of the IP packet is used to differentiate RDMA and RDMA-CNP traffic from other flows. ECN marking can then be enabled and applied only to RDMA traffic when congestion is detected.
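
As a small illustration of how DSCP relates to the TOS byte: the upper six bits of the byte carry the DSCP and the lower two bits carry ECN. The sketch below splits and rebuilds a TOS value. The helper names `split_tos` and `make_tos` are made up for this example and are not part of Lustre or any driver.

```python
from __future__ import annotations


def split_tos(tos: int) -> tuple[int, int]:
    """Split an 8-bit IP TOS byte into (DSCP, ECN)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError(f"TOS must fit in one byte, got {tos!r}")
    # DSCP is the upper 6 bits, ECN the lower 2 bits.
    return tos >> 2, tos & 0b11


def make_tos(dscp: int, ecn: int = 0) -> int:
    """Build a TOS byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError(f"DSCP must be in 0..63, got {dscp!r}")
    if not 0 <= ecn <= 0b11:
        raise ValueError(f"ECN must be in 0..3, got {ecn!r}")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    # TOS values that appear in the example parameter later in this ticket.
    for tos in (48, 0x18, 12):
        dscp, ecn = split_tos(tos)
        print(f"tos={tos:#04x} dscp={dscp} ecn={ecn}")
```

Note that a TOS of 48 corresponds to DSCP 12, so switch-side classification rules written in terms of DSCP need the shifted value.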

Lustre LNet does not support setting the TOS value in ko2iblnd.
Currently, the only way to enable TOS marking of RDMA traffic is to set the default TOS in the mlx4/5 drivers using the cma_roce_tos script, which is part of the mOFED distribution. The script uses configfs to set the desired value and must be executed before the ko2iblnd module is loaded.

The drawback of the current approach is that it does not allow different ToS values when more than one o2ib net shares one HCA (in separate VLANs). It is also difficult to verify that the correct ToS has been set for the ko2iblnd QPs.

A more convenient and flexible approach would be a ko2iblnd module option for setting the ToS on a per-network basis, together with ToS support in lnetctl for dynamic configuration.

From a technical point of view, the RDMA TOS can be set per connection (and thus per QP) at the API level using rdma_set_option with the RDMA_OPTION_ID_TOS option.

Please consider enabling ToS setting on a per-network basis for Lustre o2ib networks in the ko2iblnd driver.

An example ko2iblnd parameter could look like this:

modprobe ko2iblnd tos2nets="o2ib80(48),o2ib81(0x18)" tos=12

where
tos - default tos, applied when no explicit mapping is given
tos2nets - set tos value on per-network basis
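
To make the proposed semantics concrete, here is a minimal sketch of how such a string could be interpreted. It assumes the syntax shown in the example above (comma-separated `net(value)` entries, with values in decimal or 0x-prefixed hex). The `tos2nets` parameter is only a proposal in this ticket, and the function names `parse_tos2nets` and `tos_for_net` are hypothetical.

```python
from __future__ import annotations

import re

# One entry looks like "o2ib80(48)" or "o2ib81(0x18)".
_ENTRY = re.compile(r"^(?P<net>[A-Za-z0-9]+)\((?P<tos>[^()]+)\)$")


def parse_tos2nets(spec: str) -> dict[str, int]:
    """Parse a proposed tos2nets string into a {network: tos} mapping."""
    mapping: dict[str, int] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"malformed tos2nets entry: {entry!r}")
        # Base 0 accepts both decimal ("48") and hex ("0x18") notation.
        tos = int(match.group("tos"), 0)
        if not 0 <= tos <= 0xFF:
            raise ValueError(f"TOS out of range in entry: {entry!r}")
        mapping[match.group("net")] = tos
    return mapping


def tos_for_net(net: str, mapping: dict[str, int], default_tos: int) -> int:
    """Return the explicit TOS for a network, else the default."""
    return mapping.get(net, default_tos)


if __name__ == "__main__":
    nets = parse_tos2nets("o2ib80(48),o2ib81(0x18)")
    for net in ("o2ib80", "o2ib81", "o2ib82"):
        print(net, tos_for_net(net, nets, default_tos=12))
```

With the example values, o2ib80 and o2ib81 get their explicit ToS, while any other network such as o2ib82 falls back to the default of 12.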

We have suitable infrastructure in place that we can use for testing and verification, if that helps with development.

Best Regards

Lukasz Flis



 Comments   
Comment by Gerrit Updater [ 22/Aug/19 ]

James Simmons (jsimmons@infradead.org) uploaded a new patch: https://review.whamcloud.com/35863
Subject: LU-12452 handle: discard h_lock.
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: df94ab55bf26fd6866418f5a3bf080e8f65bd87f

Comment by Lukasz Flis [ 22/Aug/19 ]

@James Simmons the above patch landed here by mistake, I think. It belongs to LU-12542 instead.

Comment by James A Simmons [ 22/Aug/19 ]

Yes. I just fixed the patch. Sorry.

Comment by Amir Shehata (Inactive) [ 10/Dec/19 ]

Design

https://wiki.whamcloud.com/x/lpyCBw

Comment by Gerrit Updater [ 22/Sep/21 ]

"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45017
Subject: LU-12452 lnet: allow setting ToS in ko2iblnd
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 76f7caaa3cb5d4d1b06ace52549b88f802263f96

Comment by Cyril Bordage [ 29/Nov/21 ]

Hello lflis,

do you still have suitable infrastructure to run the tests? It would be very helpful for testing the first patch we have.

Thank you.

Comment by Lukasz Flis [ 29/Nov/21 ]

Hi,

I think we can do some testing. Please let me check what options we have on our side.

Is this patch back-portable to 2.12?

Comment by Cyril Bordage [ 29/Dec/21 ]

Hello Lukasz,

You will receive the SRPMs soon.

The first test is simple: it just checks that the ToS field is modified in the packet.
For that, you need to use ibdump (https://github.com/Mellanox/ibdump) to capture the packets, and I will check the generated pcap files to see whether the values are correct.

Here are the steps to run:

  • Add "options ko2iblnd tos=<VALUETOSET>" to lnet.conf
  • Configure and load lnet
  • Run ibdump
  • Do an lnet ping to another machine
  • Stop ibdump
  • Run ibdump again for a second capture
  • Mount FS and do some file operations
  • Stop ibdump

Then, I would need to analyze the 2 generated pcap files.
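
For anyone wanting to pre-check the captures themselves, the core of that analysis is reading the second byte of the IPv4 header. Below is a minimal sketch that extracts the ToS byte from one raw frame. It assumes the captured frames are plain Ethernet (optionally with a single 802.1Q VLAN tag) carrying IPv4, as expected for RoCEv2. How ibdump lays out its pcap records is not covered here, and the function name `ipv4_tos_from_frame` is made up for this example.

```python
from __future__ import annotations

import struct

ETH_P_IP = 0x0800      # EtherType for IPv4
ETH_P_8021Q = 0x8100   # EtherType for an 802.1Q VLAN tag


def ipv4_tos_from_frame(frame: bytes) -> int | None:
    """Return the IPv4 TOS byte of an Ethernet frame, or None if not IPv4."""
    if len(frame) < 14:
        return None
    # EtherType follows the 6-byte destination and 6-byte source MAC.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    if ethertype == ETH_P_8021Q:
        # Skip the 2-byte VLAN TCI, then read the inner EtherType.
        if len(frame) < offset + 4:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype != ETH_P_IP or len(frame) < offset + 20:
        return None
    if frame[offset] >> 4 != 4:
        return None
    # Byte 1 of the IPv4 header is the TOS (DSCP + ECN) field.
    return frame[offset + 1]


if __name__ == "__main__":
    # Synthetic frame: zeroed MACs, IPv4 EtherType, header with TOS 0x30.
    ip_header = bytes([0x45, 0x30]) + bytes(18)
    frame = bytes(12) + struct.pack("!H", ETH_P_IP) + ip_header
    print(ipv4_tos_from_frame(frame))
```

Comparing the returned value against the `tos=` setting given to ko2iblnd would show whether the option took effect for a given packet.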

Thanks.

Comment by Colin Faber [ 09/Jan/24 ]

cbordage what's going on with this?

Generated at Sat Feb 10 02:52:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.