[LU-10159] Lnet: Ping issues with Multi-rail routers talking to down rev clients Created: 25/Oct/17  Updated: 29/Apr/20  Resolved: 29/Apr/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.10.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Malcolm Haak - NCI (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: Lnet, Multi-Rail
Environment:

Servers/LNet routers: CentOS 7.4, MOFED 4.1-1.0.2.0, Lustre 2.10.1
Clients: CentOS 6.8, MOFED ?, Lustre 2.5 (DDN ES 2.5.42.28-ddn14)


Epic/Theme: lnet
Severity: 3

 Description   

This case is being created on behalf of ANU/NCI.
The filesystem is a new Lustre 2.10.1 ZFS-based system.

The system has been built with Multi-Rail-enabled LNet routers. Each LNet router has a bonded IB EDR interface on the filesystem side (o2ib8) and two EDR interfaces on the same o2ib network (o2ib3) on the client side.
For example, on one of the routers:

# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib8
      local NI(s):
        - nid: 10.112.1.81@o2ib8
          status: up
          interfaces:
              0: ibbond
    - net type: o2ib3
      local NI(s):
        - nid: 10.9.110.171@o2ib3
          status: up
          interfaces:
              0: ib1
        - nid: 10.9.110.179@o2ib3
          status: up
          interfaces:
              0: ib3
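The ticket does not say how these NIs were created. As one possibility, the layout above could be built at runtime with `lnetctl`; the sketch below assumes that approach and takes the interface names (`ibbond`, `ib1`, `ib3`) from the output above. It only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch only: one way to build the NI layout shown by `lnetctl net show`.
# Interface names come from that output; the command sequence is an
# assumption, not taken from the ticket. Prints commands unless APPLY=1.

run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

configure_router_nis() {
    run modprobe lnet
    run lnetctl lnet configure
    # Filesystem side: a single NI on the bonded EDR interface.
    run lnetctl net add --net o2ib8 --if ibbond
    # Client side: two NIs on the same o2ib3 network (Multi-Rail).
    run lnetctl net add --net o2ib3 --if ib1,ib3
    # Forward LNet messages between o2ib8 and o2ib3.
    run lnetctl set routing 1
}

configure_router_nis
```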

On the clients, each router interface on the o2ib3 network is listed as a separate gateway. This was done because the clients are down-rev and do not support Multi-Rail.

# lctl route_list
net              o2ib8 hops 1 gw               10.9.110.180@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.181@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.184@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.179@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.186@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.182@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.185@o2ib3 up pri 0
net              o2ib8 hops 1 gw               10.9.110.183@o2ib3 up pri 0
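On a Lustre 2.5 client, a route list like this is set through the `lnet` module options. A minimal sketch, assuming the client's IPoIB interface is `ib0` (not stated in the ticket); it uses an LNet NID range to name the eight gateways listed above, each of which remains a separate route to o2ib8. The printed line would go into a file under `/etc/modprobe.d/`.

```shell
#!/bin/sh
# Sketch: lnet module options for a down-rev (non-Multi-Rail) client.
# Assumption: the client's o2ib3 interface is ib0.
# 10.9.110.[179-186]@o2ib3 expands to the eight gateway NIDs shown by
# `lctl route_list`; LNet treats each one as its own router.

lnet_client_options() {
    echo 'options lnet networks="o2ib3(ib0)" routes="o2ib8 10.9.110.[179-186]@o2ib3"'
}

lnet_client_options
```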

On the LNet routers, the clients have been configured as 'Multi-Rail: True' peers so that LNet on the routers uses all available interfaces on the o2ib3 network. If the clients are not configured as Multi-Rail-aware, the old code path, which chooses the first interface, is used. Because the clients are attached via an FDR fabric, both EDR interfaces on the client side of each router must be used to reach the target performance. This could also be achieved by running the LNet routers as VMs, but that brings other performance penalties that native routers avoid.
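Whether a router really treats a client as Multi-Rail can be checked from `lnetctl peer show`. The helper below is a sketch that assumes the 2.10 YAML layout, in which each peer entry has a `primary nid:` line followed by a `Multi-Rail:` line; it reads that output on stdin and prints one `<primary nid> <flag>` pair per peer.

```shell
#!/bin/sh
# Sketch: summarise `lnetctl peer show` output as "<primary nid> <Multi-Rail>".
# Assumes the 2.10 YAML layout ("- primary nid: ..." then "Multi-Rail: ...").
# Usage on a router: lnetctl peer show | mr_summary

mr_summary() {
    awk '
        /primary nid:/ { nid = $NF }      # remember the current peer
        /Multi-Rail:/  { print nid, $NF } # emit it with its MR flag
    '
}
```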

Errors have been seen in the logs of clients both with and without the filesystem mounted. All of these clients have the routes configured:

2017-10-25 12:07:22 [59647.382295] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) 10.9.110.185@o2ib3 rejected: consumer defined fatal error
2017-10-25 12:07:22 [59647.395386] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) Skipped 11 previous similar messages
2017-10-25 12:17:34 [60259.426495] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) 10.9.110.185@o2ib3 rejected: consumer defined fatal error
2017-10-25 12:17:34 [60259.439527] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) Skipped 11 previous similar messages
2017-10-25 12:27:46 [60871.470741] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) 10.9.110.185@o2ib3 rejected: consumer defined fatal error
2017-10-25 12:27:46 [60871.483851] LNetError: 291:0:(o2iblnd_cb.c:2638:kiblnd_rejected()) Skipped 11 previous similar messages

These errors coincide with the loss of all available LNet routers:

net              o2ib8 hops 1 gw               10.9.110.180@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.181@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.184@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.179@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.186@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.182@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.185@o2ib3 down pri 0
net              o2ib8 hops 1 gw               10.9.110.183@o2ib3 down pri 0

The issue appears to be that the replies to the LNet pings the clients send to check that routes are valid are not always sent from the NI that received the ping. This causes the down-rev clients to log the errors above and mark the routes as down.
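One way to look for this on a router is to ask the kernel which interface it would use for traffic sourced from each o2ib3 address. A sketch, assuming iproute2's `ip route get <dst> from <src>` output format; `CLIENT_IP` is a hypothetical client IPoIB address supplied by the user, and the address/interface pairs come from the `lnetctl net show` output above. With two connected routes for the same subnet and no per-interface source routing, one of the two checks would typically report the other interface.

```shell
#!/bin/sh
# Sketch: check that traffic sourced from each router o2ib3 address leaves
# through the interface that owns that address.
# CLIENT_IP is a hypothetical client IPoIB address (set it before running).

# Print the egress device from `ip route get` output read on stdin.
egress_dev() {
    sed -n 's/.* dev \([^ ]*\).*/\1/p' | head -n 1
}

# usage: check_pair <client-ip> <local-ip> <expected-dev>
check_pair() {
    got=$(ip route get "$1" from "$2" | egress_dev)
    if [ "$got" = "$3" ]; then
        echo "OK  $2 leaves via $got"
    else
        echo "BAD $2 leaves via $got, expected $3"
    fi
}

if [ -n "${CLIENT_IP:-}" ]; then
    check_pair "$CLIENT_IP" 10.9.110.171 ib1
    check_pair "$CLIENT_IP" 10.9.110.179 ib3
fi
```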

We are hoping a patch could be created to allow the above config (specifically the dual connections to the o2ib3 network) to be used in production.

This system is not yet in production but is in final testing and there is some time pressure.



 Comments   
Comment by Peter Jones [ 25/Oct/17 ]

Amir

Can you please assist with this one?

Thanks

Peter

Comment by Amir Shehata (Inactive) [ 25/Oct/17 ]

I don't believe this is a code issue, but rather a Linux routing issue. Please take a look at the setup described at the link below and make sure your Linux routing configuration matches it:

https://wiki.hpdd.intel.com/display/LNet/MR+Cluster+Setup

Let me know if that works.

Comment by Malcolm Haak - NCI (Inactive) [ 27/Oct/17 ]

We had a systemd unit that was supposed to set this up correctly at boot. It had an error in its [Install] section, so it was not being started automatically.

I have resolved this issue.

However, the clients initially continued to see errors. A client reboot seems to be required to get them into a stable state.

For anyone interested, here is the systemd unit file, in case you want to add it to the wiki alongside the code samples you have already provided:

[Unit]
Description=Configure IB interfaces for Lnet Multi-Rail
Requires=network.target network.service
After=network.target network.service networking.service
Before=lnet.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/sbin/lnet_net_configure

[Install]
WantedBy=multi-user.target

The lnet_net_configure script (the ExecStart= command above) is where you add the required script segments to configure the routing and ARP sysctl settings.
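A minimal sketch of what `lnet_net_configure` might contain for the router shown earlier, following the ARP-filtering and source-based-routing approach of the MR Cluster Setup page linked above (not a copy of it). Assumptions: the o2ib3 NIs are `ib1` = 10.9.110.171 and `ib3` = 10.9.110.179 (from `lnetctl net show`), the o2ib3 IPoIB subnet is 10.9.0.0/16 (a guess), and routing tables 200 and 201 are unused. It prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch of a hypothetical /usr/sbin/lnet_net_configure for one router.
# Assumptions: ib1=10.9.110.171 and ib3=10.9.110.179 (from `lnetctl net show`),
# o2ib3 IPoIB subnet 10.9.0.0/16 (a guess), routing tables 200/201 unused.
# Prints the commands unless APPLY=1, in which case it runs them (as root).

SUBNET=10.9.0.0/16

run() {
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# usage: configure_if <dev> <local-ip> <table>
configure_if() {
    # Answer ARP only for addresses configured on this interface;
    # arp_filter stays off because arp_ignore does the filtering.
    run sysctl -w "net.ipv4.conf.$1.arp_ignore=1"
    run sysctl -w "net.ipv4.conf.$1.arp_filter=0"
    # Use an address of the outgoing interface as the ARP source.
    run sysctl -w "net.ipv4.conf.$1.arp_announce=2"
    # Do not drop packets whose reverse path uses the other interface.
    run sysctl -w "net.ipv4.conf.$1.rp_filter=0"
    # Drop neighbour entries learned before these settings took effect.
    run ip neigh flush dev "$1"
    # Source-based routing: traffic from <local-ip> leaves through <dev>.
    run ip route add "$SUBNET" dev "$1" proto kernel scope link src "$2" table "$3"
    run ip rule add from "$2" table "$3"
}

lnet_net_configure() {
    # The kernel applies the larger of the "all" and per-interface
    # rp_filter values, so "all" must be cleared as well.
    run sysctl -w net.ipv4.conf.all.rp_filter=0
    configure_if ib1 10.9.110.171 200
    configure_if ib3 10.9.110.179 201
    run ip route flush cache
}

lnet_net_configure
```

This is meant to run once at boot, as the oneshot unit does; used from that unit it would need `Environment=APPLY=1` in the [Service] section. Re-running it on a live router adds duplicate `ip rule` entries, and `ip route add` fails because the table routes already exist.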

The unit has a Before=lnet.service ordering to ensure the routing is configured before LNet is started. This appears to have resolved the issue; however, we are going to do some more thorough testing before requesting case closure.

Thanks

Generated at Sat Feb 10 02:32:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.