[LU-13566] socklnd: wrong NID to interface mapping Created: 15/May/20 Updated: 29/Jul/20 Resolved: 28/Jun/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In a Multi-Rail setup using Ethernet interfaces, the mapping between LNet-level NIDs and Ethernet interfaces appears to be wrong. When LNet traffic is restricted to a subset of the NIDs, the interfaces that carry traffic do not match even for that subset. For example, netstat -i can show traffic on eth0 and eth2 while LNet reports that it is using eth1 and eth2. With iperf, however, netstat -i shows all Ethernet interfaces in use. This behavior is easily reproducible on a simple two-VM Multi-Rail setup. |
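One way to make the comparison above less manual is to snapshot the per-interface packet counters that netstat -i reports and diff them around an LNet workload. The following is a minimal sketch, assuming a Linux host where those counters are available in /proc/net/dev; the function names are hypothetical and this is not part of Lustre or of the reproducer used in this ticket.

```python
def parse_proc_net_dev(text):
    """Return {ifname: (rx_packets, tx_packets)} from /proc/net/dev content."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, _, stats = line.partition(":")
        fields = stats.split()
        if len(fields) < 16:
            continue  # not a full counter row, skip it
        # Receive columns come first (bytes, packets, ...), then transmit,
        # so rx_packets is field 1 and tx_packets is field 9.
        counters[name.strip()] = (int(fields[1]), int(fields[9]))
    return counters


def packet_deltas(before, after):
    """Per-interface (rx, tx) packet growth between two snapshots."""
    return {
        name: (after[name][0] - before[name][0],
               after[name][1] - before[name][1])
        for name in after
        if name in before
    }


def snapshot(path="/proc/net/dev"):
    """Read the current counters from the running system."""
    with open(path) as handle:
        return parse_proc_net_dev(handle.read())
```

Calling snapshot() before and after a run and passing both results to packet_deltas() shows which interfaces actually moved packets, which can then be compared with the interfaces LNet reports for the NIDs in use.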
| Comments |
| Comment by Andreas Dilger [ 21/May/20 ] |
|
Is this happening with multiple Ethernet interfaces on the same subnet? I recall that ages ago there was a problem with "source routing" for Ethernet: the kernel would select whichever interface it wanted on that subnet, even when LNet was trying to use a specific interface for outgoing packets. This might be helped by patch https://review.whamcloud.com/37702 "LU-10391 socklnd: use interface index to track local addr", which ensures that the specific interface is used rather than relying on the address to guide interface selection. |
| Comment by Amir Shehata (Inactive) [ 22/May/20 ] |
|
I actually found a problem with that patch: it breaks binding a socket to the correct interface, so we keep binding to the same interface. However, even after I fixed this issue, netstat -i still shows all traffic going over only one of the interfaces. I'm continuing my investigation. Correction: the same problem was present in socklnd from the beginning. It was not introduced by the patch. |
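To illustrate the difference between binding by address and binding by device, here is a minimal userspace sketch in Python. It is not the socklnd code (which runs in the kernel) and not the content of either patch; it only assumes a Linux host, where the SO_BINDTODEVICE socket option pins a socket to a named interface. The helper names are hypothetical, and binding to a device may require elevated privileges depending on the kernel.

```python
import socket


def ifname_optval(ifname):
    """Encode an interface name as the NUL-terminated SO_BINDTODEVICE value."""
    raw = ifname.encode("ascii")
    # Linux interface names are limited to 15 characters plus a trailing NUL.
    if not raw or len(raw) > 15:
        raise ValueError("invalid interface name: %r" % ifname)
    return raw + b"\0"


def connect_via(ifname, local_ip, peer, timeout=5.0):
    """Open a TCP connection to peer (host, port) that is pinned to ifname.

    Binding to local_ip alone only fixes the source address; the kernel's
    routing decision can still pick a different egress interface on the
    same subnet. SO_BINDTODEVICE restricts the socket to the named device.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE,
                        ifname_optval(ifname))
        sock.bind((local_ip, 0))  # port 0 lets the kernel choose a port
        sock.connect(peer)
    except OSError:
        sock.close()
        raise
    return sock
```

A call such as connect_via("eth1", "192.168.1.11", ("192.168.1.20", 988)) uses placeholder addresses and a placeholder port; substitute the values for the hosts under test.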
| Comment by Gerrit Updater [ 28/May/20 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38743 |
| Comment by Amir Shehata (Inactive) [ 28/May/20 ] |
|
For TCP workloads, it is important to configure ARP, reverse path filtering, and routing correctly so that packets egress over the intended interfaces in a Multi-Rail setup. |
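As a rough sketch of what such host configuration can look like, the Python helper below generates sysctl and iproute2 commands for a list of interfaces that share a subnet. The specific values (arp_ignore=1, arp_announce=2, rp_filter=0), the routing table numbers, and the addresses in the usage note are assumptions chosen for illustration; the ticket does not state which settings were used, so treat the output as a starting point to review rather than a recommended configuration. The helper only builds command strings and does not run them.

```python
def multirail_host_commands(interfaces, first_table=100):
    """Build commands for per-interface ARP, rp_filter and source routing.

    interfaces is a list of (ifname, ip, subnet) tuples. All values
    emitted here are illustrative assumptions, not settings taken from
    the ticket.
    """
    # The kernel applies the maximum of the "all" and per-interface
    # rp_filter values, so the global knob is relaxed as well.
    commands = ["sysctl -w net.ipv4.conf.all.rp_filter=0"]
    for offset, (ifname, ip, subnet) in enumerate(interfaces):
        table = first_table + offset
        prefix = "net.ipv4.conf.%s" % ifname
        commands += [
            # Answer ARP only for addresses configured on the receiving interface.
            "sysctl -w %s.arp_ignore=1" % prefix,
            # Prefer a source address that belongs to the outgoing interface.
            "sysctl -w %s.arp_announce=2" % prefix,
            # Disable reverse path source validation on this interface.
            "sysctl -w %s.rp_filter=0" % prefix,
            # Give each interface its own table and select it by source address.
            "ip route add %s dev %s src %s table %d" % (subnet, ifname, ip, table),
            "ip rule add from %s table %d" % (ip, table),
        ]
    return commands
```

For example, multirail_host_commands([("eth0", "192.168.1.10", "192.168.1.0/24"), ("eth1", "192.168.1.11", "192.168.1.0/24")]) yields one global command followed by five commands per interface, which can be reviewed and then applied by hand or from a script.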
| Comment by Gerrit Updater [ 28/Jun/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38743/ |
| Comment by Peter Jones [ 28/Jun/20 ] |
|
Landed for 2.14 |