[LU-16028] LNet tunings for large, interbuilding SAN Created: 19/Jul/22 Updated: 20/Jul/22 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.7 |
| Fix Version/s: | None |
| Type: | Question/Request | Priority: | Minor |
| Reporter: | Cameron Harr | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | llnl |
| Environment: | OS: Mostly RHEL 7.9, some RHEL 8.4+ |
| Epic/Theme: | lnet |
| Description |
Topology: We have various Lustre client clusters spread across three buildings on our campus. Most of the clients and servers on the relevant network (the CZ network) are in two of those buildings, which are connected by twin 100G links. There are 3 Lustre server clusters on the CZ network. In each building there is an EDR IB SAN that the Lustre servers and Lustre routers connect to. Inside each compute cluster there is a local Lustre network based on either EDR IB (with a few exceptions) or Omnipath. Clients may access Lustre servers in their local building or in the remote one. For a client to reach the remote building, traffic may pass through a local router (if the client is on its own Lustre network) to reach the building SAN, then through the inter-building routers (a set on each side), where the transmission crosses over ksocklnd, and then back over ko2iblnd onto the other building's SAN. Network topology (each name represents a cluster):
            o2ib100   /   o2ib600
syrah----+            /            +---quartz
ruby-----+            /            +---pascal
corona---+-orelic----------zrelic--+---copper(lustre1)
catalyst-+            /            +---zinc(lustre2)
...------+            /            +---...
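For reference, LNet routing across a topology like this is normally expressed with the lnet "routes" and "forwarding" module parameters (or their lnetctl equivalents). The following is only an illustrative sketch: the NIDs, interface names, and hop counts are hypothetical placeholders, not our actual configuration.

  # Hypothetical example only: NIDs, interfaces, and hop counts are placeholders.
  # Client/server in the o2ib100 building, reaching o2ib600 via the relic routers
  # (e.g. in /etc/modprobe.d/lnet.conf):
  options lnet networks="o2ib100(ib0)" routes="o2ib600 2 172.16.0.[1-2]@o2ib100"

  # A relic router bridging its building's IB SAN onto the inter-building TCP link;
  # forwarding must be enabled for it to route between the two networks:
  options lnet networks="o2ib100(ib0),tcp(eth0)" forwarding="enabled"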
Summary: We've had routing issues on our CZ Lustre network for a couple of years (see LU-14026), starting roughly when we updated from Lustre 2.10 to 2.12. After the upgrade, our inter-building routers (which we call "relics") would seemingly jam up and stop sending messages, bringing things to a halt. We eventually downgraded them to Lustre 2.10 and the problem went away. Since then we have tried various tunings and modifications, but the problem persists. Interestingly, on a parallel but significantly smaller inter-building network, the Lustre clients and servers do not exhibit these same problems, and all servers on that other network are running Lustre 2.12.

Tunings: In a case with Serguei a few months ago where we suspected routing issues, I asked him whether there was a way to verify that our settings are sane given our topology; he suggested opening a ticket with Whamcloud to review them. I've copied below what I think are the most relevant settings, but can provide additional information if needed. Can you help us understand what the appropriate tunings/settings are for our large Lustre network? Note, too, that we have discovery turned off.
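Since this is a request to review settings, one low-effort way to capture the relevant state on a client, a relic, and a server is with the standard lnetctl show/export commands; a minimal capture sketch (the output filename is just a suggestion):

  lnetctl export > lnet-config-$(hostname).yaml  # full running LNet config as YAML
  lnetctl net show -v    # per-NI credits and interfaces
  lnetctl route show -v  # configured routes, hops, and gateway state
  lnetctl global show    # globals, including the discovery setting noted above
  lnetctl stats show     # message/drop counters, useful when the relics jam up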
| Comments |
| Comment by Peter Jones [ 20/Jul/22 ] |
Serguei, could you please advise? Thanks, Peter