Description
We recently experienced major issues switching Lustre routing clusters to 2.12 and ended up reverting them to 2.10. In trying to better understand LNet, I read through various documentation pages, but was left with several questions. Can you help answer the following questions and perhaps update the LNet docs as well?
Questions:
- What is the data flow, or order of operations, for sending LNet messages from client to server, through routers? For example, a mock (and incorrect?) model might be:
  1. Client determines next hop router
  2. Client checks available routing buffer credits (rtr) on router
  3. Client checks available send credits (tx) and peer_credits to that router on self
  4. Client sends <= #peer_credits messages, decrementing tx for each message
  5. Router receives messages in routing buffers, depending on message size, and decrements # routing buffer credits (rtr) for each message.
  6. Router then acts as the client, repeating steps 1-5 above to the next hop as well as back to the original client (as data is received)
- Is there any need to manually add peers in a non-MR config? My understanding is no.
- Should a router have a peer entry for every node in the expanded SAN, including in other networks it needs to be routed to?
- The manual states "The number of credits currently in flight (number of transmit credits) is shown in the tx column.... Therefore, rtr – tx is the number of transmits in flight." It seems "in flight" in the description of the tx column should be "available", so that rtr - tx would be the number "in flight", right?
- Should a NID ever show for the wrong interface (e.g. tcp instead of o2ibXX)? We will sometimes see messages in logs from <addr>@tcp when it should be <addr>@o2ibX.
- Do the older mlx4 LNet settings need to be updated for mlx5, or are they still applicable? (https://wiki.lustre.org/LNet_Router_Config_Guide#Configure_Lustre_Servers)
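
For reference, the mock model from the first question can be written as a small Python sketch. It encodes only the steps listed there, which may well be wrong, so it should not be read as a description of how LNet actually behaves. `Node` and `send_via_router` are hypothetical names, the meaning given to each counter is an assumption, and the credit values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy node holding the three counters named in the mock model."""
    name: str
    tx: int            # available send credits on this node (assumed meaning)
    peer_credits: int  # limit on concurrent sends to a single peer
    rtr: int           # available routing buffer credits (routers only)


def send_via_router(client: Node, router: Node, n_messages: int) -> int:
    """Apply the mock model for one hop and return how many messages go out."""
    # The batch is bounded by the router's free routing buffers, the
    # client's free send credits, and the per-peer limit.
    sendable = min(n_messages, router.rtr, client.tx, client.peer_credits)
    # Each message sent consumes one send credit on the client.
    client.tx -= sendable
    # Each message received consumes one routing buffer credit on the router.
    router.rtr -= sendable
    return sendable


client = Node("client", tx=256, peer_credits=8, rtr=0)
router = Node("router", tx=256, peer_credits=8, rtr=4)
sent = send_via_router(client, router, n_messages=10)
print(sent, client.tx, router.rtr)  # prints: 4 252 0
```

In this toy run the router's routing buffer credits are the bottleneck: only 4 of the 10 messages go out, and a further send gets nothing through until credits are returned. The sketch does not model credit return or the router forwarding to the next hop.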