Description
We recently ran into major issues when switching our Lustre routing clusters to 2.12 and ended up reverting them to 2.10. While trying to better understand LNet, I read through various documentation pages but was left with several questions. Can you help answer the following questions, and perhaps update the LNet documentation as well?
Questions:
- What is the data flow, or order of operations, for sending LNet messages from a client to a server through routers? For example, a mock (and possibly incorrect) model might be:
  1. Client determines the next-hop router.
  2. Client checks the available routing buffer credits (rtr) on that router.
  3. Client checks its own available send credits (tx) and its peer_credits to that router.
  4. Client sends at most peer_credits messages, decrementing tx for each message.
  5. Router receives the messages into routing buffers chosen by message size, and decrements its routing buffer credits (rtr) for each message.
  6. Router then acts as the client, repeating steps 1-5 toward the next hop, as well as back toward the original client (as data is received).
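To make the mock model above concrete, here is a toy Python sketch of steps 2-5. It only illustrates the model as written in the question, not LNet's actual implementation: the `Node` class and `send_batch` function are hypothetical, step 1 (next-hop selection) is assumed to have already happened, and credits are never returned because the model does not say when that occurs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node holding the counters named in the mock model."""
    name: str
    tx: int            # send credits available on this node (toy model)
    peer_credits: int  # max messages this node may send to one peer (toy model)
    rtr: int = 0       # routing buffer credits available (meaningful on routers)
    inbox: list = field(default_factory=list)  # messages sitting in buffers


def send_batch(src: Node, router: Node, messages: list) -> list:
    """Steps 2-5 of the mock model: send as many messages as credits allow.

    Returns the messages that could not be sent in this round.
    """
    pending = list(messages)
    sent = 0
    # Step 3: respect the per-peer limit and the sender's own tx credits.
    # Step 2: the router must still have a free routing buffer (rtr > 0).
    while pending and sent < src.peer_credits and src.tx > 0 and router.rtr > 0:
        msg = pending.pop(0)
        src.tx -= 1           # step 4: consume one send credit on the sender
        router.rtr -= 1       # step 5: router consumes one routing buffer credit
        router.inbox.append(msg)
        sent += 1
    return pending
```

Step 6 would correspond to the router calling `send_batch` again with itself as the sender and the next-hop router as the receiver, passing along `router.inbox`. For example, with a client holding 8 tx credits and 4 peer_credits, and a router with only 3 rtr credits, sending five messages leaves two unsent and the router's rtr at 0.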
- Is there any need to manually add peers in a non-Multi-Rail (non-MR) configuration? My understanding is that there is not.
- Should a router have a peer entry for every node in the expanded SAN, including nodes on the other networks it routes to?
- The manual states: "The number of credits currently in flight (number of transmit credits) is shown in the tx column.... Therefore, rtr – tx is the number of transmits in flight." It seems that "in flight" in the description of the tx column should read "available", so that rtr - tx would be the number in flight. Is that right?
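To pin down the arithmetic in question, here is a small Python sketch that parses one line of the peers file and computes rtr - tx. It assumes the column order `nid refs state last max rtr min tx min queue` for /proc/sys/lnet/peers; `PeerCredits`, `parse_peer_line`, and `in_flight` are hypothetical helpers, and the "in flight" reading holds only if tx counts *available* send credits, as suggested above.

```python
from typing import NamedTuple


class PeerCredits(NamedTuple):
    nid: str
    max_credits: int  # "max" column: peer credit limit
    rtr: int          # "rtr" column: router buffer credits
    min_rtr: int      # "min" column following rtr: lowest rtr observed
    tx: int           # "tx" column: send credits
    min_tx: int       # "min" column following tx: lowest tx observed


def parse_peer_line(line: str) -> PeerCredits:
    # Assumed column order: nid refs state last max rtr min tx min queue
    f = line.split()
    return PeerCredits(f[0], int(f[4]), int(f[5]), int(f[6]), int(f[7]), int(f[8]))


def in_flight(p: PeerCredits) -> int:
    # The manual's formula, rtr - tx. This counts messages in flight only
    # under the reading that tx is the number of *available* send credits.
    return p.rtr - p.tx
```

For example, a made-up line `10.0.0.1@o2ib 1 up 50 8 8 8 6 4 0` gives rtr = 8 and tx = 6, so rtr - tx = 2 under that reading.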
- Should a NID ever show up for the wrong interface (e.g. tcp instead of o2ibX)? We sometimes see log messages from <addr>@tcp when they should be from <addr>@o2ibX.
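A NID has the form `address@net`, where the net is an LND name (such as `tcp` or `o2ib`) followed by an optional network number, and a missing number means network 0. As a hypothetical aid for spotting NIDs on the unexpected network in logs, the Python sketch below splits a NID into those parts; `split_nid` and `on_expected_net` are illustrative helpers, not part of any Lustre tool.

```python
import re

# address@<lnd><optional number>, e.g. 192.168.1.10@o2ib1 or 10.0.0.5@tcp.
# The LND name must end in a letter so that "o2ib1" splits as ("o2ib", 1).
NID_RE = re.compile(r"(?P<addr>[^@\s]+)@(?P<lnd>[a-z0-9]*[a-z])(?P<num>\d*)")


def split_nid(nid: str) -> tuple:
    """Return (address, lnd, net_number); a missing number means 0."""
    m = NID_RE.fullmatch(nid)
    if m is None:
        raise ValueError(f"not a NID: {nid!r}")
    return m.group("addr"), m.group("lnd"), int(m.group("num") or 0)


def on_expected_net(nid: str, expected_lnd: str) -> bool:
    """True if the NID uses the expected LND (e.g. 'o2ib' rather than 'tcp')."""
    return split_nid(nid)[1] == expected_lnd
```

For example, `split_nid("192.168.1.10@o2ib1")` yields `("192.168.1.10", "o2ib", 1)`, and `on_expected_net("192.168.1.10@tcp", "o2ib")` is `False`.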
- Do the older mlx4 LNet settings need to be updated for mlx5, or are they still applicable (https://wiki.lustre.org/LNet_Router_Config_Guide#Configure_Lustre_Servers)?
Thanks Amir. Can you clarify your comment that "Since o2iblnd is generally more performant than the socklnd, it would make sense to have a larger number of peer_credits for the socklnd network"? Are you saying that because the socklnd can't send and receive messages as fast as the o2iblnd, you want to increase peer credits to allow more messages to its o2iblnd peers? Wouldn't increasing that number also cause more messages to be sent to the socklnd NI, overwhelming it further?
As for credits and buffers, we have many cases where the minimum rtr or tx credits (in the peers file) go negative, while the buffers reported by lnetctl routing show are nowhere near negative. So, rather than a buffer issue, does this imply that a higher number of peer_credits (and total credits) needs to be specified?
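To find the peers in question, here is a Python sketch that scans a dump of the peers file and reports every peer whose minimum rtr or minimum tx went negative. It assumes the same column order as /proc/sys/lnet/peers (`nid refs state last max rtr min tx min queue`) and that, as the manual describes for tx, a negative value means messages had to queue waiting for credits; `peers_with_negative_min` is a hypothetical helper.

```python
def peers_with_negative_min(dump: str) -> list:
    """Return (nid, min_rtr, min_tx) for peers whose minimum credits went negative.

    Assumed column order: nid refs state last max rtr min tx min queue
    """
    flagged = []
    for line in dump.splitlines():
        f = line.split()
        if len(f) < 10 or f[0] == "nid":
            continue  # skip the header and blank or truncated lines
        min_rtr, min_tx = int(f[6]), int(f[8])
        if min_rtr < 0 or min_tx < 0:
            flagged.append((f[0], min_rtr, min_tx))
    return flagged
```

Comparing the flagged peers against the router buffer counts from lnetctl routing show would show whether the shortfall is in per-peer credits rather than in router buffers.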