Description
The Lustre timeout hierarchy must be manually configured because different network types have different timeout characteristics. In particular, the LNet transaction timeout (LTT) must be configured in accordance with the timeout characteristics of the underlying Lustre Network Drivers (LNDs), and the PtlRPC adaptive timeouts (AT) feature must be configured to account for the LTT.
Prior to the LNet multi-rail (LMR) feature, each LND defined its own timeout value. With LMR, the LND timeout is, by default, derived from the LTT and LNet retry count (LRC):
LND timeout = (LTT - 1)/(LRC + 1)
LND timeout should be the total amount of time that an LND takes to complete a network transaction (successfully or not).
LRC is the number of times LNet may attempt to resend a message.
LTT is the total amount of time that LNet takes to complete a network transaction, including any attempted retries. LTT is also used to unlink GETs, and PUTs for which an ACK was requested, if the REPLY or ACK has not been received within the LTT period (note that this may have dubious utility).
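As a rough illustration of the derivation above, the following sketch computes the default LND timeout from LTT and LRC. The function name and the sample values are hypothetical, and the use of integer (floor) division is an assumption chosen to mirror integer arithmetic in C; it is not copied from the LNet source.

```python
def lnd_timeout(ltt: int, lrc: int) -> int:
    """Return the LND timeout (seconds) derived from LTT and LRC.

    Implements: LND timeout = (LTT - 1) / (LRC + 1)
    Assumption: the division truncates, as integer division would in C.
    """
    if ltt < 1:
        raise ValueError("LTT must be at least 1 second")
    if lrc < 0:
        raise ValueError("LRC cannot be negative")
    return (ltt - 1) // (lrc + 1)


# Illustrative values only, not necessarily the shipped defaults.
print(lnd_timeout(50, 2))  # 49 // 3 = 16
print(lnd_timeout(10, 0))  # no retries: 9 // 1 = 9
```

With retries disabled (LRC = 0), the LND gets almost the whole transaction window; each additional retry divides that window further.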
We should be able to remove the need for manually defining the LTT and LRC parameters. When LNet selects a network interface (NI) for a send, it can look up the corresponding LND timeout and then calculate the LTT based on the configured LRC value. This would require that each LND define an appropriate timeout.
LNet could approximate LTT in a routed configuration based on the local NI used to send, as well as the LND timeout for the selected peer NI. This should be accurate for single-hop routes, but less so for multi-hop routes.
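A minimal sketch of the proposed calculation follows, inverting LND timeout = (LTT - 1)/(LRC + 1) to obtain LTT from a per-LND timeout. All function names are hypothetical. The rule used for the routed case (adding the local NI's and the peer NI's LND timeouts for each attempt) is purely an assumption for illustration; the proposal does not specify how the two values would be combined.

```python
def derive_ltt(lnd_timeout: int, lrc: int) -> int:
    """Invert LND timeout = (LTT - 1) / (LRC + 1) to obtain LTT in seconds."""
    if lnd_timeout <= 0 or lrc < 0:
        raise ValueError("lnd_timeout must be positive and lrc non-negative")
    return lnd_timeout * (lrc + 1) + 1


def derive_routed_ltt(local_lnd_timeout: int, peer_lnd_timeout: int, lrc: int) -> int:
    """Approximate LTT for a single-hop route.

    Assumption: each attempt may spend up to one LND timeout on the local
    network and one on the peer's network, so the two are summed.
    """
    return derive_ltt(local_lnd_timeout + peer_lnd_timeout, lrc)


print(derive_ltt(16, 2))            # 16 * 3 + 1 = 49
print(derive_routed_ltt(5, 16, 2))  # (5 + 16) * 3 + 1 = 64
```

A multi-hop route would add further per-hop timeouts that the sender cannot see, which is why the approximation degrades beyond a single hop.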
We will still need to allow administrators to manually define each level of the timeout hierarchy, but the goal is to make the out-of-the-box config work for most systems.
Another thought is that we could allow LNet to automatically account for upper-level timeouts. It doesn't make sense for LNet to hold onto a message for longer than the upper layer is willing to wait. Users of LNet could optionally pass a timeout value to LNetGet/LNetPut. This could allow LNet to automatically enable retries when, for example, the AT grows to some multiple of the LND timeout.
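One way this idea might work is sketched below: given a caller-supplied timeout, LNet could treat it as the transaction window and work out how many retries fit at the LND's timeout. The function and parameter names are hypothetical, and no such timeout argument exists on LNetGet/LNetPut today; this only illustrates the arithmetic.

```python
def retries_for_deadline(caller_timeout: int, lnd_timeout: int, max_retries: int) -> int:
    """Return how many resends fit within the caller's timeout.

    Treats caller_timeout as the LTT and solves
    LND timeout = (LTT - 1) / (LRC + 1) for the largest usable LRC,
    capped at max_retries (a hypothetical administrator limit).
    """
    if lnd_timeout <= 0:
        raise ValueError("lnd_timeout must be positive")
    attempts = (caller_timeout - 1) // lnd_timeout  # full attempts that fit
    return max(0, min(max_retries, attempts - 1))


print(retries_for_deadline(10, 16, 2))  # window shorter than one attempt: 0
print(retries_for_deadline(35, 16, 5))  # two attempts fit: 1 retry
print(retries_for_deadline(50, 16, 2))  # three attempts fit: 2 retries
```

Under this scheme retries stay disabled while the upper layer's timeout is close to the LND timeout, and become available only once it grows to at least twice that value.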