The design document is very useful, thanks.
I do have one concern: the code walks lists of routes while holding a spinlock and with interrupts disabled on the local CPU (spin_lock_irqsave() and friends). This will become a problem if these lists grow large, because a system becomes unstable when one or more CPU cores runs for a long time with interrupts disabled.
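For reference, here is a minimal sketch of the pattern I am worried about. The names (demo_route, demo_route_list, demo_route_lookup) are made up for illustration and are not the actual structures in the patch; the point is only that the list walk happens with the lock held and local interrupts off, so its cost grows linearly with the list length:

#include <linux/list.h>
#include <linux/spinlock.h>

/* Hypothetical route entry, standing in for the real structure. */
struct demo_route {
	struct list_head	dr_list;
	int			dr_key;		/* e.g. the peer NID */
};

static LIST_HEAD(demo_route_list);
static DEFINE_SPINLOCK(demo_route_lock);

/* Linear lookup under spin_lock_irqsave(): O(list length) with IRQs off. */
static struct demo_route *demo_route_lookup(int key)
{
	struct demo_route *route;
	unsigned long flags;

	spin_lock_irqsave(&demo_route_lock, flags);
	list_for_each_entry(route, &demo_route_list, dr_list) {
		if (route->dr_key == key) {
			spin_unlock_irqrestore(&demo_route_lock, flags);
			return route;
		}
	}
	spin_unlock_irqrestore(&demo_route_lock, flags);
	return NULL;
}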
To get a sense of how large these lists can become: for a cluster with N clients, M MDS, and O OSS, ignoring routers and assuming a single interface per system, I get roughly this for the list length:
- on a client: M + O
- on an MDS: N + M + O
- on an OSS: N + M
This shouldn't be much of a problem in a small cluster, but in a large cluster it is the MDS and OSS in particular that end up with large lists. So my concern is that there is a scaling problem that will render the MDS and OSS unstable in large clusters, yet remain invisible in the small clusters typically used for testing.
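To put hypothetical numbers on it: in a cluster with N = 10,000 clients, M = 4 MDS, and O = 500 OSS, a client walks a list of about 504 entries, while every MDS (~10,504 entries) and OSS (~10,004 entries) walks a list of over 10,000 entries on each lookup, all with interrupts disabled.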
To take a step back for a moment, I think we need to have a good answer to the following question:
Why is implementing channel bonding at the LND level the right thing to do rather than implementing channel bonding at the LNet level?
It is not clear to me that the current configuration approach, done at the LND level, is very robust or administrator-friendly, and fixing that does not look easy since the implementation is all hacked into a single LND component. I think I can envision a design at the LNet level that would be much easier for system administrators to work with (because NIDs are already shared between nodes). I am also concerned about how credits at the LNet layer are going to interact with multiple peer connections that are invisible inside the LND layer.