Details
- Type: Bug
- Resolution: Unresolved
- Priority: Minor
Description
There is an implicit dependency between the various timeouts in the system: LDLM, Adaptive Timeout (AT), LNet transaction timeout, LND timeout, and the underlying protocol timeout (TCP or IB). Ideally, each lower layer should time out before the layer above it. Currently, however, each timeout is tuned independently. This can lead to a situation where the Adaptive Timeout fires first and forces all the memory descriptors (MDs) to be cleaned up. Because the LNet/LND connection is still up, we can then receive messages that reference MDs which have already been freed.
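The ordering requirement above can be stated as a simple invariant: walking the stack from the transport upward, every layer's timeout must be strictly shorter than the timeout of the layer directly above it. The sketch below checks that invariant. The function name `ordering_violations` and all of the timeout values are hypothetical placeholders chosen for illustration; they are not Lustre defaults or tunables.

```python
from typing import List, Tuple

# (layer name, timeout in seconds), ordered from the lowest layer to the highest.
Layer = Tuple[str, float]


def ordering_violations(stack: List[Layer]) -> List[Tuple[str, str]]:
    """Return (lower, upper) pairs where the lower layer does NOT time out
    strictly before the layer directly above it."""
    violations = []
    # Compare each layer with its immediate upper neighbour.
    for (low_name, low_t), (up_name, up_t) in zip(stack, stack[1:]):
        if low_t >= up_t:
            violations.append((low_name, up_name))
    return violations


# Hypothetical, independently tuned values (illustrative only).
example_stack = [
    ("TCP/IB", 20.0),
    ("LND", 50.0),
    ("LNet transaction", 40.0),  # misordered: shorter than the LND timeout
    ("AT min", 60.0),
    ("LDLM", 100.0),
]

print(ordering_violations(example_stack))  # [('LND', 'LNet transaction')]
```

A check like this only detects a misconfiguration. It does not say which value should change, which is the question the two approaches below address.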
It would be better to devise a method that keeps these timeout values consistent with each other. One approach is bottom-up, where the LND timeout determines the AT minimum. The other is top-down, where the AT minimum determines the LND timeout.
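The two directions can be illustrated as a pair of inverse functions. This is a minimal sketch that assumes (1) the LNet transaction timeout must cover every LND attempt, i.e. `lnd_timeout * (retry_count + 1)`, and (2) a fixed, hypothetical safety margin separates adjacent layers. The names `at_min_from_lnd`, `lnd_from_at_min` and `MARGIN` are illustrative only and do not correspond to actual Lustre functions or tunables.

```python
# Hypothetical safety margin (seconds) between adjacent layers.
MARGIN = 5.0


def at_min_from_lnd(lnd_timeout: float, retry_count: int,
                    margin: float = MARGIN) -> float:
    """Bottom-up: derive the AT minimum from the LND timeout."""
    # The transaction timeout must outlast all LND attempts plus a margin.
    lnet_tx = lnd_timeout * (retry_count + 1) + margin
    # The AT minimum must in turn outlast the transaction timeout.
    return lnet_tx + margin


def lnd_from_at_min(at_min: float, retry_count: int,
                    margin: float = MARGIN) -> float:
    """Top-down: derive the LND timeout from the AT minimum
    (the inverse of at_min_from_lnd)."""
    lnet_tx = at_min - margin
    lnd = (lnet_tx - margin) / (retry_count + 1)
    if lnd <= 0:
        raise ValueError("at_min is too small for this retry count and margin")
    return lnd


print(at_min_from_lnd(10.0, 2))  # 40.0
print(lnd_from_at_min(40.0, 2))  # 10.0
```

Because the two functions are inverses, either direction yields the same consistent set of values. The practical difference is which parameter the administrator tunes directly and which one is derived from it.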
We need to investigate which approach works best.
Several presentations on the subject are attached to this ticket.