Lustre / LU-14081

Filesystem timeout alignment


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Severity: 3

    Description

      There is an implicit dependency between the various timeouts in the system: the ldlm timeout, the Adaptive Timeout (AT), the LNet transaction timeout, the LND timeout and the underlying protocol timeout (TCP or IB). Ideally, the lower layers should time out before the upper layers. Currently, however, the timeouts are tuned independently. This can lead to a situation where the Adaptive Timeout fires first, forcing all the memory descriptors (MDs) to be cleaned up. Since the LNet/LND connection is still up, we could then receive messages that reference MDs which have already been freed.

      It would be better to devise a method that keeps these timeout values in sync. One approach is bottom-up, where the LND timeout determines AT min. The other is top-down, where AT min determines the LND timeout.

      We need to investigate the best approach.

      A few presentations on the subject are attached to the ticket.


            People

              Assignee: Amir Shehata (Inactive)
              Reporter: Amir Shehata (Inactive)
              Votes: 0
              Watchers: 2
