Lustre / LU-13145

LNet Health: increase transaction timeout

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.4
    • Labels:
      None
    • Severity:
      3

      Description

      On larger clusters, the current default transaction timeout of 10 seconds appears to be too short, and it causes RDMA timeouts.

      The proposal is to increase the transaction timeout to 150 seconds. With a retry count of 3, that would bring the LND timeout to 50 seconds (150 / 3), which was its initial value before the LNet Health feature was introduced.
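
The arithmetic in the proposal can be sketched as follows. This is only an illustration of the numbers quoted in the description, assuming the LND timeout is the transaction timeout divided evenly by the retry count; the function name `lnd_timeout` is hypothetical, and the formula actually used in the LNet source may differ.

```python
def lnd_timeout(transaction_timeout_s: int, retry_count: int) -> int:
    """Per-attempt LND timeout in seconds.

    Assumption (taken from the arithmetic in this ticket, not from the
    LNet source): the transaction timeout is split evenly across the
    configured number of retries.
    """
    if retry_count <= 0:
        # Assumption: with no retries, one attempt gets the whole budget.
        return transaction_timeout_s
    return transaction_timeout_s // retry_count


# Proposed values from the description: 150 s transaction timeout, 3 retries.
proposed = lnd_timeout(150, 3)
print(f"LND timeout with proposed settings: {proposed}s")
```

Under this assumption, the proposed settings give each attempt the 50-second LND timeout that the description cites as the pre-health value.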

              People

              • Assignee:
                Amir Shehata (ashehata)
              • Reporter:
                Amir Shehata (ashehata)