Lustre / LU-17735

sanity test 7b spurious failures because check_node_health does not wait long enough for network health

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • 3

    Description

      check_node_health retries pings for only 5 seconds before failing with "Network not available!".  That may well be adequate for a static, stable set of hosts used for on-premises testing, but our auster test farm is created on demand every time a suite of tests is launched.  The likelihood of hitting a transient networking issue that resolves on its own, but takes longer than 5 seconds to do so, is correspondingly higher.

      The patch simply raises that limit from 5 seconds to 60 seconds.  It should not materially affect auster testing for anyone on-premises, and it has eliminated these transient failures for us.
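
      To illustrate the idea (this is not the actual check_node_health code, only a minimal sketch of the same retry-until-deadline pattern), the following Python shows how a longer window absorbs a short outage. The names wait_for_network and ping_once, the 1-second retry interval, and the Linux iputils ping flags are assumptions made for the example.

```python
import subprocess
import time
from typing import Callable


def ping_once(host: str) -> bool:
    """Send a single ping; assumes Linux iputils flags (-c count, -W timeout)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for_network(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,   # the ticket raises this window from 5 s to 60 s
    interval_s: float = 1.0,   # hypothetical retry interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry probe() until it succeeds or timeout_s has elapsed.

    Returns True as soon as one probe succeeds, False once the deadline
    passes without a success.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

      A caller would use something like wait_for_network(lambda: ping_once("some-host")) and report "Network not available!" only if it returns False. With a 5-second window, an outage that clears after 7 seconds is reported as a failure; with a 60-second window the same outage is tolerated, and healthy hosts still return on the first probe.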

      Patch to be sent shortly.

          People

            elliswilson Ellis Wilson