Lustre / LU-11503

ha.sh defects: ha.sh: line 399: (1476438773 - start_time) / nr_loops: division by 0


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.13.0
    • Lustre 2.12.0
    • None
    • 3

    Description

      Several ha.sh defects were found during wide HA testing:
      1.
      If the stop file is created before the "while" loop in ha_repeat_mpi_load() starts, nr_loops is still 0 and the script fails with:

      ha.sh: line 399: (1476438773 - start_time) / nr_loops: division by 0 (error token is "nr_loops")
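
A minimal sketch of one way to guard the average-time computation. This is not the actual ha.sh code or the landed fix: the helper name avg_loop_time and its arguments are hypothetical, and only start_time and nr_loops come from the error message above.

```shell
#!/bin/bash
# Hypothetical helper (not taken from ha.sh): average seconds per loop.
# Prints 0 instead of dividing when no loop has completed yet.
avg_loop_time() {
    local start_time=$1
    local end_time=$2
    local nr_loops=$3

    if (( nr_loops > 0 )); then
        echo $(( (end_time - start_time) / nr_loops ))
    else
        echo 0
    fi
}

# Stop file created before the first loop: nr_loops is 0, no division occurs.
avg_loop_time 1476438700 1476438773 0
# Two completed loops: integer division as in the original expression.
avg_loop_time 1476438700 1476438773 2
```

Whether to report 0, skip the average entirely, or print a "no loops completed" message is a design choice; the point is only that the division is never evaluated with nr_loops equal to 0.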
      

      2.
      In hard failover mode (pm -0 <node>), a node may be down at the time ha.sh collects the Lustre logs.
      In this case the test passes, but ha.sh returns 255 at the end.
      stdout:

      /usr/lib64/lustre/tests/ha.sh: 23:18:01 1476487081: ---------------8<---------------
      /usr/lib64/lustre/tests/ha.sh: 23:18:01 1476487081: Summary:
      /usr/lib64/lustre/tests/ha.sh: 23:18:01 1476487081:     Duration: 44887s
      /usr/lib64/lustre/tests/ha.sh: 23:18:01 1476487081:     Loops: 20
      

      stderr:

      redpill00: failback: Operation performed successfully.
      pdsh@redpill-client08: redpill16: ssh exited with exit code 255
      /usr/lib64/lustre/tests/ha.sh: 23:18:05 1476487085: not all logs are dumped! Some nodes are unreachable.
      pdsh@redpill-client08: redpill16: ssh exited with exit code 255
      /usr/lib64/lustre/tests/ha.sh: 00:45:57 1476492357: Trap ERR triggered by:
      /usr/lib64/lustre/tests/ha.sh: 00:45:57 1476492357:     return $rc
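
A minimal sketch of best-effort log collection that warns about unreachable nodes without propagating the 255 exit status to the ERR trap. It is an illustration only, not the ha.sh implementation: dump_node_logs and collect_logs_best_effort are hypothetical names, and the unreachable node is simulated rather than contacted over pdsh/ssh.

```shell
#!/bin/bash
# Hypothetical stand-in for the per-node log dump. A node whose name starts
# with "down" simulates an unreachable host (ssh exit code 255).
dump_node_logs() {
    [[ $1 != down* ]] || return 255
}

# Best-effort collection: remember which nodes failed, warn on stderr,
# and return 0 so that missing logs alone do not trigger the ERR trap.
collect_logs_best_effort() {
    local node failed=""

    for node in "$@"; do
        if ! dump_node_logs "$node"; then
            failed+="${failed:+ }$node"
        fi
    done

    if [[ -n $failed ]]; then
        echo "not all logs are dumped! Unreachable: $failed" >&2
    fi
    return 0
}

trap 'echo "Trap ERR triggered" >&2' ERR
collect_logs_best_effort node1 down2 node3
echo "rc=$?"
```

Treating a failed log dump as a warning rather than a test failure is an assumption made for this sketch; if missing logs should fail the run, the function would instead return a distinct non-zero code that the caller checks explicitly.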
      

          People

            egryaznova Elena Gryaznova