Lustre / LU-11765

During a failover test run, an mdtest job fails with numerous stat failures: 'No such file or directory'

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0
    • None
    • 3

    Description

      While running a failover test (random failover of OSSs plus mdtest), an mdtest job failed, reporting stat failures. This looked similar to LU-11760, except that this time all the files were actually present and valid after the job failure.

      V-1: Entering create_remove_items_helper...
      V-1: Entering unique_dir_access...
      V-1: Entering mdtest_stat...
      08/19/2018 07:15:43: Process 10(nid00265): FAILED in mdtest_stat, unable to stat file: No such file or directory
      08/19/2018 07:15:43: Process 15(nid00279): FAILED in mdtest_stat, unable to stat file: No such file or directory
      Rank 10 [Sun Aug 19 07:15:43 2018] [c1-0c0s4n1] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 10
      Rank 15 [Sun Aug 19 07:15:43 2018] [c1-0c0s4n3] application called MPI_Abort(MPI_COMM_WORLD, 1) - process 15
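
For triage, the failing ranks and nodes can be extracted from output like the above with a short script. This is only a minimal sketch, assuming the "FAILED in ..." line format shown in this report; the function name `parse_failures` and the regular expression are illustrative and are not part of mdtest or Lustre.

```python
import re

# Matches lines of the form seen in this report, e.g.:
# 08/19/2018 07:15:43: Process 10(nid00265): FAILED in mdtest_stat, unable to stat file: ...
# The pattern is an assumption based only on the two sample lines above.
FAILED_RE = re.compile(
    r"^\s*(?P<date>\d{2}/\d{2}/\d{4}) (?P<time>\d{2}:\d{2}:\d{2}): "
    r"Process (?P<rank>\d+)\((?P<node>[^)]+)\): "
    r"FAILED in (?P<phase>\w+), (?P<reason>.+)$"
)


def parse_failures(log_text):
    """Return one dict per 'FAILED in ...' line; all other lines are ignored."""
    failures = []
    for line in log_text.splitlines():
        match = FAILED_RE.match(line)
        if match:
            record = match.groupdict()
            record["rank"] = int(record["rank"])  # rank as an integer for sorting
            failures.append(record)
    return failures


# Sample input copied from the log excerpt in this report.
SAMPLE = """\
V-1: Entering mdtest_stat...
08/19/2018 07:15:43: Process 10(nid00265): FAILED in mdtest_stat, unable to stat file: No such file or directory
08/19/2018 07:15:43: Process 15(nid00279): FAILED in mdtest_stat, unable to stat file: No such file or directory
"""

if __name__ == "__main__":
    for failure in parse_failures(SAMPLE):
        print(failure["rank"], failure["node"], failure["phase"], failure["reason"])
```

Grouping the resulting records by node or timestamp may help correlate the stat failures with the OSS failover events, though that correlation is not established by this report.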

      Details of the failure reason can be found in the patch commit message. I will upload the patch shortly.

          People

            scherementsev Sergey Cheremencev
            scherementsev Sergey Cheremencev
