Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • Spirit performance cluster

    Description

      Attempting to run P02 and P03 performance tests, with striping set as:
      $LFS setstripe $testdir --pool $ior_ostPool -E 64M -c 1 -E 4G -c 4 -E -1 -c -1I

      IOR fails immediately with MPI aborts:

       Commencing write performance test: Thu Apr 13 21:04:16 2017
      024: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      024: --------------------------------------------------------------------------
      024: MPI_ABORT was invoked on rank 24 in communicator MPI_COMM_WORLD
      --
      ..........
      231: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      088: In: PMI_Abort(-1, N/A)
      287: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      134: In: PMI_Abort(-1, N/A)
      057: --------------------------------------------------------------------------
      057: MPI_ABORT was invoked on rank 57 in communicator MPI_COMM_WORLD 
      --
      057: 
      057: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      057: You may or may not see output from other processes, depending on
      057: exactly when Open MPI kills them.
      057: -------------------------------------------
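      To triage a log like the one above, it can help to pull out just the IOR error lines and group them by rank and errno. The following is a minimal sketch, not part of the original report: the function name `parse_ior_errors` and the regular expression are illustrative assumptions based only on the line format visible in this excerpt. On Linux, errno 61 is ENODATA, which matches the "No data available" message.

```python
import re

# Line format assumed from the excerpt above:
# "<rank>: ior ERROR: <call>() failed, errno <n>, <message> (<file>:<line>)"
IOR_ERROR = re.compile(
    r"^\s*(?P<rank>\d+): ior ERROR: (?P<call>\w+\(\)) failed, "
    r"errno (?P<errno>\d+), (?P<msg>.*?) "
    r"\((?P<src>[^():]+):(?P<line>\d+)\)\s*$"
)


def parse_ior_errors(log_text):
    """Return one dict per IOR error line; all other lines are skipped."""
    errors = []
    for line in log_text.splitlines():
        m = IOR_ERROR.match(line)
        if m is None:
            continue  # MPI_ABORT banners, PMI_Abort lines, separators
        errors.append({
            "rank": int(m.group("rank")),
            "call": m.group("call"),
            "errno": int(m.group("errno")),
            "msg": m.group("msg"),
            "src": m.group("src"),
            "line": int(m.group("line")),
        })
    return errors


# Lines copied from the log in this report.
SAMPLE = """\
024: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
088: In: PMI_Abort(-1, N/A)
231: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
287: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
"""

if __name__ == "__main__":
    for err in parse_ior_errors(SAMPLE):
        print(err["rank"], err["call"], err["errno"], err["msg"])
```

      In this excerpt every failing rank reports the same call, errno, and source location, which suggests a single underlying cause rather than unrelated per-node failures.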
      

      Lustre errors from all nodes are attached.
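      The setstripe command in the description defines a three-component progressive file layout (PFL): each `-E` gives a component's end offset and the following `-c` gives its stripe count, with `-1` meaning end of file or all available OSTs. The sketch below is illustrative only and assumes the trailing `-c -1I` in the command is meant as `-c -1`. The names `LAYOUT` and `component_for_offset` are hypothetical. It shows which component a given file offset would fall into, which is useful when correlating a failing write offset with a layout component.

```python
MIB = 1 << 20
GIB = 1 << 30
EOF = -1  # stands in for "-E -1" (component extends to end of file)

# (component end offset, stripe count), transcribed from:
#   -E 64M -c 1  -E 4G -c 4  -E -1 -c -1
# A stripe count of -1 means "all available OSTs" (here, in the pool).
LAYOUT = [
    (64 * MIB, 1),
    (4 * GIB, 4),
    (EOF, -1),
]


def component_for_offset(offset, layout=LAYOUT):
    """Return (index, start, end, stripe_count) for the component
    whose extent [start, end) contains the given byte offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    start = 0
    for index, (end, stripe_count) in enumerate(layout):
        if end == EOF or offset < end:
            return index, start, end, stripe_count
        start = end  # the next component begins where this one ends
    raise ValueError("layout has no component covering this offset")


if __name__ == "__main__":
    for off in (0, 64 * MIB, 5 * GIB):
        print(off, component_for_offset(off))
```

      Under this reading, the first 64 MiB of each file uses a single stripe, the range from 64 MiB to 4 GiB uses four stripes, and everything beyond 4 GiB is striped across all OSTs in the pool.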

      Attachments

        1. pfl.errors.txt
          17 kB
        2. spirit-29.lustre.dump.gz
          963 kB
        3. spirit-30.lustre.dump.gz
          955 kB
        4. spirit-7.lustre.dump.gz
          3.26 MB
        5. spirit-10.lustre.dump.gz
          3.48 MB
        6. spirit-8.lustre.dump.gz
          3.48 MB
        7. spirit-9.lustre.dump.gz
          3.65 MB
        8. ior-stripe.txt
          3 kB

          Activity

            [LU-9340] PFL fails performance tests Spirit
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
            jay Jinshan Xiong (Inactive) made changes -
            Link New: This issue is related to LU-8494 [ LU-8494 ]
            jay Jinshan Xiong (Inactive) made changes -
            Comment [ cliff - are you able to reproduce this issue on spirit? ]
            jamesanunez James Nunez (Inactive) made changes -
            Resolution Original: Fixed [ 1 ]
            Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            cliffw Cliff White (Inactive) made changes -
            Attachment New: ior-stripe.txt [ 26582 ]
            cliffw Cliff White (Inactive) made changes -
            Attachment New: spirit-7.lustre.dump.gz [ 26574 ]
            Attachment New: spirit-8.lustre.dump.gz [ 26575 ]
            Attachment New: spirit-9.lustre.dump.gz [ 26576 ]
            Attachment New: spirit-10.lustre.dump.gz [ 26577 ]
            Attachment New: spirit-29.lustre.dump.gz [ 26578 ]
            Attachment New: spirit-30.lustre.dump.gz [ 26579 ]
            eberglan Eric Bergland (Inactive) made changes -
            Link New: This issue is related to LU-9349 [ LU-9349 ]
            jamesanunez James Nunez (Inactive) made changes -
            Remote Link New: This issue links to "Page (HPDD Community Wiki)" [ 20272 ]
            jgmitter Joseph Gmitter (Inactive) made changes -
            Link New: This issue is related to LU-8998 [ LU-8998 ]

            People

              jay Jinshan Xiong (Inactive)
              cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 9
