Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.10.0
    • Affects Version/s: Lustre 2.10.0
    • Environment: Spirit performance cluster
    • Severity: 3

    Description

      Attempting to run P02 and P03 performance tests, with striping set as:
      $LFS setstripe $testdir --pool $ior_ostPool -E 64M -c 1 -E 4G -c 4 -E -1 -c -1
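To make the layout concrete: the command defines three PFL components, [0, 64M) with 1 stripe, [64M, 4G) with 4 stripes, and [4G, EOF) striped across all OSTs in the pool. The sketch below shows which component a given file offset falls into. It is only an illustration; the helper name and table are hypothetical, and it assumes the M and G suffixes are binary (MiB and GiB).

```python
# Sketch: map a file offset to the PFL component whose extent covers it.
# COMPONENTS mirrors the setstripe command; the names here are illustrative only.
MiB = 1 << 20
GiB = 1 << 30
EOF = None  # "-E -1" means the component extends to end of file

# (extent_start, extent_end, stripe_count); stripe_count -1 means all OSTs in the pool
COMPONENTS = [
    (0, 64 * MiB, 1),
    (64 * MiB, 4 * GiB, 4),
    (4 * GiB, EOF, -1),
]


def component_for_offset(offset, components=COMPONENTS):
    """Return (index, stripe_count) of the component whose [start, end) holds offset."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    for index, (start, end, stripe_count) in enumerate(components):
        if offset >= start and (end is None or offset < end):
            return index, stripe_count
    raise LookupError("no component covers offset %d" % offset)


if __name__ == "__main__":
    for off in (0, 64 * MiB - 1, 64 * MiB, 4 * GiB):
        print(off, component_for_offset(off))
```

A write that crosses 64M or 4G therefore moves into a different component with a different stripe count.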

      IOR fails immediately and the MPI job aborts:

       Commencing write performance test: Thu Apr 13 21:04:16 2017
      024: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      024: --------------------------------------------------------------------------
      024: MPI_ABORT was invoked on rank 24 in communicator MPI_COMM_WORLD
      --
      ..........
      231: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      088: In: PMI_Abort(-1, N/A)
      287: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      134: In: PMI_Abort(-1, N/A)
      057: --------------------------------------------------------------------------
      057: MPI_ABORT was invoked on rank 57 in communicator MPI_COMM_WORLD 
      --
      057: 
      057: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      057: You may or may not see output from other processes, depending on
      057: exactly when Open MPI kills them.
      057: -------------------------------------------
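For reference, errno 61 on Linux is ENODATA, which matches the "No data available" text in the IOR output. A minimal sketch for confirming that mapping on a client node, using only Python's standard library (the helper name is hypothetical, and errno numbering differs on non-Linux platforms):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on the current platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # On Linux this is expected to report ENODATA for 61;
    # other platforms assign errno numbers differently.
    print(describe_errno(61))
```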
      

      Lustre error logs from all nodes are attached.

      Attachments

        1. spirit-9.lustre.dump.gz (3.65 MB)
        2. spirit-8.lustre.dump.gz (3.48 MB)
        3. spirit-7.lustre.dump.gz (3.26 MB)
        4. spirit-30.lustre.dump.gz (955 kB)
        5. spirit-29.lustre.dump.gz (963 kB)
        6. spirit-10.lustre.dump.gz (3.48 MB)
        7. pfl.errors.txt (17 kB)
        8. ior-stripe.txt (3 kB)

          Activity

            [LU-9340] PFL fails performance tests - Spirit
            pjones Peter Jones added a comment -

            Landed for 2.10

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27097/
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: df6e700c80f2c216270ca499db7373752f252166

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch https://review.whamcloud.com/27116/
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: 683fb75906cc47fc6aa8c06d47cb672add9a608a

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/27116
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 89375c8baccebf3cac1cfa3fd5f8ed1579fc9880

            jay Jinshan Xiong (Inactive) added a comment -

            James - this patch won't address any performance issues.

            simmonsja James A Simmons added a comment -

            I'm preparing our small test system to see if this patch fixes the 50% drop in performance we see in our PFL testing.

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/27097
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d9fd41dc5c644479f740adab77381c33bf22d9dc

            jay Jinshan Xiong (Inactive) added a comment -

            Cliff - are you able to reproduce this issue on Spirit?

            jamesanunez James Nunez (Inactive) added a comment -

            We need to reopen this ticket because the patch that landed did not fix the issue on Spirit.

            pjones Peter Jones added a comment -

            Landed for 2.10

            People

              Assignee: jay Jinshan Xiong (Inactive)
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 9
