Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.10.0
    • Lustre 2.10.0
    • Spirit performance cluster
    • 3

    Description

      Attempting to run the P02 and P03 performance tests, with striping set as:
      $LFS setstripe $testdir --pool $ior_ostPool -E 64M -c 1 -E 4G -c 4 -E -1 -c -1

      Immediate MPI failures with IOR:

       Commencing write performance test: Thu Apr 13 21:04:16 2017
      024: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      024: --------------------------------------------------------------------------
      024: MPI_ABORT was invoked on rank 24 in communicator MPI_COMM_WORLD
      --
      ..........
      231: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      088: In: PMI_Abort(-1, N/A)
      287: ior ERROR: write() failed, errno 61, No data available (aiori-POSIX.c:335)
      134: In: PMI_Abort(-1, N/A)
      057: --------------------------------------------------------------------------
      057: MPI_ABORT was invoked on rank 57 in communicator MPI_COMM_WORLD 
      --
      057: 
      057: NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
      057: You may or may not see output from other processes, depending on
      057: exactly when Open MPI kills them.
      057: -------------------------------------------
      

      Lustre error logs from all nodes are attached.
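
For readers unfamiliar with PFL, the setstripe command in the description defines a three-component layout: `-E` gives each component's end offset and `-c` its stripe count, with `-1` meaning end of file and all available OSTs respectively. The sketch below only models which component a given file offset falls into. It is an illustration under those assumptions, not Lustre code; `Component`, `build_layout`, and `component_for_offset` are hypothetical names, and it says nothing about the cause of the errno 61 failures.

```python
# Illustrative model of the PFL layout from the setstripe command in this
# ticket. All names here are hypothetical helpers, not Lustre APIs.
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIB = 1 << 20
GIB = 1 << 30
EOF = -1       # mirrors "-E -1": component extends to end of file
ALL_OSTS = -1  # mirrors "-c -1": stripe across all available OSTs


@dataclass(frozen=True)
class Component:
    start: int         # inclusive start offset, in bytes
    end: int           # exclusive end offset in bytes, or EOF
    stripe_count: int  # number of OSTs, or ALL_OSTS

    def contains(self, offset: int) -> bool:
        return offset >= self.start and (self.end == EOF or offset < self.end)


def build_layout(specs: List[Tuple[int, int]]) -> List[Component]:
    """Turn (end, stripe_count) pairs into contiguous components from offset 0."""
    layout: List[Component] = []
    start = 0
    for end, count in specs:
        if end != EOF and end <= start:
            raise ValueError("component end offsets must be strictly increasing")
        layout.append(Component(start, end, count))
        if end == EOF:
            break  # nothing can follow a component that runs to end of file
        start = end
    return layout


def component_for_offset(layout: List[Component], offset: int) -> Optional[int]:
    """Return the index of the component covering `offset`, or None."""
    for index, comp in enumerate(layout):
        if comp.contains(offset):
            return index
    return None


# -E 64M -c 1  -E 4G -c 4  -E -1 -c -1
LAYOUT = build_layout([(64 * MIB, 1), (4 * GIB, 4), (EOF, ALL_OSTS)])
```

With this model, `component_for_offset(LAYOUT, 0)` selects the single-stripe component, an offset of 64 MiB selects the four-stripe component, and anything at or beyond 4 GiB selects the final component striped over all OSTs in the pool.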

      Attachments

        1. ior-stripe.txt
          3 kB
        2. pfl.errors.txt
          17 kB
        3. spirit-10.lustre.dump.gz
          3.48 MB
        4. spirit-29.lustre.dump.gz
          963 kB
        5. spirit-30.lustre.dump.gz
          955 kB
        6. spirit-7.lustre.dump.gz
          3.26 MB
        7. spirit-8.lustre.dump.gz
          3.48 MB
        8. spirit-9.lustre.dump.gz
          3.65 MB

          Activity

            [LU-9340] PFL fails performance tests - Spirit
            pjones Peter Jones added a comment -

            Landed for 2.10


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27097/
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: df6e700c80f2c216270ca499db7373752f252166

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch https://review.whamcloud.com/27116/
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set:
            Commit: 683fb75906cc47fc6aa8c06d47cb672add9a608a

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/27116
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: pfl
            Current Patch Set: 1
            Commit: 89375c8baccebf3cac1cfa3fd5f8ed1579fc9880

            jay Jinshan Xiong (Inactive) added a comment -

            James - this patch won't address any performance issues.

            simmonsja James A Simmons added a comment -

            I'm preparing our small test system to see if this patch fixes the 50% drop in performance we see in our PFL testing.

            gerrit Gerrit Updater added a comment -

            Jinshan Xiong (jinshan.xiong@intel.com) uploaded a new patch: https://review.whamcloud.com/27097
            Subject: LU-9340 lov: Initialize component extents unconditionally
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d9fd41dc5c644479f740adab77381c33bf22d9dc

            jay Jinshan Xiong (Inactive) added a comment -

            cliff - are you able to reproduce this issue on spirit?

            People

              jay Jinshan Xiong (Inactive)
              cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 9
