Lustre / LU-14644

IOR SSF PFL ill-formed I/O job aborted with EIO during automated FOFB testing

    Description

      A single-shared-file (SSF) IOR job aborted with the following EIO error during the seventh write iteration:

      Using Time Stamp 1588769149 (0x5eb2b17d) for Data Signature
      delaying 1 seconds . . .
      Commencing write performance test.
      Wed May  6 07:45:50 2020
       
      ADIOI_CRAY_WRITECONTIG(261): filename='/lus/snx11281/disk/ostest.vers/alsorun.20200504152303.27104.saturn-p4/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k.1.dlY06h.1588768616/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k/IORfile_1m'  error='Input/output error'  errno=5  PE=00001  W_rec=03163  off=0840695808  len=0000262144  See MPICH_MPIIO_ABORT_ON_RW_ERROR.
      ** error **
      ERROR in aiori-MPIIO.c (line 298): cannot access explicit, collective.
      MPI No MPI error
      ** exiting **
      Rank 1 [Wed May  6 07:45:50 2020] [c0-0c2s9n2] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
      _pmiu_daemon(SIGCHLD): [NID 00166] [c0-0c2s9n2] [Wed May  6 07:45:50 2020] PE RANK 1 exit signal Aborted
      [NID 00166] 2020-05-06 07:45:51 Apid 5829365: initiated application termination
      Application 5829365 exit codes: 134
      Application 5829365 exit signals: Killed
      Application 5829365 resources: utime ~159s, stime ~9s, Rss ~28544, inblocks ~8314, outblocks ~3330760
      Job Script: command stopped at Wed May 6 07:45:51 CDT 2020
      Job Script: command runtime was 238 seconds
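
The numeric fields in the ADIOI_CRAY_WRITECONTIG line can be pulled out and checked for alignment. The sketch below is only an illustration: the filename is shortened for readability, and reading the "1069k" in the test name as a 1069 KiB IOR transfer size is an assumption, not something the report states.

```python
import re

# Error line from the job output, with the long filename shortened for readability.
LINE = (
    "ADIOI_CRAY_WRITECONTIG(261): filename='IORfile_1m'  "
    "error='Input/output error'  errno=5  PE=00001  W_rec=03163  "
    "off=0840695808  len=0000262144"
)

def parse_fields(line):
    """Return every numeric key=value field of the error line as an int."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)\b", line)}

fields = parse_fields(LINE)
off, length = fields["off"], fields["len"]

KIB = 1024
# ASSUMPTION: "1069k" in the test name is the IOR transfer size in KiB.
ASSUMED_XFER = 1069 * KIB

print("errno:", fields["errno"], "rank:", fields["PE"])
print("offset in MiB:", off / (KIB * KIB))
print("offset / failing write length:", divmod(off, length))
print("offset / assumed transfer size:", divmod(off, ASSUMED_XFER))
```

Under that assumption the failing offset is a whole multiple of both the 256 KiB write length and the 1069 KiB transfer size, which fits the "ill-formed I/O" label in the title: the transfer size is not a multiple of 1 MiB.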

      The following error was found in the console log:

      console-20200506:2020-05-06T07:45:55.177486-05:00 c0-0c2s9n2 LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: refresh file layout [0x240336a96:0x1efc4:0x0] error -5.
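
For triage, the console message can be split into its source location, file system name, FID, and return code. This is a minimal sketch: the regular expression is written for this one message and is not a general Lustre log parser. The FID is read in the usual [sequence:object id:version] form.

```python
import errno
import re

# Console message from the report, without the syslog prefix.
LOG = (
    "LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: "
    "refresh file layout [0x240336a96:0x1efc4:0x0] error -5."
)

# Hypothetical pattern tailored to this single message.
PATTERN = re.compile(
    r"\((?P<src>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<fs>\w+): .*?"
    r"\[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+):(?P<ver>0x[0-9a-f]+)\]"
    r" error (?P<rc>-?\d+)"
)

m = PATTERN.search(LOG)
rc = int(m["rc"])
# Kernel-style return codes are negated errno values.
errname = errno.errorcode[-rc]

print("location:", m["src"], m["line"], m["func"])
print("file system:", m["fs"])
print("FID seq/oid/ver:", int(m["seq"], 16), int(m["oid"], 16), int(m["ver"], 16))
print("return code:", rc, errname)
```

The return code -5 is EIO, the same error the application reported as errno=5. The layout refresh failure in vvp_io_init() is therefore the likely source of the failed write.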

            People

              vitaly_fertman Vitaly Fertman
              vitaly_fertman Vitaly Fertman