Details
- Bug
- Resolution: Fixed
- Minor
Description
A single shared file IOR job aborted with the following EIO error during the seventh write iteration:
Using Time Stamp 1588769149 (0x5eb2b17d) for Data Signature
delaying 1 seconds . . .
Commencing write performance test.
Wed May 6 07:45:50 2020
ADIOI_CRAY_WRITECONTIG(261): filename='/lus/snx11281/disk/ostest.vers/alsorun.20200504152303.27104.saturn-p4/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k.1.dlY06h.1588768616/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k/IORfile_1m' error='Input/output error' errno=5 PE=00001 W_rec=03163 off=0840695808 len=0000262144
See MPICH_MPIIO_ABORT_ON_RW_ERROR.
** error **
ERROR in aiori-MPIIO.c (line 298): cannot access explicit, collective.
MPI No MPI error
** exiting **
Rank 1 [Wed May 6 07:45:50 2020] [c0-0c2s9n2] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
_pmiu_daemon(SIGCHLD): [NID 00166] [c0-0c2s9n2] [Wed May 6 07:45:50 2020] PE RANK 1 exit signal Aborted
[NID 00166] 2020-05-06 07:45:51 Apid 5829365: initiated application termination
Application 5829365 exit codes: 134
Application 5829365 exit signals: Killed
Application 5829365 resources: utime ~159s, stime ~9s, Rss ~28544, inblocks ~8314, outblocks ~3330760
Job Script: command stopped at Wed May 6 07:45:51 CDT 2020
Job Script: command runtime was 238 seconds
The following error was found in the console log:
console-20200506:2020-05-06T07:45:55.177486-05:00 c0-0c2s9n2 LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: refresh file layout [0x240336a96:0x1efc4:0x0] error -5.
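The errno=5 reported by MPI-IO and the "error -5" in the console log are the same code, EIO; the kernel reports it as a negative value. The following is a minimal sketch, assuming a Linux errno table, of decoding the trailing code on such a console line. The helper name decode_kernel_error is hypothetical and is not part of Lustre or IOR.

```python
import errno
import os
import re

# Console-log message copied from the report above.
console_line = (
    "LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: "
    "refresh file layout [0x240336a96:0x1efc4:0x0] error -5."
)


def decode_kernel_error(line):
    """Return (code, symbolic name, message) for a trailing 'error -N'.

    Hypothetical helper for illustration only. Returns None when the
    line does not end with an 'error <number>' token.
    """
    match = re.search(r"error (-?\d+)\.?$", line)
    if match is None:
        return None
    # The kernel logs negative errno values; user space sees the positive form.
    code = abs(int(match.group(1)))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


print(decode_kernel_error(console_line))
```

The message text returned by os.strerror depends on the platform's C library, so only the numeric code and the symbolic name should be relied on when comparing client-side and application-side errors.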
Issue Links
- is duplicated by
  - LU-14372 LustreError: 38823:0:(vvp_io.c:1562:vvp_io_init()) nbp11: refresh file layout [0x2400498d9:0x1642:0x0] error -5 (Resolved)
- is related to
  - LU-14787 Provide an abstraction for AS_EXITING (Resolved)
  - LU-16497 various lustre errors on clients and servers (Resolved)
- is related to
  - LU-118 clear_inode: BUG_ON(inode->i_data.nrpages) (Resolved)