[LU-14644] IOR SSF PFL ill-formed I/O job aborted with EIO during automated FOFB testing Created: 27/Apr/21 Updated: 23/Jan/23 Resolved: 27/May/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0, Lustre 2.12.10 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vitaly Fertman | Assignee: | Vitaly Fertman |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A single shared file IOR job aborted with the following EIO error during the seventh write iteration:

Using Time Stamp 1588769149 (0x5eb2b17d) for Data Signature
delaying 1 seconds . . .
Commencing write performance test.
Wed May 6 07:45:50 2020
ADIOI_CRAY_WRITECONTIG(261): filename='/lus/snx11281/disk/ostest.vers/alsorun.20200504152303.27104.saturn-p4/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k.1.dlY06h.1588768616/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k/IORfile_1m' error='Input/output error' errno=5 PE=00001 W_rec=03163 off=0840695808 len=0000262144
See MPICH_MPIIO_ABORT_ON_RW_ERROR.
** error **
ERROR in aiori-MPIIO.c (line 298): cannot access explicit, collective.
MPI No MPI error
** exiting **
Rank 1 [Wed May 6 07:45:50 2020] [c0-0c2s9n2] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
_pmiu_daemon(SIGCHLD): [NID 00166] [c0-0c2s9n2] [Wed May 6 07:45:50 2020] PE RANK 1 exit signal Aborted
[NID 00166] 2020-05-06 07:45:51 Apid 5829365: initiated application termination
Application 5829365 exit codes: 134
Application 5829365 exit signals: Killed
Application 5829365 resources: utime ~159s, stime ~9s, Rss ~28544, inblocks ~8314, outblocks ~3330760
Job Script: command stopped at Wed May 6 07:45:51 CDT 2020
Job Script: command runtime was 238 seconds

The following error was found in the console log:

console-20200506:2020-05-06T07:45:55.177486-05:00 c0-0c2s9n2 LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: refresh file layout [0x240336a96:0x1efc4:0x0] error -5. |
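For triage, the numeric fields in the ADIOI_CRAY_WRITECONTIG line and the IOR time stamp can be decoded with a short script. The following is a minimal sketch and not part of the original report: the helper names `parse_adio_error` and `decode_timestamp` and the regular expression are assumptions made for illustration, and the log line is abbreviated to its key=value fields.

```python
import errno
import re
from datetime import datetime, timedelta, timezone

# Key=value fields copied from the failing ADIOI_CRAY_WRITECONTIG line
# in the description (the filename field is omitted for brevity).
LOG_FIELDS = (
    "error='Input/output error' errno=5 PE=00001 "
    "W_rec=03163 off=0840695808 len=0000262144"
)


def parse_adio_error(fields: str) -> dict:
    """Hypothetical helper: extract the numeric key=value pairs."""
    found = dict(re.findall(r"(\w+)=(\d+)", fields))
    return {
        "errno_name": errno.errorcode[int(found["errno"])],
        "rank": int(found["PE"]),
        "offset": int(found["off"]),
        "length": int(found["len"]),
    }


def decode_timestamp(stamp: int) -> str:
    """Hypothetical helper: render the IOR time stamp in CDT (UTC-5),
    the offset shown in the console log entry."""
    cdt = timezone(timedelta(hours=-5))
    return datetime.fromtimestamp(stamp, tz=cdt).strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    info = parse_adio_error(LOG_FIELDS)
    print(info)
    print("offset / length =", info["offset"] / info["length"])
    print("time stamp:", decode_timestamp(1588769149))
```

Under these assumptions, errno 5 maps to EIO, which agrees with the "error -5" in the console log, and the failing offset is a whole multiple of the 262144-byte write length.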
| Comments |
| Comment by Gerrit Updater [ 27/Apr/21 ] |
|
Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/43464 |
| Comment by Gerrit Updater [ 27/May/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43464/ |
| Comment by Peter Jones [ 27/May/21 ] |
|
Landed for 2.15 |
| Comment by Gerrit Updater [ 29/Mar/22 ] |
|
"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46948 |
| Comment by Gerrit Updater [ 20/Sep/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46948/ |