[LU-14644] IOR SSF PFL ill-formed I/O job aborted with EIO during automated FOFB testing Created: 27/Apr/21  Updated: 23/Jan/23  Resolved: 27/May/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0, Lustre 2.12.10

Type: Bug Priority: Minor
Reporter: Vitaly Fertman Assignee: Vitaly Fertman
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-14372 LustreError: 38823:0:(vvp_io.c:1562:v... Resolved
Related
is related to LU-118 clear_inode: BUG_ON(inode->i_data.nrp... Resolved
is related to LU-14787 Provide an abstraction for AS_EXITING Resolved
is related to LU-16497 various lustre errors on clients and ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

A single-shared-file (SSF) IOR job aborted with the following EIO error during the seventh write iteration:

Using Time Stamp 1588769149 (0x5eb2b17d) for Data Signature
delaying 1 seconds . . .
Commencing write performance test.
Wed May  6 07:45:50 2020
 
ADIOI_CRAY_WRITECONTIG(261): filename='/lus/snx11281/disk/ostest.vers/alsorun.20200504152303.27104.saturn-p4/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k.1.dlY06h.1588768616/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x1_1069k/IORfile_1m'  error='Input/output error'  errno=5  PE=00001  W_rec=03163  off=0840695808  len=0000262144  See MPICH_MPIIO_ABORT_ON_RW_ERROR.
** error **
ERROR in aiori-MPIIO.c (line 298): cannot access explicit, collective.
MPI No MPI error
** exiting **
Rank 1 [Wed May  6 07:45:50 2020] [c0-0c2s9n2] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
_pmiu_daemon(SIGCHLD): [NID 00166] [c0-0c2s9n2] [Wed May  6 07:45:50 2020] PE RANK 1 exit signal Aborted
[NID 00166] 2020-05-06 07:45:51 Apid 5829365: initiated application termination
Application 5829365 exit codes: 134
Application 5829365 exit signals: Killed
Application 5829365 resources: utime ~159s, stime ~9s, Rss ~28544, inblocks ~8314, outblocks ~3330760
Job Script: command stopped at Wed May 6 07:45:51 CDT 2020
Job Script: command runtime was 238 seconds
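The data-signature time stamp printed by IOR can be cross-checked against the wall-clock lines in the job output. The following is a minimal sketch, assuming the compute node reports local time as CDT (UTC-5), as the "Job Script" lines suggest; `describe_stamp` is a hypothetical helper and not part of IOR.

```python
from datetime import datetime, timedelta, timezone

# Value copied from the "Using Time Stamp" line of the job output.
STAMP = 1588769149

# Assumption: the node's local time zone is CDT (UTC-5), per the "Job Script" lines.
CDT = timezone(timedelta(hours=-5), "CDT")


def describe_stamp(stamp, tz):
    """Return the stamp as a hex string and as an ISO 8601 local time."""
    local = datetime.fromtimestamp(stamp, tz)
    return hex(stamp), local.isoformat()


if __name__ == "__main__":
    print(describe_stamp(STAMP, CDT))
```

The hex form agrees with the value IOR printed, and the local time falls one second before the "Commencing write performance test" line, which is consistent with the "delaying 1 seconds" message.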

The following error was found in the console log:

console-20200506:2020-05-06T07:45:55.177486-05:00 c0-0c2s9n2 LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) snx11281: refresh file layout [0x240336a96:0x1efc4:0x0] error -5.
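When scanning console logs for other occurrences of this failure, it can help to pull the FID and return code out of the message. The sketch below is an illustration only: the regular expression is inferred from the single console line quoted in this ticket and may not match other Lustre versions or message variants, and `parse_layout_refresh_error` is a hypothetical helper, not a Lustre tool. The three FID fields are interpreted as sequence, object ID, and version.

```python
import errno
import re

# Pattern inferred from the one console line in this ticket (an assumption,
# not an official Lustre log grammar).
LAYOUT_REFRESH_RE = re.compile(
    r"LustreError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<fsname>\w+): refresh file layout "
    r"\[(?P<seq>0x[0-9a-f]+):(?P<oid>0x[0-9a-f]+):(?P<ver>0x[0-9a-f]+)\] "
    r"error (?P<rc>-?\d+)"
)


def parse_layout_refresh_error(line):
    """Return the fields of a 'refresh file layout' error, or None if no match."""
    m = LAYOUT_REFRESH_RE.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    return {
        "pid": int(m.group("pid")),
        "source": f"{m.group('file')}:{m.group('line')}",
        "function": m.group("func"),
        "fsname": m.group("fsname"),
        # FID fields parsed from hex: (sequence, object ID, version).
        "fid": tuple(int(m.group(k), 16) for k in ("seq", "oid", "ver")),
        "rc": rc,
        # Kernel return codes are negated errno values, so -5 maps to EIO.
        "errno_name": errno.errorcode.get(-rc, "unknown"),
    }


if __name__ == "__main__":
    sample = (
        "c0-0c2s9n2 LustreError: 14039:0:(vvp_io.c:1505:vvp_io_init()) "
        "snx11281: refresh file layout [0x240336a96:0x1efc4:0x0] error -5."
    )
    print(parse_layout_refresh_error(sample))
```

For the line above, the helper reports `errno_name` as `EIO`, matching the `errno=5` that MPI-IO returned to the application.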


Comments
Comment by Gerrit Updater [ 27/Apr/21 ]

Vitaly Fertman (vitaly.fertman@hpe.com) uploaded a new patch: https://review.whamcloud.com/43464
Subject: LU-14644 vvp: wait for nrpages to be updated
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 3bb3301a45b5d015c64389b074e014a894148ce7

Comment by Gerrit Updater [ 27/May/21 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43464/
Subject: LU-14644 vvp: wait for nrpages to be updated
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7d5d004506650c3739898e70d72c9a86b8aeeb88

Comment by Peter Jones [ 27/May/21 ]

Landed for 2.15

Comment by Gerrit Updater [ 29/Mar/22 ]

"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46948
Subject: LU-14644 vvp: wait for nrpages to be updated
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 1c73a734375a05fa75a9a55fa89494f743a10114

Comment by Gerrit Updater [ 20/Sep/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46948/
Subject: LU-14644 vvp: wait for nrpages to be updated
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 54b44848f353b7730e9b5fed1b74e2b655030ff6
