Details
- Type: Bug
- Resolution: Duplicate
- Priority: Critical
- Fix Version/s: None
- Affects Version/s: Lustre 2.15.0
- Labels: None
- Severity: 3
- Rank: 9223372036854775807
Description
MPIIO job aborted due to a file-handling issue in the current 24-hour FOFB run with Regression Write Verify Dne2 DOM SEL OVS.
write 582.27 524288 1024.00 0.005614 56.27 0.000848 56.28 3 XXCEL
Verifying contents of the file(s) just written.
Mon Jan 10 08:15:21 2022
delaying 1 seconds . . .
** error **
** error **
** error **
** error **
** error **
** error **
** error **
** error **
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
ERROR in aiori-MPIIO.c (line 128): cannot open file.
** exiting **
Rank 1 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 1
Rank 7 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 7
Rank 2 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 2
Rank 4 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 4
Rank 6 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 6
Rank 5 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 5
Rank 3 [Mon Jan 10 08:17:46 2022] [c3-0c0s8n1] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 3
MPI No MPI error
MPI No MPI error
MPI No MPI error
** exiting **
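For reference, the failure shape in the log above is each rank failing MPI_File_open() when the verify pass reopens the shared file, then aborting. A minimal sketch, assuming a placeholder filename and simplified error handling (this is not IOR's actual aiori-MPIIO.c code); it also recovers the MPI error string, which the stock "cannot open file" message omits:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rc, len;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);

    /* Collective open of the shared test file for the verify pass.
     * "IORfile_1m" is a placeholder for the real test filename. */
    rc = MPI_File_open(MPI_COMM_WORLD, "IORfile_1m",
                       MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        /* File operations default to MPI_ERRORS_RETURN, so rc is
         * checkable here; MPI_Error_string() exposes the underlying
         * reason for the open failure. */
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "** error ** cannot open file: %s\n", msg);
        MPI_Abort(MPI_COMM_WORLD, -1);  /* same abort as in the log above */
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}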
Test tag summary:
aprun -n 64 -N 8 /cray/css/ostest/binaries/xt/rel.70up03.aries.cray-sp2/xtcnl/ostest/ROOT.latest/tests/gold/ioperf/IOR/IOR -o /lus/kjcf05/flash/ostest.vers/alsorun.20220110080202.31077.walleye-p5/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x8_1m.3.d0DIRA.1641823358/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x8_1m/IORfile_1m -w -r -W -i 8 -t 1m -a MPIIO -b 512m -C -k -u -vv -q -d 1 -c
Summary:
	api                = MPIIO (version=3, subversion=1)
	test filename      = /lus/kjcf05/flash/ostest.vers/alsorun.20220110080202.31077.walleye-p5/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x8_1m.3.d0DIRA.1641823358/CL_IOR_pfl_ssf_mpiioc_wr_8iter_n8x8_1m/IORfile_1m
	access             = single-shared-file, collective
	pattern            = segmented (1 segment)
	ordering in a file = sequential offsets
	ordering inter file= constant task offsets = 1
	clients            = 64 (8 per node)
	repetitions        = 8
	xfersize           = 1 MiB
	blocksize          = 512 MiB
	aggregate filesize = 32 GiB
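For clarity, a minimal sketch of the write pattern this summary describes: a single shared file, one segment, each rank writing its own 512 MiB block in 1 MiB collective transfers. The filename is a placeholder and IOR details such as -C task reordering and data verification are omitted; this only illustrates the access pattern, not IOR internals.

#include <mpi.h>

#define XFER  (1024 * 1024)           /* xfersize  = 1 MiB  */
#define BLOCK (512LL * 1024 * 1024)   /* blocksize = 512 MiB */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    static char buf[XFER];            /* one 1 MiB transfer buffer */
    MPI_Offset off;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Single shared file opened collectively by all ranks. */
    MPI_File_open(MPI_COMM_WORLD, "IORfile_1m" /* placeholder path */,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Segmented pattern: rank r owns offsets [r*BLOCK, (r+1)*BLOCK),
     * written as sequential 1 MiB transfers. */
    for (off = 0; off < BLOCK; off += XFER)
        /* -c selects collective I/O, i.e. the _all variant. */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK + off,
                              buf, XFER, MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}

With 64 clients this yields the 32 GiB aggregate file size shown above (64 ranks x 512 MiB).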
I will upload the dk logs and other logs from the Lustre nodes to FTP.
Attachments
Issue Links
- duplicates LU-15788: lazystatfs + FOFB + mpich problems (Resolved)