Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: Lustre 2.15.0
- Severity: 3
Description
During FOFB (failover/failback) tests with IOR and MPICH we observed the following errors. I've put together a timeline of the issue.
Using Time Stamp 1648109998 (0x623c29ae) for Data Signature (03:19:58)
delaying 15 seconds . . .
Commencing write performance test.
Thu Mar 24 03:21:10 2022
write     717.93     1048576    1024.00    0.113480   91.17      0.010149   91.28      3    XXCEL
Verifying contents of the file(s) just written.
Thu Mar 24 03:22:41 2022
delaying 15 seconds . . .
[RANK 000] open for reading file /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m XXCEL
Commencing read performance test.
Thu Mar 24 03:23:27 2022
read      2698.93    1048576    1024.00    0.030882   24.25      0.005629   24.28      3    XXCEL
Using Time Stamp 1648110232 (0x623c2a98) for Data Signature (03:24:42)
delaying 15 seconds . . .   (~03:24:57)
Mar 24 03:24:51 kjcf05n03 kernel: Lustre: Failing over kjcf05-MDT0000
** error **
** error **
ADIO_RESOLVEFILETYPE_FNCALL(387): Invalid file name /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m, mpi_check_status: 939600165, mpi_check_status_errno: 107
MPI File does not exist, error stack:
(unknown)(): Invalid file name, mpi_check_status: 939600165, mpi_check_status_errno: 2
Rank 0 [Thu Mar 24 03:25:00 2022] [c3-0c0s12n0] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
Mar 24 03:25:46 kjcf05n03 kernel: Lustre: server umount kjcf05-MDT0000 complete
Mar 24 03:25:46 kjcf05n03 kernel: md65: detected capacity change from 21009999921152 to 0
Mar 24 03:25:46 kjcf05n03 kernel: md: md65 stopped.
Mar 24 03:25:48 kjcf05n02 kernel: md: md65 stopped.
00000020:00000001:22.0:1648110350.625691:0:512728:0:(obd_mount_server.c:1352:server_start_targets()) Process entered
Mar 24 03:25:51 kjcf05n02 kernel: Lustre: kjcf05-MDT0000: Will be in recovery for at least 15:00, or until 24 clients reconnect
The failure comes from the following MPICH code path:
MPI_File_open() -> ADIO_ResolveFileType() -> ADIO_FileSysType_fncall() -> statfs()
The VFS statfs path first performs a lookup on the file and then calls ll_statfs(). If the cluster loses the MDT between these two calls, ll_statfs() fails with one of EAGAIN, ENOTCONN, or ENODEV; the exact errno depends on the stage of the MDT failover. The error breaks MPICH's filesystem-type detection logic and fails the IOR run. The error does not happen with nolazystatfs, because then ll_statfs() blocks and waits for the MDT.
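For reference, a minimal userspace sketch (not part of MPICH or IOR; the file name and output are illustrative) that issues the same statfs(2) call and reports the transient errno values seen during MDT failover:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/vfs.h>

int main(int argc, char **argv)
{
    struct statfs sfs;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <file on lustre>\n", argv[0]);
        return 1;
    }

    /* statfs(2) resolves the path (lookup) and then reaches the client's
     * ll_statfs().  With lazystatfs, an MDT failover between the two
     * steps surfaces as a transient errno instead of blocking. */
    if (statfs(argv[1], &sfs) != 0) {
        if (errno == EAGAIN || errno == ENOTCONN || errno == ENODEV)
            printf("transient MDT-failover error: %s\n", strerror(errno));
        else
            printf("statfs failed: %s\n", strerror(errno));
        return 1;
    }

    printf("statfs succeeded\n");
    return 0;
}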
lazystatfs was designed specifically not to block statfs. Note that an OST failover does not produce an ll_statfs() error, because in that case statfs() returns only the MDT data with rc = 0.
MPICH already has a workaround for the ESTALE error returned by NFS:
static void ADIO_FileSysType_fncall(const char *filename, int *fstype, int *error_code)
{
    int err;
    int64_t file_id;
    static char myname[] = "ADIO_RESOLVEFILETYPE_FNCALL";

/* NFS can get stuck and end up returning ESTALE "forever" */
#define MAX_ESTALE_RETRY 10000
    int retry_cnt;

    *error_code = MPI_SUCCESS;

    retry_cnt = 0;
    do {
        err = romio_statfs(filename, &file_id);
    } while (err && (errno == ESTALE) && retry_cnt++ < MAX_ESTALE_RETRY);
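Once the retry loop exits, the returned fsid is matched against filesystem magic numbers. A simplified, self-contained sketch of that detection idea (assumed for illustration, not verbatim ROMIO source) shows why a failing statfs() leaves the type undetermined and the open aborts with "Invalid file name":

#include <stddef.h>
#include <sys/vfs.h>

#define LL_SUPER_MAGIC  0x0BD00BD0      /* Lustre */
#define NFS_SUPER_MAGIC 0x6969          /* NFS */

/* Returns a filesystem name, or NULL when statfs() fails and the
 * type cannot be determined (the case hit during MDT failover). */
static const char *fs_type_name(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return NULL;

    switch (sfs.f_type) {
    case LL_SUPER_MAGIC:
        return "lustre";
    case NFS_SUPER_MAGIC:
        return "nfs";
    default:
        return "ufs";   /* generic fallback */
    }
}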
I suggest masking these errors as ESTALE in ll_statfs(). That would make MPICH happy with the lazystatfs option under FOFB.
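A minimal sketch of the idea (the helper name and placement are illustrative, not the actual Lustre patch): the transient failover errors are rewritten as ESTALE, which MPICH's existing retry loop already absorbs.

#include <errno.h>

/* Sketch only: remap the transient MDT-failover errors seen by
 * ll_statfs() to ESTALE, which MPICH already retries (MAX_ESTALE_RETRY). */
static int mask_statfs_error(int rc)
{
    if (rc == -EAGAIN || rc == -ENOTCONN || rc == -ENODEV)
        return -ESTALE;
    return rc;
}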
Issue Links
- is duplicated by: LU-15457 IOR MPIIO job abort - file handling issue (EAGAIN) (Closed)