    Description

      During FOFB (failover/failback) tests with IOR and MPICH we observed the following errors. I've put together a timeline for the issue.

      Using Time Stamp 1648109998 (0x623c29ae) for Data Signature  (03:19:58)
      delaying 15 seconds . . .
       Commencing write performance test.
       Thu Mar 24 03:21:10 2022
      
       write     717.93     1048576    1024.00    0.113480   91.17      0.010149   91.28      3    XXCEL
       Verifying contents of the file(s) just written.
       Thu Mar 24 03:22:41 2022
      
       delaying 15 seconds . . .
       [RANK 000] open for reading file /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m XXCEL
       Commencing read performance test.
       Thu Mar 24 03:23:27 2022
      
       read      2698.93    1048576    1024.00    0.030882   24.25      0.005629   24.28      3    XXCEL
       Using Time Stamp 1648110232 (0x623c2a98) for Data Signature (03:24:42)
       delaying 15 seconds . . . (~03:24:57)
      
      Mar 24 03:24:51 kjcf05n03 kernel: Lustre: Failing over kjcf05-MDT0000
      
       ** error **
       ** error **
       ADIO_RESOLVEFILETYPE_FNCALL(387): Invalid file name /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m, mpi_check_status: 939600165, mpi_check_status_errno: 107
       MPI File does not exist, error stack:
       (unknown)(): Invalid file name, mpi_check_status: 939600165, mpi_check_status_errno: 2
      
      Rank 0 [Thu Mar 24 03:25:00 2022] [c3-0c0s12n0] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
      
      
      Mar 24 03:25:46 kjcf05n03 kernel: Lustre: server umount kjcf05-MDT0000 complete
      Mar 24 03:25:46 kjcf05n03 kernel: md65: detected capacity change from 21009999921152 to 0
      Mar 24 03:25:46 kjcf05n03 kernel: md: md65 stopped.
      Mar 24 03:25:48 kjcf05n02 kernel: md: md65 stopped.
      00000020:00000001:22.0:1648110350.625691:0:512728:0:(obd_mount_server.c:1352:server_start_targets()) Process entered
      Mar 24 03:25:51 kjcf05n02 kernel: Lustre: kjcf05-MDT0000: Will be in recovery for at least 15:00, or until 24 clients reconnect
      

      The failure comes from the following MPICH code path:
      MPI_File_open() -> ADIO_ResolveFileType() -> ADIO_FileSysType_fncall() -> statfs()

      The VFS statfs path first does a lookup of the file and then calls ll_statfs(). If the cluster loses the MDT between these two calls, ll_statfs() fails with one of EAGAIN, ENOTCONN, or ENODEV; the exact errno depends on the stage of the MDT failover. The error breaks MPICH's filesystem-type detection logic and fails the IOR run. The error does not happen with nolazystatfs, because then ll_statfs() blocks and waits for the MDT.
      Lazystatfs was designed so that statfs does not block. Note that OST failover does not produce an ll_statfs() error, because statfs returns only the MDT data with rc 0.
      MPICH already has a workaround for the ESTALE error from NFS:

      static void ADIO_FileSysType_fncall(const char *filename, int *fstype, int *error_code)
      {
          int err;
          int64_t file_id;
          static char myname[] = "ADIO_RESOLVEFILETYPE_FNCALL";
      
      
      /* NFS can get stuck and end up returning ESTALE "forever" */
      #define MAX_ESTALE_RETRY 10000
          int retry_cnt;
      
          *error_code = MPI_SUCCESS;
      
          retry_cnt = 0;
          do {
              err = romio_statfs(filename, &file_id);
          } while (err && (errno == ESTALE) && retry_cnt++ < MAX_ESTALE_RETRY);
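          /* ... error handling and file-system type selection follow;
           * excerpt truncated (MPICH ROMIO, adio/common/ad_fstype.c) ... */
      }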
      

      I'm suggesting masking these errors to ESTALE in ll_statfs(). That would make MPICH happy with the lazystatfs option under FOFB.
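
      For illustration only, a minimal sketch of that masking (the helper name and its placement are assumptions, for discussion, not an actual patch):

      #include <linux/errno.h>

      /*
       * Illustration: map the transient MDT-failover errors seen by
       * ll_statfs() to ESTALE, so that MPICH's existing NFS retry loop
       * shown above absorbs them.  The helper is hypothetical.
       */
      static int ll_statfs_mask_failover_error(int rc)
      {
          switch (rc) {
          case -EAGAIN:   /* import is reconnecting */
          case -ENOTCONN: /* connection to the MDT was lost */
          case -ENODEV:   /* target went away during failover */
              return -ESTALE; /* retried by MPICH's ESTALE loop */
          default:
              return rc;
          }
      }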

          Activity

            [LU-15788] lazystatfs + FOFB + mpich problems
            adilger Andreas Dilger added a comment - - edited

            Oleg had problems with v2 of this patch:

            Oleg Drokin  05-29 20:21

            Patch Set 2: Verified-1
            This seems to introduce a 100% recovery-small timeout in janitor testing.


            bzzz Alex Zhuravlev added a comment -

            with this patch landed I hit almost 100%:

            PASS 150 (9s)
            == recovery-small test complete, duration 4839 sec ======= 04:02:47 (1655265767)
            rm: cannot remove '/mnt/lustre/d110h.recovery-small/target_dir/tgt_file': Input/output error
             recovery-small : @@@@@@ FAIL: remove sub-test dirs failed 
              Trace dump:
              = ./../tests/test-framework.sh:6522:error()
              = ./../tests/test-framework.sh:6006:check_and_cleanup_lustre()
              = recovery-small.sh:3306:main()
            

            bisection:

            COMMIT          TESTED  PASSED  FAILED  STATUS  COMMIT DESCRIPTION
            a3cba2ead7      1       0       1       BAD     LU-13547 tests: remove ea_inode from mkfs MDT options
            4c47900889      5       4       1       BAD     LU-12186 ec: add necessary structure member for EC file
            b762319d5a      5       4       1       BAD     LU-14195 libcfs: test for nla_strscpy
            57f3262baa      2       1       1       BAD     LU-15788 lmv: try another MDT if statfs failed
            b00ac5f703      5       5       0       GOOD    LU-12756 lnet: Avoid redundant peer NI lookups
            23028efcae      5       5       0       GOOD    LU-6864 osp: manage number of modify RPCs in flight
            7f157f8ef3      5       5       0       GOOD    LU-15841 lod: iterate component to collect avoid array
            eb71aec27e      5       5       0       GOOD    LU-15786 tests: get maxage param on mds1 properly
            9523e99046      5       5       0       GOOD    LU-15754 lfsck: skip an inode if iget() returns -ENOMEM
            
            pjones Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47152/
            Subject: LU-15788 lmv: try another MDT if statfs failed
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 57f3262baa7d8931176a81cde05bc057facfc3b6

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47152/ Subject: LU-15788 lmv: try another MDT if statfs failed Project: fs/lustre-release Branch: master Current Patch Set: Commit: 57f3262baa7d8931176a81cde05bc057facfc3b6

            adilger Andreas Dilger added a comment -

            Mike, lazystatfs has been enabled by default for a long time. However, it should only apply to "lfs df" returning individual OST stats, not cause the whole statfs to fail. That is a bad interaction between STATFS_SUM (which only sends one RPC to one MDS) and lazystatfs (which allows individual RPCs to fail, but expects most of them to work).

            I think the current patch is a reasonable compromise. It retries the STATFS_SUM multiple times to different MDTs (which shouldn't all be failing at the same time), and should also block (loop retrying) if all MDTs are down.
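
            A rough sketch of that retry behavior, for illustration (the helper and structure are assumptions, not the actual lmv code):

            #include <errno.h>
            #include <unistd.h>

            /* Hypothetical stand-in for one STATFS_SUM RPC to MDT 'idx'. */
            int statfs_one_mdt(int idx);

            /*
             * Try each MDT in turn; keep looping (i.e. block) while all of
             * them fail, so a single MDT failover never surfaces an error
             * to statfs() callers.
             */
            int statfs_sum_with_retry(int mdt_count)
            {
                for (;;) {
                    for (int idx = 0; idx < mdt_count; idx++) {
                        int rc = statfs_one_mdt(idx);
                        if (rc == 0)
                            return 0;   /* one healthy MDT is enough */
                        if (rc != -EAGAIN && rc != -ENOTCONN &&
                            rc != -ENODEV && rc != -ESTALE)
                            return rc;  /* a real error, do not mask it */
                    }
                    sleep(1);           /* all MDTs down: wait and retry */
                }
            }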


            aboyko Alexander Boyko added a comment -

            Yeap, lazystatfs off makes ll_statfs() blocking. The ptlrpc layer handles the errors and resends the statfs request once MDT0 finishes recovery.


            tappro Mikhail Pershin added a comment -

            Does it mean that turning lazystatfs off would remove the problem as well?


            adilger Andreas Dilger added a comment -

            Probably the obd_statfs() call for MDT0000 should not be lazy, since MDT0000 is required for filesystem operation. That would also avoid this problem, and be "more correct" for users as well: they will get some valid return rather than an error.
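
            For illustration, that idea might look roughly like this (the helper is hypothetical; OBD_STATFS_NODELAY is the flag the lazystatfs path uses, though its value here is assumed):

            #define OBD_STATFS_NODELAY 0x0001   /* value assumed for sketch */

            /* Hypothetical per-target helper standing in for obd_statfs(). */
            int statfs_one_mdt_flags(int idx, unsigned int flags);

            /*
             * Send lazy (nodelay) statfs to every MDT except MDT0000, which
             * the filesystem cannot run without, so its statfs blocks
             * through failover instead of returning an error.
             */
            int statfs_all_mdts(int mdt_count, int lazystatfs)
            {
                int rc = 0;

                for (int idx = 0; idx < mdt_count; idx++) {
                    unsigned int flags =
                        (lazystatfs && idx != 0) ? OBD_STATFS_NODELAY : 0;

                    rc = statfs_one_mdt_flags(idx, flags);
                    if (rc != 0 && idx == 0)
                        return rc;      /* MDT0000 failure is fatal */
                }
                return rc;
            }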


            aboyko Alexander Boyko added a comment -

            adilger, could you take a look at the description? I've pushed the patch for discussion only; we have no agreement on the fix yet. This could also be fixed in the MPICH library. I also want to mention that Lustre returns errors that are not sanctioned for the statfs syscall, though ESTALE is also wrong based on the man pages. The whole usermode concept of detecting the FS type with a statfs call, especially for a distributed FS, brings me to tears.


            "Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47152
            Subject: LU-15788 llite: statfs error masking
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e33e50b695eb77a877b65c1070df08398fc76a8d

            gerrit Gerrit Updater added a comment - "Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/47152 Subject: LU-15788 llite: statfs error masking Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: e33e50b695eb77a877b65c1070df08398fc76a8d

            People

              Assignee: Alexander Boyko (aboyko)
              Reporter: Alexander Boyko (aboyko)
              Votes: 0
              Watchers: 6
