LU-15788: lazystatfs + FOFB + mpich problems



    Description

      During FOFB tests with IOR and mpich we observe the following errors. I've put together a timeline of the issue.

      Using Time Stamp 1648109998 (0x623c29ae) for Data Signature  (03:19:58)
      delaying 15 seconds . . .
       Commencing write performance test.
       Thu Mar 24 03:21:10 2022
      
       write     717.93     1048576    1024.00    0.113480   91.17      0.010149   91.28      3    XXCEL
       Verifying contents of the file(s) just written.
       Thu Mar 24 03:22:41 2022
      
       delaying 15 seconds . . .
       [RANK 000] open for reading file /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m XXCEL
       Commencing read performance test.
       Thu Mar 24 03:23:27 2022
      
       read      2698.93    1048576    1024.00    0.030882   24.25      0.005629   24.28      3    XXCEL
       Using Time Stamp 1648110232 (0x623c2a98) for Data Signature (03:24:42)
       delaying 15 seconds . . . (~03:24:57)
      
      Mar 24 03:24:51 kjcf05n03 kernel: Lustre: Failing over kjcf05-MDT0000
      
       ** error **
       ** error **
       ADIO_RESOLVEFILETYPE_FNCALL(387): Invalid file name /lus/kjcf05/disk/ostest.vers/alsorun.20220324030303.12286.walleye-p5/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m.1.LKuc9T.1648109355/CL_IOR_sel_32ovs_mpiio_wr_8iter_n64_1m/IORfile_1m, mpi_check_status: 939600165, mpi_check_status_errno: 107
       MPI File does not exist, error stack:
       (unknown)(): Invalid file name, mpi_check_status: 939600165, mpi_check_status_errno: 2
      
      Rank 0 [Thu Mar 24 03:25:00 2022] [c3-0c0s12n0] application called MPI_Abort(MPI_COMM_WORLD, -1) - process 0
      
      
      Mar 24 03:25:46 kjcf05n03 kernel: Lustre: server umount kjcf05-MDT0000 complete
      Mar 24 03:25:46 kjcf05n03 kernel: md65: detected capacity change from 21009999921152 to 0
      Mar 24 03:25:46 kjcf05n03 kernel: md: md65 stopped.
      Mar 24 03:25:48 kjcf05n02 kernel: md: md65 stopped.
      00000020:00000001:22.0:1648110350.625691:0:512728:0:(obd_mount_server.c:1352:server_start_targets()) Process entered
      Mar 24 03:25:51 kjcf05n02 kernel: Lustre: kjcf05-MDT0000: Will be in recovery for at least 15:00, or until 24 clients reconnect
      

      The failure comes from the following mpich code path:
      MPI_File_open() -> ADIO_ResolveFileType() -> ADIO_FileSysType_fncall() -> statfs()
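
      For reference, any MPI-IO open exercises this probe before the file itself is touched. A minimal sketch (the file path is shortened from the log above, and error handling is reduced to a print):

      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char **argv)
      {
          MPI_File fh;
          int rc;

          MPI_Init(&argc, &argv);
          /* MPI_File_open() internally runs ADIO_ResolveFileType() ->
           * ADIO_FileSysType_fncall() -> statfs() to detect the FS type. */
          rc = MPI_File_open(MPI_COMM_WORLD, "/lus/kjcf05/disk/IORfile_1m",
                             MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
          if (rc != MPI_SUCCESS)
              fprintf(stderr, "MPI_File_open failed, rc=%d\n", rc);
          else
              MPI_File_close(&fh);
          MPI_Finalize();
          return 0;
      }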

      The VFS statfs path first does a lookup of the file and then calls ll_statfs. If the cluster loses the MDT between these two calls, ll_statfs fails with one of EAGAIN, ENOTCONN, or ENODEV; the exact errno depends on the stage of the MDT failover. The error breaks MPICH's filesystem-type detection logic and fails the IOR run. The error does not happen with nolazystatfs, because there ll_statfs blocks and waits for the MDT.
      lazystatfs was designed so that statfs does not block. Note that OST failover does not produce an ll_statfs error, because statfs then returns only the MDT data with rc 0.
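
      The race can be shown outside MPICH. A hypothetical standalone reproducer (the path is an example) that loops on statfs() against a lazystatfs-mounted Lustre path while the MDT fails over, printing the transient errno values:

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/vfs.h>

      int main(int argc, char **argv)
      {
          const char *path = argc > 1 ? argv[1] : "/lus/kjcf05/disk/IORfile_1m";
          struct statfs sfs;

          for (;;) {
              /* With lazystatfs, during MDT failover this is expected to
               * report EAGAIN/ENOTCONN/ENODEV instead of blocking. */
              if (statfs(path, &sfs) == 0)
                  printf("statfs ok, f_type=0x%lx\n", (unsigned long)sfs.f_type);
              else
                  printf("statfs failed: errno=%d (%s)\n", errno, strerror(errno));
              sleep(1);
          }
          return 0;
      }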
      Also, mpich already has a workaround for ESTALE errors from NFS:

      static void ADIO_FileSysType_fncall(const char *filename, int *fstype, int *error_code)
      {
          int err;
          int64_t file_id;
          static char myname[] = "ADIO_RESOLVEFILETYPE_FNCALL";
      
      
      /* NFS can get stuck and end up returning ESTALE "forever" */
      #define MAX_ESTALE_RETRY 10000
          int retry_cnt;
      
          *error_code = MPI_SUCCESS;
      
          retry_cnt = 0;
          do {
              err = romio_statfs(filename, &file_id);
          } while (err && (errno == ESTALE) && retry_cnt++ < MAX_ESTALE_RETRY);
      

      I suggest masking these errors (EAGAIN, ENOTCONN, ENODEV) as ESTALE in ll_statfs. This will make MPICH happy with the lazystatfs option under FOFB.
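
      A minimal sketch of the proposed masking, assuming it sits in llite's ll_statfs() around the internal statfs call (the exact layout of ll_statfs()/ll_statfs_internal() varies between Lustre versions, so this is illustrative, not the actual patch):

      int ll_statfs(struct dentry *de, struct kstatfs *sfs)
      {
              struct super_block *sb = de->d_sb;
              struct obd_statfs osfs;
              int rc;

              rc = ll_statfs_internal(ll_s2sbi(sb), &osfs, OBD_STATFS_SUM);
              /* With lazystatfs, an MDT in mid-failover yields a transient
               * error; map it to ESTALE so callers with NFS-style retry
               * logic (e.g. ADIO_FileSysType_fncall above) retry instead
               * of failing the job. */
              if (rc == -EAGAIN || rc == -ENOTCONN || rc == -ENODEV)
                      return -ESTALE;
              if (rc)
                      return rc;

              /* ... fill *sfs from osfs as before ... */
              return 0;
      }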

      People

        Assignee: aboyko (Alexander Boyko)
        Reporter: aboyko (Alexander Boyko)