
Race in setting connection flags and using them on 2.x client connect

Details

    Description

      Lustre 2.1 client fails to connect to Lustre 2.2 server

      > c0-0c2s6n1 LustreError: 11-0: an error occurred while communicating with 10.149.3.5@o2ib. The mgs_config_read operation failed with -524
      > c0-0c2s6n1 LustreError: 4645:0:(mgc_request.c:1917:mgc_process_config()) Cannot process recover llog -524
      > c0-0c2s6n1 LustreError: 15c-8: MGC10.149.3.5@o2ib: The configuration from log 'snxs2-client' failed (-524). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
      > c0-0c2s6n1 LustreError: 4645:0:(llite_lib.c:983:ll_fill_super()) Unable to process log: -524

      The race can be reproduced with the following patch:

      diff --git a/lustre/ptlrpc/import.c b/lustre/ptlrpc/import.c
      index 2953352..a69e6b9 100644
      --- a/lustre/ptlrpc/import.c
      +++ b/lustre/ptlrpc/import.c
      @@ -805,6 +805,7 @@ static int ptlrpc_connect_interpret(const struct lu_env *env,
                       } else {
                               IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL);
                               ptlrpc_activate_import(imp);
      +                        OBD_FAIL_TIMEOUT(0x5555, 2);
                       }
       
                       GOTO(finish, rc = 0);
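
The window that the injected delay widens can be sketched in isolation. This is a minimal, single-threaded model, not Lustre code: `demo_import`, `demo_sample_flags`, `demo_run` and `DEMO_CONNECT_FEATURE` are hypothetical names, and it assumes (going by the ticket summary rather than the patch itself) that the negotiated connection flags are stored only after the import has been marked FULL. The call to `demo_sample_flags` stands in for whatever another thread does while `OBD_FAIL_TIMEOUT(0x5555, 2)` sleeps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real Lustre definitions. */
enum demo_state { DEMO_IMP_CONNECTING, DEMO_IMP_FULL };
#define DEMO_CONNECT_FEATURE UINT64_C(0x1)

struct demo_import {
	enum demo_state state;		/* what other threads check first */
	uint64_t connect_flags;		/* features negotiated with the server */
};

/* Models another thread that uses the import as soon as it looks FULL. */
static uint64_t demo_sample_flags(const struct demo_import *imp)
{
	assert(imp->state == DEMO_IMP_FULL);
	return imp->connect_flags;
}

/*
 * Models the connect-reply path. With flags_first == 0 the state is
 * published before the flags (the assumed racy order); with
 * flags_first != 0 the flags are stored before the state changes.
 * Returns the flags the "other thread" observed inside the window.
 */
uint64_t demo_run(int flags_first)
{
	struct demo_import imp = { DEMO_IMP_CONNECTING, 0 };
	uint64_t seen;

	if (flags_first)
		imp.connect_flags = DEMO_CONNECT_FEATURE;

	imp.state = DEMO_IMP_FULL;

	/* The reproducer's delay sits here; a concurrent user runs now. */
	seen = demo_sample_flags(&imp);

	if (!flags_first)
		imp.connect_flags = DEMO_CONNECT_FEATURE;

	return seen;
}
```

In this model `demo_run(0)` returns 0, because the flags are read before they are set, while `demo_run(1)` returns `DEMO_CONNECT_FEATURE`. Whether the actual fix reorders the assignments or protects them some other way is not stated in this ticket.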
      

      Attachments

        Activity

          [LU-1716] Race in setting connection flags and using them on 2.x client connect
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.1.4 [ 10158 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.3.0 [ 10117 ]
          Fix Version/s New: Lustre 2.4.0 [ 10154 ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          green Oleg Drokin made changes -
          Summary Original: 2.1 client fails to mount 2.2 filesystem New: Race in setting connection flags and using them on 2.x client connect
          pjones Peter Jones made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Bob Glossman [ bogl ]
          jlevi Jodi Levi (Inactive) made changes -
          Affects Version/s New: Lustre 2.3.0 [ 10117 ]
          Priority Original: Minor [ 4 ] New: Blocker [ 1 ]
          adilger Andreas Dilger made changes -
          Description: wrapped the reproducer patch in {noformat} tags; the text is otherwise identical to the Description above.
          askulysh Andriy Skulysh created issue -

          People

            bogl Bob Glossman (Inactive)
            askulysh Andriy Skulysh