  1. Lustre
  2. LU-6307

Interop 2.6.0<->2.7 recovery-small test_105: MGS refused the connection from different version MDT

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.7.0, Lustre 2.8.0
    • Affects Version/s: Lustre 2.7.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 17662

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/e154ad20-bfc4-11e4-881f-5254006e85c2.

      The sub-test test_105 failed with the following error:

      mount failed
      

      MDS

      Lustre: DEBUG MARKER: == recovery-small test 105: IR: NON IR clients support == 20:22:45 (1425154965)
      Lustre: DEBUG MARKER: /usr/sbin/lctl list_param mgs.*.ir_timeout
      Lustre: DEBUG MARKER: lctl set_param -n mgs.MGS.live.lustre=state=full
      Lustre: MGS (2.7.0.0) refused the connection from different version MDT (2.6.0.0) 10.1.4.150@tcp 17c0ca90-6e3b-8642-1b7b-a19358c36e81
      Lustre: MGS (2.7.0.0) refused the connection from different version MDT (2.6.0.0) 10.1.4.150@tcp 17c0ca90-6e3b-8642-1b7b-a19358c36e81
      Lustre: MGS (2.7.0.0) refused the connection from different version MDT (2.6.0.0) 10.1.4.150@tcp 17c0ca90-6e3b-8642-1b7b-a19358c36e81
      Lustre: MGS (2.7.0.0) refused the connection from different version MDT (2.6.0.0) 10.1.4.150@tcp 17c0ca90-6e3b-8642-1b7b-a19358c36e81
      Lustre: DEBUG MARKER: /usr/sbin/lctl mark  recovery-small test_105: @@@@@@ FAIL: mount failed 
      

      Attachments

        Issue Links

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.7 and 2.8


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13927/
            Subject: LU-6307 obdclass: distinguish MGC/MDT connection properly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 1bdc4fd0594e8948623bde18e7b5d691104d8808

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13928/
            Subject: LU-6307 obdclass: distinguish MGC/MDT connection properly
            Project: fs/lustre-release
            Branch: b2_7
            Current Patch Set:
            Commit: 0964ac7523ce816413b309f5b65a988713f607d0

            yong.fan nasf (Inactive) added a comment -

            Thanks Sarah!
            sarah Sarah Liu added a comment -

            FanYong, just to let you know, the patch works.

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13928
            Subject: LU-6307 obdclass: distinguish MGC/MDT connection properly
            Project: fs/lustre-release
            Branch: b2_7
            Current Patch Set: 1
            Commit: 0aa708733ace578813bef17eda1a745e3dedad27

            yong.fan nasf (Inactive) added a comment -

            Sarah, would you please verify the above patch with a b2_6 client for recovery-small test_105? Thanks!

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13927
            Subject: LU-6307 obdclass: distinguish MGC/MDT connection properly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 96913037db05898a29cc3cfbf21dffb5914606f0
            yong.fan nasf (Inactive) added a comment - - edited

            The failure is not caused by the version check between MDS and MGS itself. It happens because the flag "OBD_CONNECT_MDS_MDS" is reused as "OBD_CONNECT_MNE_SWAB". A connection from the MGC sets OBD_CONNECT_MNE_SWAB as follows:

            /* The MNE_SWAB flag is overloading the MDS_MDS bit only for the MGS
             * connection.  It is a temporary bug fix for Imperative Recovery interop
             * between 2.2 and 2.3 x86/ppc nodes, and can be removed when interop for
             * 2.2 clients/servers is no longer needed.  LU-1252/LU-1644. */
            #define OBD_CONNECT_MNE_SWAB             OBD_CONNECT_MDS_MDS
            ...
            lustre_start_mgc(...)
            {
                ...
            #if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
                    data->ocd_connect_flags |= OBD_CONNECT_MNE_SWAB;
            #endif
            
                    if (lmd_is_client(lsi->lsi_lmd) &&
                        lsi->lsi_lmd->lmd_flags & LMD_FLG_NOIR)
                            data->ocd_connect_flags &= ~OBD_CONNECT_IMP_RECOV;
                ...
            }
            

            Generally, the server side can tell whether a connection comes from an MGC by checking OBD_CONNECT_IMP_RECOV. But in recovery-small test_105, "noir" is specified, so OBD_CONNECT_IMP_RECOV is dropped. The server then cannot tell whether the connection comes from an MGC or from an old MDT, and the mount fails.


            adilger Andreas Dilger added a comment -

            This is caused by http://review.whamcloud.com/13285, but the version check should apply only between MDS nodes, not between MDS and MGS.

            People

              yong.fan nasf (Inactive)
              maloo Maloo
