Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-8397

take comma as separator of mgsnode's list

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      mgsnode's list of nids is allowed to use comma as separator (see 36.15.3 of Lustre Operations Manual), however lustre_start_mgc() assumes that only colon can separate nids. Due to that nids after comma do not get into list of import's connections and those nids do not get involved.

      Attachments

        Issue Links

          Activity

            [LU-8397] take comma as separator of mgsnode's list
            yujian Jian Yu added a comment -

            Patch landed to master branch for Lustre 2.10.0.

            yujian Jian Yu added a comment - Patch landed to master branch for Lustre 2.10.0.

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21308/
            Subject: LU-8397 mgc: add comma-separated nids for primary MGS
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 672e182049e1a636d8591748764950f99c4b71d9

            gerrit Gerrit Updater added a comment - Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/21308/ Subject: LU-8397 mgc: add comma-separated nids for primary MGS Project: fs/lustre-release Branch: master Current Patch Set: Commit: 672e182049e1a636d8591748764950f99c4b71d9
            yujian Jian Yu added a comment -

            Thank you very much, Darby. I'll push the patch to be ready to land.

            yujian Jian Yu added a comment - Thank you very much, Darby. I'll push the patch to be ready to land.
            dvicker Darby Vicker added a comment -

            I just tested 2.9.0 with this patch and it does indeed fix our issues.  I brought up our filesystem while configured on both TCP and IB, mounted both TCP and IB clients and failed over the MDS/MGS to the secondary.  The OST's and the clients all pick up the secondary MGS now.  Info from one of our OSS's is below - this is after the failover.  Thank you very much!  

             

            [root@hpfs-fsl-oss01 ~]# cat /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import 
            import:
                name: MGC192.52.98.30@tcp
                target: MGS
                state: FULL
                connect_flags: [ version, adaptive_timeouts, mds_mds_connection, full20, imp_recov, bulk_mbits ]
                connect_data:
                   flags: 0x2000011005000020
                   instance: 0
                   target_version: 2.9.0.0
                import_flags: [ pingable, connect_tried ]
                connection:
                   failover_nids: [ 192.52.98.31@tcp, 192.52.98.30@tcp ]
                   current_connection: 192.52.98.31@tcp
                   connection_attempts: 3
                   generation: 2
                   in-progress_invalidations: 0
            [root@hpfs-fsl-oss01 ~]# zfs get all | grep mgs
            oss01-0/ost-fsl             lustre:mgsnode        192.52.98.30@tcp,10.148.0.30@o2ib:192.52.98.31@tcp,10.148.0.31@o2ib  local
            [root@hpfs-fsl-oss01 ~]# 
            
            
            
            
            dvicker Darby Vicker added a comment - I just tested 2.9.0 with this patch and it does indeed fix our issues.  I brought up our filesystem while configured on both TCP and IB, mounted both TCP and IB clients and failed over the MDS/MGS to the secondary.  The OST's and the clients all pick up the secondary MGS now.  Info from one of our OSS's is below - this is after the failover.  Thank you very much!     [root@hpfs-fsl-oss01 ~]# cat /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import import: name: MGC192.52.98.30@tcp target: MGS state: FULL connect_flags: [ version, adaptive_timeouts, mds_mds_connection, full20, imp_recov, bulk_mbits ] connect_data: flags: 0x2000011005000020 instance: 0 target_version: 2.9.0.0 import_flags: [ pingable, connect_tried ] connection: failover_nids: [ 192.52.98.31@tcp, 192.52.98.30@tcp ] current_connection: 192.52.98.31@tcp connection_attempts: 3 generation: 2 in-progress_invalidations: 0 [root@hpfs-fsl-oss01 ~]# zfs get all | grep mgs oss01-0/ost-fsl lustre:mgsnode 192.52.98.30@tcp,10.148.0.30@o2ib:192.52.98.31@tcp,10.148.0.31@o2ib local [root@hpfs-fsl-oss01 ~]#
            yujian Jian Yu added a comment -

            Hi Darby,
            I just updated patch https://review.whamcloud.com/21308 on master branch to fix the issue in my previous comment. The patch can be applied to Lustre 2.9.0 release directly. Could you please try it? Thank you.

            yujian Jian Yu added a comment - Hi Darby, I just updated patch https://review.whamcloud.com/21308 on master branch to fix the issue in my previous comment. The patch can be applied to Lustre 2.9.0 release directly. Could you please try it? Thank you.

            I'm sorry - it turns out I was wrong when I said reverting this patch fixed our situation.  I think what happened is that I forgot to stop lnet before I uninstalled stock 2.9.0.  I then installed and tested the patched version of 2.9.0 - but the lnet kernel modules from stock 2.9.0 stayed loaded.  So I was using some weird combination of a patched and stock version of 2.9.0.  I've repeated the tests using a version of 2.9.0 with patch 7905 reverted and the situation is actually worse - with both single rail (tcp only) or dual rail (tcp and ib) I only see one entry for failover_nids in /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import.  I can repeat the tests and dump debug logs if that would help.  

            dvicker Darby Vicker added a comment - I'm sorry - it turns out I was wrong when I said reverting this patch fixed our situation.  I think what happened is that I forgot to stop lnet before I uninstalled stock 2.9.0.  I then installed and tested the patched version of 2.9.0 - but the lnet kernel modules from stock 2.9.0 stayed loaded.  So I was using some weird combination of a patched and stock version of 2.9.0.  I've repeated the tests using a version of 2.9.0 with patch 7905 reverted and the situation is actually worse - with both single rail (tcp only) or dual rail (tcp and ib) I only see one entry for failover_nids in /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import.  I can repeat the tests and dump debug logs if that would help.  
            yujian Jian Yu added a comment -

            Thank you very much Darby for the confirmation.

            yujian Jian Yu added a comment - Thank you very much Darby for the confirmation.
            dvicker Darby Vicker added a comment -

            This evening I compiled the 2.9.0 release with patch 7905 reverted.  I can confirm that this did allow me to failover our OST's properly.  Once the OST were mounted with the NIDS shown in the comment above we see both TCP NIDS in /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import and the MGC connects with the secondary MDS/MGS when we failover.  

            dvicker Darby Vicker added a comment - This evening I compiled the 2.9.0 release with patch 7905 reverted.  I can confirm that this did allow me to failover our OST's properly.  Once the OST were mounted with the NIDS shown in the comment above we see both TCP NIDS in /proc/fs/lustre/mgc/MGC192.52.98.30@tcp/import and the MGC connects with the secondary MDS/MGS when we failover.  
            yujian Jian Yu added a comment -

            Hi Bobi,

            I found the following codes in lustre_start_mgc() introduced by patch https://review.whamcloud.com/7509 for LU-3829 might have some issue:

                                    /*
                                     * LU-3829.
                                     * Here we only take the first mgsnid as its primary
                                     * serving mgs node, the rest mgsnid will be taken as
                                     * failover mgs node, otherwise they would be takens
                                     * as multiple nids of a single mgs node.
                                     */
                                    while (class_parse_nid(ptr, &nid, &ptr) == 0) {
                                            rc = do_lcfg(mgcname, nid, LCFG_ADD_UUID,
                                                         niduuid, NULL, NULL, NULL);
                                            if (rc == 0) {
                                                    i = 1;
                                                    break;
                                            }
                                    }
            
            

            For multiple NIDs separated by commas on one MGS node, the above codes would only add the first NID and then break the while loop. After lustre_start_simple() is performed, the remaining comma-separated MGS NID(s) would not be added because the following condition doesn't work:

                    /* Add any failover MGS nids */
                    i = 1;
                    while (ptr && ((*ptr == ':' ||
                           class_find_param(ptr, PARAM_MGSNODE, &ptr) == 0))) {
            
            

            ptr is previously assigned with ptr = lsi->lsi_lmd->lmd_mgs, and for comma-separated MGS NIDs, ptr now points to a ','. There is also no mgsnode= in the string, so the above condition does not work. All of the NIDs after the first comma would not be added.

            It seems this is the root cause of the latest issue reported by Darby in LU-8311. The ptr has a value of 192.52.98.30@tcp,10.148.0.30@o2ib:192.52.98.31@tcp,10.148.0.31@o2ib, but from the debug log in debug.log.ib_and_eth, I can only see the first NID 192.52.98.30@tcp was added.

            yujian Jian Yu added a comment - Hi Bobi, I found the following codes in lustre_start_mgc() introduced by patch https://review.whamcloud.com/7509 for LU-3829 might have some issue: /* * LU-3829. * Here we only take the first mgsnid as its primary * serving mgs node, the rest mgsnid will be taken as * failover mgs node, otherwise they would be takens * as multiple nids of a single mgs node. */ while (class_parse_nid(ptr, &nid, &ptr) == 0) { rc = do_lcfg(mgcname, nid, LCFG_ADD_UUID, niduuid, NULL, NULL, NULL); if (rc == 0) { i = 1; break ; } } For multiple NIDs separated by commas on one MGS node, the above codes would only add the first NID and then break the while loop. After lustre_start_simple() is performed, the remaining comma-separated MGS NID(s) would not be added because the following condition doesn't work: /* Add any failover MGS nids */ i = 1; while (ptr && ((*ptr == ':' || class_find_param(ptr, PARAM_MGSNODE, &ptr) == 0))) { ptr is previously assigned with ptr = lsi->lsi_lmd->lmd_mgs , and for comma-separated MGS NIDs, ptr now points to a ','. There is also no mgsnode= in the string, so the above condition does not work. All of the NIDs after the first comma would not be added. It seems this is the root cause of the latest issue reported by Darby in LU-8311 . The ptr has a value of 192.52.98.30@tcp,10.148.0.30@o2ib:192.52.98.31@tcp,10.148.0.31@o2ib , but from the debug log in debug.log.ib_and_eth, I can only see the first NID 192.52.98.30@tcp was added.
            pjones Peter Jones added a comment -

            Emoly

            Could you please take care of reviews etc for this patch?

            Thanks

            Peter

            pjones Peter Jones added a comment - Emoly Could you please take care of reviews etc for this patch? Thanks Peter

            People

              emoly.liu Emoly Liu
              vsaveliev Vladimir Saveliev
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: