  Lustre / LU-16204

Connections from MGC to a Combined MGS/MDT on failover node not working

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.9, Lustre 2.15.0, Lustre 2.15.1
    • Labels: None
    • Environment: master branch + VMs
      2.12.9 + patches on a production cluster
      (more than 1 MDT per node)
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      After migrating the MGS/MDT0000 resources from node 1 to node 2, the MGC that remains on node 1 is unable to connect to the combined MGS/MDT on node 2.

      The cause is that a combined MGS/MDT target ignores the failover NIDs configured for the MGS. When this target is mounted first, it creates an MGC without failover NIDs.
      The other targets then reuse this MGC device (same name) without adding any failover NIDs.
      So, when the MGS/MDT target is mounted on node 2, the MGC on node 1 cannot connect because the failover NID is missing.
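
The dependence on mount order described above can be illustrated with a small toy model. This is only a sketch, not Lustre code: `MGC_NIDS` and `mount_target` are hypothetical names, and the model assumes a single shared MGC whose connection list is fixed by whichever target creates it first.

```shell
# Toy model only, not Lustre code; every name below is hypothetical.
# MGC_NIDS holds the connection list of the single shared MGC device.
# An empty value means the device does not exist yet.
MGC_NIDS=""

# mount_target <kind> <failover_nid>...
#   kind "mgs": combined MGS/MDT, connects locally and ignores failover NIDs
#   kind "mdt": any other target, registers the local NID plus failover NIDs
mount_target() {
    kind=$1
    shift
    # An existing MGC is reused unchanged: no NIDs are added to it.
    if [ -n "$MGC_NIDS" ]; then
        return 0
    fi
    if [ "$kind" = "mgs" ]; then
        MGC_NIDS="0@lo"
    else
        MGC_NIDS="0@lo $*"
    fi
}

# Order 1: combined MGS/MDT first, then MDT0001.
MGC_NIDS=""
mount_target mgs 10.0.2.7@tcp
mount_target mdt 10.0.2.7@tcp
echo "MGS/MDT first: $MGC_NIDS"

# Order 2: MDT0001 first, then the combined MGS/MDT.
MGC_NIDS=""
mount_target mdt 10.0.2.7@tcp
mount_target mgs 10.0.2.7@tcp
echo "MDT0001 first: $MGC_NIDS"
```

In this model the first order leaves only `0@lo` in the list, while the second keeps `0@lo 10.0.2.7@tcp`, which mirrors the two `failover_nids` values reported in this ticket.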

      For example:

      [root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds1_flakey
      checking for existing Lustre data: found
      
         Read previous values:
      Target:     lustrefs-MDT0000
      Index:      0
      Lustre FS:  lustrefs
      Mount type: ldiskfs
      Flags:      0x5
                    (MDT MGS )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
      
      [root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds2_flakey       
      checking for existing Lustre data: found
      
         Read previous values:
      Target:     lustrefs-MDT0001
      Index:      1
      Lustre FS:  lustrefs
      Mount type: ldiskfs
      Flags:      0x1
                    (MDT )
      Persistent mount opts: user_xattr,errors=remount-ro
      Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
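
To compare what is configured on disk with what the MGC actually uses, the MGS NIDs can be pulled out of the `Parameters:` line. The helper below is a minimal sketch; `mgs_nids_from_tunefs` is a hypothetical name, and it assumes the usual `mgsnode=` syntax in which `:` separates failover nodes and `,` separates NIDs of one node.

```shell
# Hypothetical helper: read `tunefs.lustre --dryrun` output on stdin and
# print every NID found in mgsnode= parameters, one per line.
mgs_nids_from_tunefs() {
    grep 'Parameters:' |
        tr ' ' '\n' |               # one parameter per line
        sed -n 's/^mgsnode=//p' |   # keep only the mgsnode= values
        tr ':,' '\n\n'              # split failover nodes and per-node NIDs
}

# Sample run on the Parameters line shown in this ticket.
mgs_nids_from_tunefs <<'EOF'
Flags:      0x5
Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
EOF
```

On a server the same function could be fed directly, for example with `tunefs.lustre --dryrun /dev/mapper/mds1_flakey | mgs_nids_from_tunefs`.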
      

      Mount mds1_flakey and then mds2_flakey:

      [root@mds1 ~]# mount -tlustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
      [root@mds1 ~]# mount -tlustre /dev/mapper/mds2_flakey /media/lustrefs/mds2
      [root@mds1 ~]# lctl dl
        0 UP osd-ldiskfs lustrefs-MDT0000-osd lustrefs-MDT0000-osd_UUID 8
        1 UP mgs MGS MGS 4
        2 UP mgc MGC10.0.2.4@tcp 9b8dda76-560c-449d-ad56-a81a673cd1aa 4
        3 UP mds MDS MDS_uuid 2
      ...
      [root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import
      mgc.MGC10.0.2.4@tcp.import=
      import:
          name: MGC10.0.2.4@tcp
          target: MGS
          state: FULL
          connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
          connect_data:
             flags: 0xa000011001002020
             instance: 0
             target_version: 2.15.51.0
          import_flags: [ pingable, connect_tried ]
          connection:
             failover_nids: [ 0@lo ]                                        <----------------
             current_connection: 0@lo
             connection_attempts: 1
             generation: 1
             in-progress_invalidations: 0
             idle: 78545 sec
      

      Mount mds2_flakey and then mds1_flakey:

      [root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import                                                           
      mgc.MGC10.0.2.4@tcp.import=
      import:
          name: MGC10.0.2.4@tcp
          target: MGS
          state: FULL
          connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
          connect_data:
             flags: 0xa000011001002020
             instance: 0
             target_version: 2.15.51.0
          import_flags: [ pingable, connect_tried ]
          connection:
             failover_nids: [ 0@lo, 10.0.2.7@tcp ]              <----------------
             current_connection: 0@lo
             connection_attempts: 10
             generation: 1
             in-progress_invalidations: 0
             idle: 60 sec
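
The difference between the two import dumps can also be checked mechanically. The following is a rough sketch with hypothetical helper names (`failover_nids_from_import`, `import_has_nid`); it assumes the `failover_nids: [ ... ]` layout shown in the output above and does nothing beyond parsing text.

```shell
# Hypothetical helper: read an MGC import dump on stdin and print the
# entries of its failover_nids list, one per line.
failover_nids_from_import() {
    sed -n 's/^[[:space:]]*failover_nids:[[:space:]]*\[\(.*\)].*/\1/p' |
        tr -d ' ' |
        tr ',' '\n'
}

# import_has_nid <nid>: succeed if the failover_nids list on stdin
# contains exactly <nid>.
import_has_nid() {
    failover_nids_from_import | grep -qxF -- "$1"
}

# Sample check on the first (failing) case from this ticket.
if printf '   failover_nids: [ 0@lo ]\n' | import_has_nid 10.0.2.7@tcp; then
    echo "failover NID present"
else
    echo "failover NID missing"
fi
```

On a live node the input would come from `lctl get_param mgc.MGC10.0.2.4@tcp.import`; the sample check above prints `failover NID missing`, matching the case where the combined MGS/MDT is mounted first.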
      

          People

            eaujames Etienne Aujames