LU-18429

failover node nid switches to loopback 0@lo on imports


Details

    • Bug
    • Resolution: Unresolved
    • Blocker
    • None
    • Lustre 2.15.3
    • None
    • CentOS 8 on Cloud compute instances
    • 2

    Description

      The Lustre target fs-OST0000 has a failover IP address of 10.99.100.152@tcp1. Failover is implemented with a floating IP, which means that when failover happens the OST volume moves to a new host along with its IP address. However, on the host where this IP is locally present, the Lustre import records the connection as 0@lo instead of the IP NID. This breaks failover.
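
      For reference, the same connection information can be read with lctl get_param on the OSC import (it is the same data as the /proc file shown further below); the device name pattern here is just an example:

      # on the client/OSS, show which NIDs the fs-OST0000 import resolved
      [oss000 ~]$ sudo lctl get_param osc.fs-OST0000-osc-*.import | grep -A 3 connection:
      # failover_nids should list 10.99.100.152@tcp1, but it lists 0@lo instead (see the full import dump below)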

       

      Why is this believed to be a bug or enhancement? In cloud environments, and in some on-premise environments, volume failover is implemented with a floating IP address, and the IP address moves with the volume. The failover of the volume and IP address is robust and very quick in these virtual environments. There should be an override option in these environments (if one does not already exist; I could not find a parameter or configuration option for this) so that the import always keeps the IP NID rather than being automatically optimized to 0@lo.
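
      For illustration only, such an override could be a per-target setting applied with tunefs.lustre --param; the parameter name below is hypothetical and does not exist in 2.15.3, which is exactly what this ticket is asking for:

      # HYPOTHETICAL parameter, shown only to illustrate the requested override
      [oss000 ~]$ sudo tunefs.lustre --param failover.use_ip_nid=1 /dev/sda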

      Showing an example here:

       

      On the OSS, the file system is mounted (the OSS is acting as a Lustre client in this example). However, the same situation happens with regular Lustre target communication as well. For example, on a Lustre server where the MGT and MDT are on the same host, the MDT talks to the MGT over loopback. If the MGT moves to another host, failover breaks because the MDT keeps talking over loopback.
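
      The MGS connection can be inspected the same way; the host name in the prompt is illustrative:

      # show which NID the MGC (MGS client) connection is currently using
      [mds000 ~]$ sudo lctl get_param mgc.*.import | grep -E 'current_connection|failover_nids'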

       

      
      [oss000 ~]$ df -h
      Filesystem                  Size  Used Avail Use% Mounted on
      devtmpfs                     30G     0   30G   0% /dev
      tmpfs                        30G  8.1M   30G   1% /dev/shm
      tmpfs                        30G   25M   30G   1% /run
      tmpfs                        30G     0   30G   0% /sys/fs/cgroup
      /dev/mapper/ocivolume-root   36G   17G   19G  48% /
      /dev/sdc2                  1014M  637M  378M  63% /boot
      /dev/mapper/ocivolume-oled   10G  2.5G  7.6G  25% /var/oled
      /dev/sdc1                   100M  5.1M   95M   6% /boot/efi
      tmpfs                       5.9G     0  5.9G   0% /run/user/987
      tmpfs                       5.9G     0  5.9G   0% /run/user/0
      /dev/sdb                     49G   28G   18G  62% /fs-OST0001
      /dev/sda                     49G   29G   17G  63% /fs-OST0000
      tmpfs                       5.9G     0  5.9G   0% /run/user/1000
      10.99.100.221@tcp1:/fs  193G   94G   90G  51% /mnt/fs
      
      [oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda
      checking for existing Lustre data: found

         Read previous values:
      Target:     fs-OST0000
      Index:      0
      Lustre FS:  fs
      Mount type: ldiskfs
      Flags:      0x1002
                    (OST no_primnode )
      Persistent mount opts: ,errors=remount-ro
      Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1
         Permanent disk data:
      Target:     fs-OST0000
      Index:      0
      Lustre FS:  fs
      Mount type: ldiskfs
      Flags:      0x1002
                    (OST no_primnode )
      Persistent mount opts: ,errors=remount-ro
      Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1
      exiting before disk write.
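
      For context, the failover.node entries and the no_primnode flag above are what a target formatted with --servicenode ends up with; a sketch of such a format command, assuming this deployment's NIDs, is:

      # illustrative: with a floating IP the same service NID is registered for the target,
      # so whichever host currently owns 10.99.100.152 serves fs-OST0000
      mkfs.lustre --ost --fsname=fs --index=0 --mgsnode=10.99.100.221@tcp1 \
          --servicenode=10.99.100.152@tcp1 /dev/sda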
      
      

       

      The import shows 0@lo as the failover NID, whereas it should be 10.99.100.152@tcp1. Because 10.99.100.152@tcp1 exists on the same host where the import happens, the connection is switched to loopback 0@lo.
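
      This can be confirmed on the host itself; what to look for is the floating IP on a local interface and the matching NID in the local NID list:

      # confirm the failover IP, and hence the failover NID, is configured locally on this host
      [oss000 ~]$ ip -4 addr show | grep 10.99.100.152
      [oss000 ~]$ sudo lctl list_nids     # should include 10.99.100.152@tcp1 among the local NIDs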

       

      [oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import
      import:
          name: fs-OST0000-osc-ffff89c57672e000
          target: fs-OST0000_UUID
          state: IDLE
          connect_flags: [ write_grant, server_lock, version, request_portal, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, grant_shrink, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, grant_param, lvb_type, short_io, lfsck, bulk_mbits, second_flags, lockaheadv2, increasing_xid, client_encryption, lseek, reply_mbits ]
          connect_data:
             flags: 0xa0425af2e3440078
             instance: 39
             target_version: 2.15.3.0
             initial_grant: 8437760
             max_brw_size: 4194304
             grant_block_size: 4096
             grant_inode_size: 32
             grant_max_extent_size: 67108864
             grant_extent_tax: 24576
             cksum_types: 0xf7
             max_object_bytes: 17592186040320
          import_flags: [ replayable, pingable, connect_tried ]
          connection:
             failover_nids: [ 0@lo, 0@lo ] <<<<<<< 
             current_connection: 0@lo
             connection_attempts: 1
             generation: 1
             in-progress_invalidations: 0
             idle: 36 sec
          rpcs:
             inflight: 0
             unregistering: 0
             timeouts: 0
             avg_waittime: 2627 usec
          service_estimates:
             services: 1 sec
             network: 1 sec
          transactions:
             last_replay: 0
             peer_committed: 0
             last_checked: 0 

      When fs-OST0000 fails over, the volume moves to another host along with its IP address 10.99.100.152@tcp1. The IP address becomes active in under a second, and all other clients recover quickly.
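
      For reference, the failover sequence on the standby host is roughly the following; the interface and block device names are assumptions about this environment:

      # on the standby host, after the HA/cloud tooling reattaches the volume
      ip addr add 10.99.100.152/24 dev eth0     # floating IP moves with the volume (interface name is an example)
      mount -t lustre /dev/sda /fs-OST0000      # same backing volume, now served from this host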

      The client side still keeps 0@lo and never tries to talk to the failover NID 10.99.100.152@tcp1. The "lfs df" command hangs on OST0000, the following errors appear in the log, and the client never recovers. Failover never completes.

       

      [Wed Nov  6 19:02:36 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:03:53 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:04:08 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:04:08 2024] LustreError: Skipped 51 previous similar messages
      [Wed Nov  6 19:05:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:06:27 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:07:43 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:08:29 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:08:29 2024] LustreError: Skipped 101 previous similar messages
      [Wed Nov  6 19:09:00 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:10:17 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:12:51 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:12:51 2024] LustreError: Skipped 1 previous similar message
      [Wed Nov  6 19:17:07 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:17:07 2024] LustreError: Skipped 201 previous similar messages
      [Wed Nov  6 19:17:58 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:17:58 2024] LustreError: Skipped 3 previous similar messages
      [Wed Nov  6 19:26:55 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:26:55 2024] LustreError: Skipped 6 previous similar messages
      [Wed Nov  6 19:27:11 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:27:11 2024] LustreError: Skipped 235 previous similar messages
      [Wed Nov  6 19:37:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:37:10 2024] LustreError: Skipped 7 previous similar messages
      [Wed Nov  6 19:37:15 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:37:15 2024] LustreError: Skipped 235 previous similar messages
      [Wed Nov  6 19:47:19 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:47:19 2024] LustreError: Skipped 235 previous similar messages
      [Wed Nov  6 19:47:24 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:47:24 2024] LustreError: Skipped 7 previous similar messages
      [Wed Nov  6 19:57:23 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 19:57:23 2024] LustreError: Skipped 235 previous similar messages
      [Wed Nov  6 19:57:38 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 19:57:38 2024] LustreError: Skipped 7 previous similar messages
      [Wed Nov  6 20:07:26 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 20:07:26 2024] LustreError: Skipped 234 previous similar messages
      [Wed Nov  6 20:07:57 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 20:07:57 2024] LustreError: Skipped 7 previous similar messages
      [Wed Nov  6 20:17:30 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 20:17:30 2024] LustreError: Skipped 235 previous similar messages
      [Wed Nov  6 20:18:11 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
      [Wed Nov  6 20:18:11 2024] LustreError: Skipped 7 previous similar messages
      [Wed Nov  6 20:27:34 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
      [Wed Nov  6 20:27:34 2024] LustreError: Skipped 235 previous similar messages 
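
      For completeness, the usual ways to nudge the import manually are sketched below; judging by the connection list above ( failover_nids: [ 0@lo, 0@lo ] ), they would still target the loopback NID. The device name is taken from the log lines above and the commands are only a sketch:

      # deactivate/reactivate the OSC, or ask it to reconnect; the import still only knows 0@lo
      [oss000 ~]$ sudo lctl set_param osc.fs-OST0000-osc-*.active=0
      [oss000 ~]$ sudo lctl set_param osc.fs-OST0000-osc-*.active=1
      [oss000 ~]$ sudo lctl recover fs-OST0000-osc-ffff89c43d273800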


          People

            Assignee: WC Triage (wc-triage)
            Reporter: Vadakkumuri Valappil Abooback (abooback)
            Votes: 0
            Watchers: 4
