Details
- Type: Bug
- Resolution: Unresolved
- Priority: Blocker
- Affects Version/s: Lustre 2.15.3
- Environment: Centos 8 on Cloud compute instances
- Severity: 2
Description
The Lustre target fs-OST0000 has a failover IP address of 10.99.100.152@tcp1. Failover is implemented with a floating IP, which means that when failover happens the OST volume moves to a new host along with its IP address. However, on the local host where this IP is currently present, Lustre records the import as coming from 0@lo. This makes the failover not work.
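For context, the floating-IP move itself happens outside Lustre. A minimal sketch of what the HA/cloud tooling does during failover in this kind of setup (the interface name, prefix length, and device path below are assumptions for illustration, not the exact provisioning commands used here):

  # On the failing OSS (if still reachable): stop the target and release the floating IP
  umount /fs-OST0000
  ip addr del 10.99.100.152/24 dev eth0

  # On the surviving OSS: take over the floating IP, attach the block volume, and restart the OST
  ip addr add 10.99.100.152/24 dev eth0
  mount -t lustre /dev/sda /fs-OST0000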
Why is this believed to be a bug or an enhancement request? In cloud environments, and in some on-premise environments, volume failover is implemented using a floating IP address, and the IP address moves with the volume. The failover process for the volume and its IP address is robust and very quick in these virtual environments. There should be an override option for these environments (if one does not already exist; I could not find a parameter or configuration option for this) so that imports always keep the IP NID rather than automatically optimizing it to 0@lo. A hypothetical sketch of such an override is shown below.
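To illustrate the request only: the parameter name below is hypothetical, does not exist in 2.15.3 as far as I can tell, and is shown purely to describe the desired behaviour.

  # Hypothetical tunable (illustrative name only, not an existing parameter):
  # keep the configured failover NIDs on imports instead of collapsing a
  # locally-served NID to 0@lo.
  lctl set_param ptlrpc.use_configured_failover_nids=1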
Showing an example here:
On the OSS, the file system is mounted (the OSS is acting as a Lustre client in this example). However, the same situation happens with regular Lustre target communications as well. For example, on a Lustre server where the MGT and MDT are on the same host, they choose to talk to each other over loopback. If the MGT moves to another host, failover breaks because the MDT continues to talk over loopback.
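As an aside, the co-located MGT/MDT case can be checked the same way on the MDS by inspecting the MGC import (standard lctl paths; the exact device name portion differs per system):

  lctl get_param mgc.*.import | grep -E 'current_connection|failover_nids'

The OSS client example follows: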
[oss000 ~]$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                     30G     0   30G   0% /dev
tmpfs                        30G  8.1M   30G   1% /dev/shm
tmpfs                        30G   25M   30G   1% /run
tmpfs                        30G     0   30G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root   36G   17G   19G  48% /
/dev/sdc2                  1014M  637M  378M  63% /boot
/dev/mapper/ocivolume-oled   10G  2.5G  7.6G  25% /var/oled
/dev/sdc1                   100M  5.1M   95M   6% /boot/efi
tmpfs                       5.9G     0  5.9G   0% /run/user/987
tmpfs                       5.9G     0  5.9G   0% /run/user/0
/dev/sdb                     49G   28G   18G  62% /fs-OST0001
/dev/sda                     49G   29G   17G  63% /fs-OST0000
tmpfs                       5.9G     0  5.9G   0% /run/user/1000
10.99.100.221@tcp1:/fs      193G   94G   90G  51% /mnt/fs

[oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda
checking for existing Lustre data: found
Read previous values:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

Permanent disk data:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

exiting before disk write.
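For reference, a target carrying the no_primnode flag and a failover.node parameter like the one shown above would typically have been created (or later re-tuned) roughly as follows. The device path is taken from the df output above; the exact invocation is a sketch, not the original provisioning command:

  mkfs.lustre --ost --fsname=fs --index=0 \
      --mgsnode=10.99.100.221@tcp1 \
      --servicenode=10.99.100.152@tcp1 \
      /dev/sda

  # or, for an already formatted target:
  tunefs.lustre --servicenode=10.99.100.152@tcp1 /dev/sda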
The import shows 0@lo as the failover NID, whereas it should be 10.99.100.152@tcp1. Because 10.99.100.152@tcp1 exists on the same host where the import happens, it is switched to the loopback NID 0@lo.
[oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import
import:
    name: fs-OST0000-osc-ffff89c57672e000
    target: fs-OST0000_UUID
    state: IDLE
    connect_flags: [ write_grant, server_lock, version, request_portal, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, grant_shrink, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, grant_param, lvb_type, short_io, lfsck, bulk_mbits, second_flags, lockaheadv2, increasing_xid, client_encryption, lseek, reply_mbits ]
    connect_data:
       flags: 0xa0425af2e3440078
       instance: 39
       target_version: 2.15.3.0
       initial_grant: 8437760
       max_brw_size: 4194304
       grant_block_size: 4096
       grant_inode_size: 32
       grant_max_extent_size: 67108864
       grant_extent_tax: 24576
       cksum_types: 0xf7
       max_object_bytes: 17592186040320
    import_flags: [ replayable, pingable, connect_tried ]
    connection:
       failover_nids: [ 0@lo, 0@lo ]    <<<<<<<
       current_connection: 0@lo
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
       idle: 36 sec
    rpcs:
       inflight: 0
       unregistering: 0
       timeouts: 0
       avg_waittime: 2627 usec
    service_estimates:
       services: 1 sec
       network: 1 sec
    transactions:
       last_replay: 0
       peer_committed: 0
       last_checked: 0
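The same fields can also be read through lctl, which avoids depending on the hashed device name in the proc path:

  lctl get_param osc.fs-OST0000-osc-*.import | grep -E 'failover_nids|current_connection'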
When fs-OST0000 fails over, the volume moves to another host along with its IP address 10.99.100.152@tcp1. The IP address becomes active within sub-seconds, and all other clients recover quickly.
The client side still keeps 0@lo and never tries to talk to the failover NID 10.99.100.152@tcp1. The "lfs df" command hangs on OST0000, the following errors appear in the log, and the client never recovers. Failover never completes.
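Once the IP has moved, LNet-level reachability of the failover NID can be verified from the stuck node; if the ping succeeds while the import still points at 0@lo, the network is fine and only the recorded NID is wrong (NID taken from the target configuration above):

  lctl list_nids                  # NIDs this node currently serves locally
  lctl ping 10.99.100.152@tcp1    # failover NID that the import should be using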
[Wed Nov 6 19:02:36 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:03:53 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:04:08 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:04:08 2024] LustreError: Skipped 51 previous similar messages
[Wed Nov 6 19:05:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:06:27 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:07:43 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:08:29 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:08:29 2024] LustreError: Skipped 101 previous similar messages
[Wed Nov 6 19:09:00 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:10:17 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:12:51 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:12:51 2024] LustreError: Skipped 1 previous similar message
[Wed Nov 6 19:17:07 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:17:07 2024] LustreError: Skipped 201 previous similar messages
[Wed Nov 6 19:17:58 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:17:58 2024] LustreError: Skipped 3 previous similar messages
[Wed Nov 6 19:26:55 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:26:55 2024] LustreError: Skipped 6 previous similar messages
[Wed Nov 6 19:27:11 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:27:11 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:37:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:37:10 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 19:37:15 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:37:15 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:47:19 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:47:19 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:47:24 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:47:24 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 19:57:23 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:57:23 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:57:38 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:57:38 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:07:26 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:07:26 2024] LustreError: Skipped 234 previous similar messages
[Wed Nov 6 20:07:57 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 20:07:57 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:17:30 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:17:30 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 20:18:11 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 20:18:11 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:27:34 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:27:34 2024] LustreError: Skipped 235 previous similar messages