Details
- Type: Bug
- Resolution: Unresolved
- Priority: Blocker
- Affects Version/s: Lustre 2.15.3
- Environment: Centos 8 on Cloud compute instances
- Severity: 2
Description
The Lustre target fs-OST0000 has a failover IP address of 10.99.100.152@tcp1. Failover is implemented with a floating IP, which means that when failover happens the OST volume moves to a new host along with its IP address. However, on the local host where this IP is currently present, Lustre records the import as coming from 0@lo. This makes the failover not work.
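For context, the floating-IP move itself happens outside Lustre. A minimal sketch of what the HA/cloud tooling does during failover in this kind of setup (the interface name, prefix length, and device path below are assumptions for illustration, not the exact provisioning commands used here):

  # On the failing OSS (if still reachable): stop the target and release the floating IP
  umount /fs-OST0000
  ip addr del 10.99.100.152/24 dev eth0

  # On the surviving OSS: take over the floating IP, attach the block volume, and restart the OST
  ip addr add 10.99.100.152/24 dev eth0
  mount -t lustre /dev/sda /fs-OST0000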
Why is this believed to be a bug or an enhancement request? In cloud environments, and in some on-premise environments, volume failover is implemented using a floating IP address, and the IP address moves with the volume. The failover process for the volume and its IP address is robust and very quick in these virtual environments. There should be an override option for these environments (if one does not already exist; I could not find a parameter or configuration option for this) so that imports always keep the IP NID rather than automatically optimizing it to 0@lo. A hypothetical sketch of such an override is shown below.
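To illustrate the request only: the parameter name below is hypothetical, does not exist in 2.15.3 as far as I can tell, and is shown purely to describe the desired behaviour.

  # Hypothetical tunable (illustrative name only, not an existing parameter):
  # keep the configured failover NIDs on imports instead of collapsing a
  # locally-served NID to 0@lo.
  lctl set_param ptlrpc.use_configured_failover_nids=1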
Showing an example here:
On the OSS, the file system is mounted (the OSS is acting as a Lustre client in this example). However, the same situation happens with regular Lustre target communications as well. For example, on a Lustre server where the MGT and MDT are on the same host, they choose to talk to each other over loopback. If the MGT moves to another host, failover breaks because the MDT continues to talk over loopback.
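As an aside, the co-located MGT/MDT case can be checked the same way on the MDS by inspecting the MGC import (standard lctl paths; the exact device name portion differs per system):

  lctl get_param mgc.*.import | grep -E 'current_connection|failover_nids'

The OSS client example follows: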
[oss000 ~]$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
devtmpfs                     30G     0   30G   0% /dev
tmpfs                        30G  8.1M   30G   1% /dev/shm
tmpfs                        30G   25M   30G   1% /run
tmpfs                        30G     0   30G   0% /sys/fs/cgroup
/dev/mapper/ocivolume-root   36G   17G   19G  48% /
/dev/sdc2                  1014M  637M  378M  63% /boot
/dev/mapper/ocivolume-oled   10G  2.5G  7.6G  25% /var/oled
/dev/sdc1                   100M  5.1M   95M   6% /boot/efi
tmpfs                       5.9G     0  5.9G   0% /run/user/987
tmpfs                       5.9G     0  5.9G   0% /run/user/0
/dev/sdb                     49G   28G   18G  62% /fs-OST0001
/dev/sda                     49G   29G   17G  63% /fs-OST0000
tmpfs                       5.9G     0  5.9G   0% /run/user/1000
10.99.100.221@tcp1:/fs      193G   94G   90G  51% /mnt/fs

[oss000 ~]$ sudo tunefs.lustre --dryrun /dev/sda
checking for existing Lustre data: found
Read previous values:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

Permanent disk data:
Target:     fs-OST0000
Index:      0
Lustre FS:  fs
Mount type: ldiskfs
Flags:      0x1002
            (OST no_primnode )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=10.99.100.221@tcp1 failover.node=10.99.100.152@tcp1,10.99.100.152@tcp1

exiting before disk write.
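For reference, a target carrying the no_primnode flag and a failover.node parameter like the one shown above would typically have been created (or later re-tuned) roughly as follows. The device path is taken from the df output above; the exact invocation is a sketch, not the original provisioning command:

  mkfs.lustre --ost --fsname=fs --index=0 \
      --mgsnode=10.99.100.221@tcp1 \
      --servicenode=10.99.100.152@tcp1 \
      /dev/sda

  # or, for an already formatted target:
  tunefs.lustre --servicenode=10.99.100.152@tcp1 /dev/sda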
The import shows 0@lo as the failover NID, whereas it should be 10.99.100.152@tcp1. Because 10.99.100.152@tcp1 exists on the same host where the import happens, it is switched to the loopback NID 0@lo.
[oss000 proc]# cat /proc/fs/lustre/osc/fs-OST0000-osc-ffff89c57672e000/import
import:
    name: fs-OST0000-osc-ffff89c57672e000
    target: fs-OST0000_UUID
    state: IDLE
    connect_flags: [ write_grant, server_lock, version, request_portal, max_byte_per_rpc, early_lock_cancel, adaptive_timeouts, lru_resize, alt_checksum_algorithm, fid_is_enabled, version_recovery, grant_shrink, full20, layout_lock, 64bithash, object_max_bytes, jobstats, einprogress, grant_param, lvb_type, short_io, lfsck, bulk_mbits, second_flags, lockaheadv2, increasing_xid, client_encryption, lseek, reply_mbits ]
    connect_data:
       flags: 0xa0425af2e3440078
       instance: 39
       target_version: 2.15.3.0
       initial_grant: 8437760
       max_brw_size: 4194304
       grant_block_size: 4096
       grant_inode_size: 32
       grant_max_extent_size: 67108864
       grant_extent_tax: 24576
       cksum_types: 0xf7
       max_object_bytes: 17592186040320
    import_flags: [ replayable, pingable, connect_tried ]
    connection:
       failover_nids: [ 0@lo, 0@lo ]    <<<<<<<
       current_connection: 0@lo
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
       idle: 36 sec
    rpcs:
       inflight: 0
       unregistering: 0
       timeouts: 0
       avg_waittime: 2627 usec
    service_estimates:
       services: 1 sec
       network: 1 sec
    transactions:
       last_replay: 0
       peer_committed: 0
       last_checked: 0
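The same fields can also be read through lctl, which avoids depending on the hashed device name in the proc path:

  lctl get_param osc.fs-OST0000-osc-*.import | grep -E 'failover_nids|current_connection'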
When fs-OST0000 fails over, the volume moves to another host along with its IP address 10.99.100.152@tcp1. The IP address becomes active within sub-seconds, and all other clients recover quickly.
The client side still keeps 0@lo and never tries to talk to the failover NID 10.99.100.152@tcp1. The "lfs df" command hangs on OST0000, the following errors appear in the log, and the client never recovers. Failover never completes.
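Once the IP has moved, LNet-level reachability of the failover NID can be verified from the stuck node; if the ping succeeds while the import still points at 0@lo, the network is fine and only the recorded NID is wrong (NID taken from the target configuration above):

  lctl list_nids                  # NIDs this node currently serves locally
  lctl ping 10.99.100.152@tcp1    # failover NID that the import should be using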
[Wed Nov 6 19:02:36 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:03:53 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:04:08 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:04:08 2024] LustreError: Skipped 51 previous similar messages
[Wed Nov 6 19:05:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:06:27 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:07:43 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:08:29 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:08:29 2024] LustreError: Skipped 101 previous similar messages
[Wed Nov 6 19:09:00 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:10:17 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:12:51 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:12:51 2024] LustreError: Skipped 1 previous similar message
[Wed Nov 6 19:17:07 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:17:07 2024] LustreError: Skipped 201 previous similar messages
[Wed Nov 6 19:17:58 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:17:58 2024] LustreError: Skipped 3 previous similar messages
[Wed Nov 6 19:26:55 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:26:55 2024] LustreError: Skipped 6 previous similar messages
[Wed Nov 6 19:27:11 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:27:11 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:37:10 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:37:10 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 19:37:15 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:37:15 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:47:19 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:47:19 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:47:24 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:47:24 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 19:57:23 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 19:57:23 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 19:57:38 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 19:57:38 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:07:26 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:07:26 2024] LustreError: Skipped 234 previous similar messages
[Wed Nov 6 20:07:57 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 20:07:57 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:17:30 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:17:30 2024] LustreError: Skipped 235 previous similar messages
[Wed Nov 6 20:18:11 2024] LustreError: 11-0: fs-OST0000-osc-ffff89c43d273800: operation ost_connect to node 0@lo failed: rc = -19
[Wed Nov 6 20:18:11 2024] LustreError: Skipped 7 previous similar messages
[Wed Nov 6 20:27:34 2024] LustreError: 137-5: fs-OST0000_UUID: not available for connect from 0@lo (no target). If you are running an HA pair check that the target is mounted on the other server.
[Wed Nov 6 20:27:34 2024] LustreError: Skipped 235 previous similar messages