[LU-16204] Connections from MGC to a Combined MGS/MDT on failover node not working

| Created: | 04/Oct/22 | Updated: | 04/Oct/22 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.9, Lustre 2.15.0, Lustre 2.15.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Etienne Aujames | Assignee: | Etienne Aujames |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | master branch + VMs |
| Severity: | 3 |
| Description |
After migrating the MGS/MDT0000 resource from node 1 to node 2, the MGC still running on node 1 is unable to connect to the combined MGS/MDT on node 2.

The issue is that the combined MGS/MDT ignores failover NIDs for the MGS: when this target is mounted first, it creates an MGC without failover NIDs.

e.g.:

[root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds1_flakey
checking for existing Lustre data: found
Read previous values:
Target: lustrefs-MDT0000
Index: 0
Lustre FS: lustrefs
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
[root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds2_flakey
checking for existing Lustre data: found
Read previous values:
Target: lustrefs-MDT0001
Index: 1
Lustre FS: lustrefs
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
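For reference, the on-disk parameters above already contain the failover MGS NID (10.0.2.7@tcp); the problem is only how the MGC is created from them. A configuration like this could have been produced at format time roughly as follows (a sketch only; the mkfs.lustre options are assumed from the names and NIDs shown in this report, not taken from the actual setup):

# hypothetical formatting of the two targets; colon-separated mgsnode NIDs
# declare 10.0.2.7@tcp as the failover NID of the MGS
mkfs.lustre --fsname=lustrefs --mgs --mdt --index=0 \
    --mgsnode=10.0.2.4@tcp:10.0.2.7@tcp /dev/mapper/mds1_flakey
mkfs.lustre --fsname=lustrefs --mdt --index=1 \
    --mgsnode=10.0.2.4@tcp:10.0.2.7@tcp /dev/mapper/mds2_flakey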
Mount mds1_flakey and then mds2_flakey:

[root@mds1 ~]# mount -tlustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
[root@mds1 ~]# mount -tlustre /dev/mapper/mds2_flakey /media/lustrefs/mds2
[root@mds1 ~]# lctl dl
0 UP osd-ldiskfs lustrefs-MDT0000-osd lustrefs-MDT0000-osd_UUID 8
1 UP mgs MGS MGS 4
2 UP mgc MGC10.0.2.4@tcp 9b8dda76-560c-449d-ad56-a81a673cd1aa 4
3 UP mds MDS MDS_uuid 2
...
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import
mgc.MGC10.0.2.4@tcp.import=
import:
name: MGC10.0.2.4@tcp
target: MGS
state: FULL
connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
connect_data:
flags: 0xa000011001002020
instance: 0
target_version: 2.15.51.0
import_flags: [ pingable, connect_tried ]
connection:
failover_nids: [ 0@lo ] <----------------
current_connection: 0@lo
connection_attempts: 1
generation: 1
in-progress_invalidations: 0
idle: 78545 sec
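With only 0@lo recorded, a failover of the MGS/MDT0000 target to node 2 (10.0.2.7@tcp) leaves this MGC with no usable NID to reconnect to. A minimal way to observe that from this state (hostnames, shared-device visibility and mount points are assumptions, not taken from this report):

[root@mds1 ~]# umount /media/lustrefs/mds1    # stop MGS/MDT0000 on node 1; MDT0001 keeps the MGC alive
[root@mds2 ~]# mount -tlustre /dev/mapper/mds1_flakey /media/lustrefs/mds1    # restart it on node 2
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import | grep -E 'state|current_connection'
# expected with this bug: the import on node 1 does not reach FULL again,
# because 10.0.2.7@tcp is missing from its failover_nids list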
Mount mds2_flakey and then mds1_flakey:

[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import
mgc.MGC10.0.2.4@tcp.import=
import:
name: MGC10.0.2.4@tcp
target: MGS
state: FULL
connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
connect_data:
flags: 0xa000011001002020
instance: 0
target_version: 2.15.51.0
import_flags: [ pingable, connect_tried ]
connection:
failover_nids: [ 0@lo, 10.0.2.7@tcp ] <----------------
current_connection: 0@lo
connection_attempts: 10
generation: 1
in-progress_invalidations: 0
idle: 60 sec
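Based on the two traces above, a possible workaround (not a fix) is to control the mount order so that a target without the MGS starts the MGC first; in that case the mgsnode failover NIDs from the target configuration end up in the import, and a failover to 10.0.2.7@tcp can be followed. A sketch reusing the devices from this report:

[root@mds1 ~]# mount -tlustre /dev/mapper/mds2_flakey /media/lustrefs/mds2    # MDT0001 first
[root@mds1 ~]# mount -tlustre /dev/mapper/mds1_flakey /media/lustrefs/mds1    # combined MGS/MDT0000 second
[root@mds1 ~]# lctl get_param -n mgc.MGC10.0.2.4@tcp.import | grep failover_nids
    failover_nids: [ 0@lo, 10.0.2.7@tcp ]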