Details
- Type: Bug
- Resolution: Unresolved
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.12.9, Lustre 2.15.0, Lustre 2.15.1
- Labels: None
- Environment: master branch + VMs; 2.12.9 + patches on a production cluster (more than 1 MDT per node)
- Severity: 3
Description
After migrating the MGS/MDT0000 resource from node 1 to node 2, the MGC still running on node 1 is unable to connect to the combined MGS/MDT now on node 2.
The issue is that a combined MGS/MDT ignores the failover NIDs configured for the MGS. When this target is mounted first, it creates an MGC device without failover NIDs.
The other targets then reuse this MGC device (same name) without adding new failover NIDs.
So when the MGS/MDT target is mounted on node 2, the MGC on node 1 cannot connect because of the missing failover NID.
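For illustration, the failing sequence is roughly the following (a sketch: the HA migration is reduced to a manual umount/mount, and device/mount names are taken from the example below):
# node 1: the combined MGS/MDT0000 is mounted first and creates
# MGC10.0.2.4@tcp with no failover NIDs
[root@mds1 ~]# mount -t lustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
[root@mds1 ~]# mount -t lustre /dev/mapper/mds2_flakey /media/lustrefs/mds2
# migrate the MGS/MDT0000 to node 2 (10.0.2.7@tcp)
[root@mds1 ~]# umount /media/lustrefs/mds1
[root@mds2 ~]# mount -t lustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
# the MGC on node 1 only knows its local connection, so it never
# tries 10.0.2.7@tcp:
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import | grep failover_nids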
For example, the targets in this setup are configured as follows:
[root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds1_flakey
checking for existing Lustre data: found
Read previous values:
Target: lustrefs-MDT0000
Index: 0
Lustre FS: lustrefs
Mount type: ldiskfs
Flags: 0x5
(MDT MGS )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
[root@mds1 ~]# tunefs.lustre --dryrun /dev/mapper/mds2_flakey
checking for existing Lustre data: found
Read previous values:
Target: lustrefs-MDT0001
Index: 1
Lustre FS: lustrefs
Mount type: ldiskfs
Flags: 0x1
(MDT )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=10.0.2.4@tcp:10.0.2.7@tcp
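Both targets already carry the failover NID: in the mgsnode parameter, ':' separates the primary MGS node from its failover node (',' would separate several NIDs of one node). A parameter like this would be set with something like the following (the exact invocation used on this cluster is an assumption):
# record primary and failover MGS NIDs on a target
[root@mds1 ~]# tunefs.lustre --mgsnode=10.0.2.4@tcp:10.0.2.7@tcp /dev/mapper/mds2_flakey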
Mount mds1_flakey and then mds2_flakey:
[root@mds1 ~]# mount -tlustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
[root@mds1 ~]# mount -tlustre /dev/mapper/mds2_flakey /media/lustrefs/mds2
[root@mds1 ~]# lctl dl
0 UP osd-ldiskfs lustrefs-MDT0000-osd lustrefs-MDT0000-osd_UUID 8
1 UP mgs MGS MGS 4
2 UP mgc MGC10.0.2.4@tcp 9b8dda76-560c-449d-ad56-a81a673cd1aa 4
3 UP mds MDS MDS_uuid 2
...
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import
mgc.MGC10.0.2.4@tcp.import=
import:
name: MGC10.0.2.4@tcp
target: MGS
state: FULL
connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
connect_data:
flags: 0xa000011001002020
instance: 0
target_version: 2.15.51.0
import_flags: [ pingable, connect_tried ]
connection:
failover_nids: [ 0@lo ] <----------------
current_connection: 0@lo
connection_attempts: 1
generation: 1
in-progress_invalidations: 0
idle: 78545 sec
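At this point LNet itself should still be able to reach the failover node, which would confirm the problem is in the import, not the network (a quick check, using the node IPs above):
# LNet-level connectivity to node 2 is fine...
[root@mds1 ~]# lctl ping 10.0.2.7@tcp
# ...but the MGC import lists no failover NID to fall back to
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import | grep failover_nids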
Mount mds2_flakey and then mds1_flakey:
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import
mgc.MGC10.0.2.4@tcp.import=
import:
name: MGC10.0.2.4@tcp
target: MGS
state: FULL
connect_flags: [ version, barrier, adaptive_timeouts, full20, imp_recov, bulk_mbits, second_flags, reply_mbits ]
connect_data:
flags: 0xa000011001002020
instance: 0
target_version: 2.15.51.0
import_flags: [ pingable, connect_tried ]
connection:
failover_nids: [ 0@lo, 10.0.2.7@tcp ] <----------------
current_connection: 0@lo
connection_attempts: 10
generation: 1
in-progress_invalidations: 0
idle: 60 sec
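This mount ordering works around the issue: the MGC is created from the plain MDT's mgsnode parameter, which includes the failover NID. In short:
# workaround: mount a target without the MGS flag first, so the MGC is
# created from its mgsnode= parameter (primary + failover NIDs)
[root@mds1 ~]# mount -t lustre /dev/mapper/mds2_flakey /media/lustrefs/mds2
[root@mds1 ~]# mount -t lustre /dev/mapper/mds1_flakey /media/lustrefs/mds1
[root@mds1 ~]# lctl get_param mgc.MGC10.0.2.4@tcp.import | grep failover_nids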