[LU-4334] With ZFS can only declare a single mgsnode for MDT or OST Created: 02/Dec/13  Updated: 03/Oct/16  Resolved: 11/Sep/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.1, Lustre 2.6.0, Lustre 2.5.2
Fix Version/s: Lustre 2.7.0, Lustre 2.5.4

Type: Bug Priority: Critical
Reporter: Eric Kolb Assignee: Nathaniel Clark
Resolution: Fixed Votes: 0
Labels: llnl, prz, zfs
Environment:

SL6.4


Issue Links:
Related
is related to LU-4749 ZFS-backed OST mkfs.lustre --service... Resolved
is related to LU-8311 Target does not mount with the new mg... Resolved
Epic/Theme: zfs
Severity: 1
Rank (Obsolete): 11857

 Description   

When trying to declare two mgsnodes for HA, only one entry is ever accepted when using backfstype=zfs. All of the examples below fail; if the --mgsnode=nid:nid syntax worked, then all might be okay.

mkfs.lustre --reformat --fsname=RSF1 --ost --index=2 --mgsnode=10.82.0.9@tcp1 --mgsnode=10.82.0.9@tcp1:10.82.0.10@tcp1 --servicenode=10.82.0.11@tcp1:10.82.0.12@tcp1 --backfstype=zfs OST2/ost

mkfs.lustre --reformat --fsname=RSF1 --ost --index=2 --mgsnode=10.82.0.9@tcp1:10.82.0.10@tcp1 --servicenode=10.82.0.11@tcp1:10.82.0.12@tcp1 --backfstype=zfs OST2/ost



 Comments   
Comment by Eric Kolb [ 02/Dec/13 ]

Perhaps the below sequence displays the issue more clearly.

$ tunefs.lustre OST2/ost
checking for existing Lustre data: found

Read previous values:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.9@tcp1

Permanent disk data:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.9@tcp1

Writing OST2/ost properties
lustre:version=1
lustre:flags=4098
lustre:index=2
lustre:fsname=RSF1
lustre:svname=RSF1-OST0002
lustre:failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1
lustre:mgsnode=10.82.0.9@tcp1

$ tunefs.lustre --mgsnode=10.82.0.9@tcp1 --mgsnode=10.82.0.10@tcp1 OST2/ost
checking for existing Lustre data: found

Read previous values:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.9@tcp1

Permanent disk data:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.9@tcp1 mgsnode=10.82.0.10@tcp1

Writing OST2/ost properties
lustre:version=1
lustre:flags=4098
lustre:index=2
lustre:fsname=RSF1
lustre:svname=RSF1-OST0002
lustre:failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1
lustre:mgsnode=10.82.0.9@tcp1
lustre:mgsnode=10.82.0.10@tcp1

$ tunefs.lustre OST2/ost
checking for existing Lustre data: found

Read previous values:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.10@tcp1

Permanent disk data:
Target: RSF1-OST0002
Index: 2
Lustre FS: RSF1
Mount type: zfs
Flags: 0x1002
(OST no_primnode )
Persistent mount opts:
Parameters: failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1 mgsnode=10.82.0.10@tcp1

Writing OST2/ost properties
lustre:version=1
lustre:flags=4098
lustre:index=2
lustre:fsname=RSF1
lustre:svname=RSF1-OST0002
lustre:failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1
lustre:mgsnode=10.82.0.10@tcp1

Comment by JS Landry [ 20/Dec/13 ]

Hi, this syntax works: --mgsnode=node1:node2

# tunefs.lustre lustre1-ost4/ost0
checking for existing Lustre data: found

Read previous values:
Target: lustre1-OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.3@o2ib failover.node=10.225.4.4@o2ib

Permanent disk data:
Target: lustre1:OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.3@o2ib failover.node=10.225.4.4@o2ib

Writing lustre1-ost4/ost0 properties
lustre:version=1
lustre:flags=4130
lustre:index=4
lustre:fsname=lustre1
lustre:svname=lustre1:OST0004
lustre:mgsnode=10.225.8.3@o2ib
lustre:failover.node=10.225.4.4@o2ib

# tunefs.lustre --mgsnode=mds1-225@o2ib:mds2-225@o2ib lustre1-ost4/ost0
checking for existing Lustre data: found

Read previous values:
Target: lustre1-OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.3@o2ib failover.node=10.225.4.4@o2ib

Permanent disk data:
Target: lustre1:OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.3@o2ib failover.node=10.225.4.4@o2ib mgsnode=10.225.8.2@o2ib:10.225.8.3@o2ib

Writing lustre1-ost4/ost0 properties
lustre:version=1
lustre:flags=4130
lustre:index=4
lustre:fsname=lustre1
lustre:svname=lustre1:OST0004
lustre:mgsnode=10.225.8.3@o2ib
lustre:failover.node=10.225.4.4@o2ib
lustre:mgsnode=10.225.8.2@o2ib:10.225.8.3@o2ib

# tunefs.lustre lustre1-ost4/ost0
checking for existing Lustre data: found

Read previous values:
Target: lustre1-OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.2@o2ib:10.225.8.3@o2ib failover.node=10.225.4.4@o2ib

Permanent disk data:
Target: lustre1:OST0004
Index: 4
Lustre FS: lustre1
Mount type: zfs
Flags: 0x1022
(OST first_time no_primnode )
Persistent mount opts:
Parameters: mgsnode=10.225.8.2@o2ib:10.225.8.3@o2ib failover.node=10.225.4.4@o2ib

Writing lustre1-ost4/ost0 properties
lustre:version=1
lustre:flags=4130
lustre:index=4
lustre:fsname=lustre1
lustre:svname=lustre1:OST0004
lustre:mgsnode=10.225.8.2@o2ib:10.225.8.3@o2ib
lustre:failover.node=10.225.4.4@o2ib

Comment by Eric Kolb [ 23/Dec/13 ]

Hello,

Yes, the --mgsnode=nid:nid setting can be applied to the MDTs and OSTs, but the fail-over does not occur. The Lustre components seem to use only the first nid in the specified list, and upon fail-over of the MGS they will not use the second nid specified.

Comment by Nathaniel Clark [ 25/Apr/14 ]

It looks like there are two separate issues here:

1) listing failover or mgsnode in the form --mgsnode=NID1 --mgsnode=NID2 or --mgsnode=NID1,NID2 will result in only NID2 being recorded.

This seems to be due to how metadata is stored on ZFS: property names are unique, so setting the same property twice just overwrites the first value with the second.

2) listing nids in the form --mgsnode=NID1:NID2 will result in only NID1 being used
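
The overwrite described in issue 1 can be modelled without ZFS at all. The sketch below is a hypothetical POSIX shell helper, last_wins (not part of Lustre or ZFS), which treats its arguments as name=value pairs written into a store with unique names, so a repeated name keeps only its final value. It only illustrates the behaviour seen in the tunefs.lustre output above and makes no claim about how the tools implement it.

```shell
# Hypothetical model, not Lustre code: a store where each name holds one value.
# Arguments are name=value pairs; a later pair with the same name replaces
# the earlier one, and names are printed in first-seen order.
last_wins() {
    printf '%s\n' "$@" | awk -F= '
        {
            key = $1
            sub(/^[^=]*=/, "")                    # $0 is now the value part
            if (!(key in val)) order[++n] = key   # remember first-seen order
            val[key] = $0                         # later value overwrites
        }
        END { for (i = 1; i <= n; i++) print order[i] "=" val[order[i]] }'
}

# Same parameters as the second tunefs.lustre call in the report.
last_wins "failover.node=10.82.0.11@tcp1:10.82.0.12@tcp1" \
          "mgsnode=10.82.0.9@tcp1" \
          "mgsnode=10.82.0.10@tcp1"
```

Only one mgsnode line survives in the output, which is the same shape as the values read back on the next tunefs.lustre run.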

Comment by Christopher Morrone [ 06/Jun/14 ]

As part of fixing this issue, we need to make certain that the relevant OSD documentation is updated to clearly define the APIs and expectations for the OSD developer.

Comment by Nathaniel Clark [ 16/Jul/14 ]

It seems like the right idea would be to store NID information in a single ZFS property, in the form <server1ip1>@tcp,<server1ip2>@tcp:<server2ip1>@tcp,<server2ip2>@tcp, similar to how it can be input on the command line.

This will apply to mgsnode, failnode, and servicenode.
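
To make the proposed layout concrete, here is a minimal sketch of how such a value could be read back: ':' separates servers and ',' separates the NIDs of one server. The helper name split_nodes and the addresses are placeholders invented for illustration, and the sketch assumes NIDs themselves contain neither ':' nor ','. It is not the parsing code from the patch.

```shell
# Hypothetical helper: print one line per server, with that server's NIDs
# separated by spaces. Input is a single "nid,nid:nid,nid" style string.
split_nodes() {
    printf '%s\n' "$1" | tr ':' '\n' | tr ',' ' '
}

# Two servers, each reachable on two (made-up) addresses.
split_nodes "10.0.0.1@tcp,10.0.1.1@tcp:10.0.0.2@tcp,10.0.1.2@tcp"
```

Each output line corresponds to one server, so a single property value can carry the whole failover list.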

Comment by Nathaniel Clark [ 21/Jul/14 ]

This will fix setting multiple mgsnode properties on ZFS.
http://review.whamcloud.com/11161

Workaround for older systems:

Instead of

tunefs.lustre --mgsnode=192.168.139.10@tcp --mgsnode=192.168.139.70@tcp mdt/mdt1

Use the following:

zfs set lustre:mgsnode=192.168.139.10@tcp:192.168.139.70@tcp mdt/mdt1 
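
For more than a couple of targets, the value for this workaround can be assembled by a small script. The following is a dry-run sketch only: mgsnode_workaround is a hypothetical helper that joins the given NIDs with ':' and prints the zfs set command rather than running it, so the result can be reviewed first.

```shell
# Hypothetical dry-run helper: first argument is the dataset, the rest are
# MGS NIDs. Joins the NIDs with ':' and prints the command; it does not
# call zfs itself.
mgsnode_workaround() {
    dataset=$1
    shift
    value=""
    for nid in "$@"; do
        value="${value:+$value:}$nid"   # add ':' only between entries
    done
    printf 'zfs set lustre:mgsnode=%s %s\n' "$value" "$dataset"
}

mgsnode_workaround mdt/mdt1 192.168.139.10@tcp 192.168.139.70@tcp
```

With the arguments shown, the printed line is the same zfs set command given above.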

Comment by Peter Jones [ 11/Sep/14 ]

Landed for 2.7

Comment by Chris Gearing (Inactive) [ 13/Oct/14 ]

Does

zfs set lustre:mgsnode=192.168.139.10@tcp:192.168.139.70@tcp mdt/mdt1 

imply that ':' is the standard separator for ZFS properties? For example, in LU-4749, should failnode be split the same way?

Generated at Sat Feb 10 01:41:47 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.