[LU-14108] Mounting targets created with mkfs "network" option should disable discovery Created: 02/Nov/20  Updated: 30/May/23  Resolved: 30/May/23

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Serguei Smirnov Assignee: Cyril Bordage
Resolution: Fixed Votes: 0
Labels: lnet

Issue Links:
Related
is related to LU-11057 Client mount option "-o network=net" ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

The --network= option to mkfs.lustre restricts a target (OST/MDT) to a given LNet network, so the target registers with the MGS using NIDs on that network only. However, LNet dynamic discovery is unaware of this restriction, which can create problems.
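As a sketch of how such a target is created, the following uses the parameters that appear in the tunefs.lustre output later in this ticket (device path and NIDs are from the customer setup, shown for illustration only; this requires a real Lustre server environment):

```shell
# Format an OST restricted to the o2ib1 LNet network.
# The target will register with the MGS using o2ib1 NIDs only.
mkfs.lustre --ost --fsname=rcf2 --index=13 \
    --mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib \
    --failnode=172.20.2.25@o2ib1 \
    --network=o2ib1 \
    /dev/mapper/OST0d
```

The restriction can be confirmed afterwards with `tunefs.lustre --dryrun`, which lists `network=o2ib1` among the parameters.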

If this scenario is recognised, it is possible to deal with it in at least two ways:

1) Fail the mount with an error.

2) Disable dynamic discovery with a warning, and prevent discovery from being enabled later.
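Option 2 corresponds to what an administrator can already do by hand. A minimal sketch of the manual workaround, assuming lnetctl is available on the node (requires a Lustre setup, so it cannot run standalone):

```shell
# Disable LNet dynamic peer discovery on a node whose targets
# were formatted with --network.
lnetctl set discovery 0

# Verify the setting: the global show output should report discovery 0.
lnetctl global show

# To persist across reboots, the equivalent module parameter can be
# set in the lnet module options, e.g. in /etc/modprobe.d/lustre.conf:
#   options lnet lnet_peer_discovery_disabled=1
```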



 Comments   
Comment by Serguei Smirnov [ 02/Nov/20 ]

To reproduce, use the --network option when formatting targets with mkfs.lustre. The following is an example from the customer setup:

"

The error is occurring between the MDS (md1) and the OSSes (all OSSes show similar errors). For example, rcf2-OST000d is mounted on os2, but md1 repeatedly tries to connect to it on os1. Normally rcf2-OST000d is mounted on os2 (the primary OSS for this OST), so the error should not happen.

tunefs.lustre shows network=o2ib1; the primary is os2 (172.20.2.26@o2ib1) and the secondary is os1 (172.20.2.25@o2ib1).

"

[root@os2 ~]# tunefs.lustre --dryrun /dev/mapper/OST0d
checking for existing Lustre data: found

   Read previous values:
Target:     rcf2-OST000d
Index:      13
Lustre FS:  rcf2
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro
Parameters:  mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib failover.node=172.20.2.25@o2ib1 network=o2ib1


   Permanent disk data:
Target:     rcf2-OST000d
Index:      13
Lustre FS:  rcf2
Mount type: ldiskfs
Flags:      0x2
              (OST )
Persistent mount opts: errors=remount-ro
Parameters:  mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib failover.node=172.20.2.25@o2ib1 network=o2ib1

exiting before disk write.

It should be possible to reproduce with a simpler setup. 

The symptom is an error message similar to the following:

Oct 22 11:58:27 os1-s kernel: Lustre: rcf2-OST0001: Received new MDS connection from 172.20.2.11@o2ib, removing former export from same NID
Oct 22 11:58:27 os1-s kernel: Lustre: Skipped 47 previous similar messages
 ...
Oct 22 12:02:13 os1-s kernel: LustreError: 137-5: rcf2-OST000d_UUID: not available for connect from 172.20.2.11@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 22 12:02:13 os1-s kernel: LustreError: Skipped 47 previous similar messages

To see how the targets are registered with the MGS, run a command similar to the following on the MGS:

lctl --device MGS llog_print rcf2-MDT0000

This should show that only NIDs on the network specified with the mkfs option were registered.
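To make the check easier to read, the llog output can be filtered down to the NIDs it contains. A sketch (the grep pattern is an assumption matching o2ib-style NIDs; run on the MGS):

```shell
# Print the MDT configuration llog and list the unique NIDs it records.
# With network=o2ib1 set at format time, only @o2ib1 NIDs should appear
# for the restricted targets.
lctl --device MGS llog_print rcf2-MDT0000 \
    | grep -o '[0-9.]*@o2ib[0-9]*' \
    | sort -u
```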

Comment by Gerrit Updater [ 26/Feb/22 ]

"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/46632
Subject: LU-14108 mount: prevent if --network and discovery
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f5f0dba9c253a25ba9b6d06a0b4e361aa48690d5

Comment by Gerrit Updater [ 25/Apr/22 ]

"Cyril Bordage <cbordage@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47137
Subject: LU-14108 tests: mount with discovery and network
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 038db7e1872e6ef743921744fb09f0f0c201124a

Comment by Gerrit Updater [ 04/Oct/22 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46632/
Subject: LU-14108 mount: prevent if --network and discovery
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e82836f56ee7a9337a86ad0a32f19751024c7ec6

Generated at Sat Feb 10 03:06:54 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.