To reproduce, use the --network option when formatting targets with mkfs.lustre. The following is an example from the customer setup:
"
The error occurs between the MDS (md1) and the OSSes (all OSSes show similar errors). For example, rcf2-OST000d is mounted on os2, but md1 repeatedly tries to connect to it on os1. rcf2-OST000d is normally mounted on os2 (the primary OSS for this OST), so the error should not occur.
tunefs.lustre shows network=o2ib1; the primary is os2 (172.20.2.26@o2ib1) and the secondary is os1 (172.20.2.25@o2ib1).
"
[root@os2 ~]# tunefs.lustre --dryrun /dev/mapper/OST0d
checking for existing Lustre data: found
Read previous values:
Target: rcf2-OST000d
Index: 13
Lustre FS: rcf2
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib failover.node=172.20.2.25@o2ib1 network=o2ib1
Permanent disk data:
Target: rcf2-OST000d
Index: 13
Lustre FS: rcf2
Mount type: ldiskfs
Flags: 0x2
(OST )
Persistent mount opts: errors=remount-ro
Parameters: mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib failover.node=172.20.2.25@o2ib1 network=o2ib1
exiting before disk write.
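As a rough aid for reading the Parameters line above, the following sketch lists the NIDs whose LNet network differs from the network= value on the same line. nids_off_network is a hypothetical helper, not a Lustre tool, and it assumes a single network= value and IPv4-style NIDs. A listed NID is not necessarily a misconfiguration; the output only shows where the networks differ.

```shell
# Hypothetical helper (not part of Lustre): print each NID in a
# tunefs.lustre "Parameters:" line whose LNet network differs from the
# network= value on that same line.
# Assumptions: a single network= value, NIDs of the form a.b.c.d@net.
nids_off_network() {
    params=$1
    # Extract the value of network=...; empty if the option is absent.
    net=$(printf '%s\n' "$params" | sed -n 's/.*network=\([^ ]*\).*/\1/p')
    [ -n "$net" ] || return 0
    # Each addr@net token is a NID; keep those on a different network.
    printf '%s\n' "$params" \
        | grep -oE '[0-9]+(\.[0-9]+){3}@[a-z0-9]+' \
        | awk -F@ -v n="$net" '$2 != n'
}

# Parameters line copied from the tunefs.lustre output above.
nids_off_network 'Parameters: mgsnode=172.20.2.11@o2ib:172.20.2.12@o2ib failover.node=172.20.2.25@o2ib1 network=o2ib1'
```

For the line above this prints the two mgsnode NIDs, which are on o2ib while the target is restricted to o2ib1.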
It should be possible to reproduce with a simpler setup.
The symptom is an error message similar to the following:
Oct 22 11:58:27 os1-s kernel: Lustre: rcf2-OST0001: Received new MDS connection from 172.20.2.11@o2ib, removing former export from same NID
Oct 22 11:58:27 os1-s kernel: Lustre: Skipped 47 previous similar messages
...
Oct 22 12:02:13 os1-s kernel: LustreError: 137-5: rcf2-OST000d_UUID: not available for connect from 172.20.2.11@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 22 12:02:13 os1-s kernel: LustreError: Skipped 47 previous similar messages
To see how the targets are registered with the MGS, run a command similar to the following on the MGS:
lctl --device MGS llog_print rcf2-MDT0000
This should show that only NIDs on the network specified with the mkfs.lustre --network option were used.
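To make the registered NIDs easier to scan, a small filter such as the one below can reduce the log to its distinct NIDs. This is only a sketch: unique_nids is a hypothetical helper, it assumes IPv4-style NIDs of the form a.b.c.d@net, and it makes no assumption about the rest of the llog_print record format.

```shell
# Hypothetical helper (not part of Lustre): read text on stdin and print
# each distinct a.b.c.d@net NID it contains, one per line.
unique_nids() {
    grep -oE '[0-9]+(\.[0-9]+){3}@[a-z0-9]+' | sort -u
}
```

On the MGS it could be used as `lctl --device MGS llog_print rcf2-MDT0000 | unique_nids`.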
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/46632/
Subject: LU-14108 mount: prevent if --network and discovery
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: e82836f56ee7a9337a86ad0a32f19751024c7ec6