[LU-3829] MDT mount fails if mkfs.lustre is run with multiple mgsnode arguments on MDSs where MGS is not running Created: 23/Aug/13 Updated: 26/Jan/17 Resolved: 02/Oct/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0, Lustre 2.5.0 |
| Fix Version/s: | Lustre 2.5.0, Lustre 2.4.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Kalpak Shah (Inactive) | Assignee: | Zhenyu Xu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 9893 |
| Description |
|
If multiple --mgsnode arguments are provided to mkfs.lustre while formatting an MDT, then the mount of this MDT fails on the MDS where the MGS is not running.

Reproduction Steps:

Step 1) On MDS0:
mgs_pri_nid='10.10.11.210@tcp1'
mkfs.lustre --mgs --reformat $mgs_dev
mount -t lustre $mgs_dev /lustre/mgs/
So the MGS and MDT0 will be mounted on MDS0.

Step 2.1) On MDS1:
mgs_pri_nid='10.10.11.210@tcp1'
mkfs.lustre --mgsnode=$mgs_pri_nid --mgsnode=$mgs_sec_nid --failnode=$mgs_pri_nid --reformat --fsname=v --mdt --index=1 $mdt1_dev
# Does not mount.
mount -t lustre $mdt1_dev /lustre/v/mdt1
The mount of MDT1 will fail with the following error:
These are messages from the Lustre logs while trying to mount MDT1:

Step 2.2) On MDS1:
mgs_pri_nid='10.10.11.210@tcp1'
mkfs.lustre --mgsnode=$mgs_pri_nid --failnode=$mgs_pri_nid --reformat --fsname=v --mdt --index=1 $mdt1_dev
mount -t lustre $mdt1_dev /lustre/v/mdt1
With this, MDT1 will mount successfully. The only difference is that the second "--mgsnode" is not provided during mkfs.lustre.

Step 3) On MDS1 again:
Once MDT1 is mounted, using a second "--mgsnode" option works without any errors and the mount of MDT2 succeeds.

Lustre Versions: Reproducible on 2.4.0 and 2.4.91.

Conclusion: Due to this bug, MDTs do not mount on MDSs that are not running the MGS. With the workaround, HA will not be properly configured. |
| Comments |
| Comment by Swapnil Pimpale (Inactive) [ 23/Aug/13 ] |
|
In the above case, while mounting an MDT on MDS1, one of the mgsnode NIDs is MDS1 itself. It looks like ptlrpc_uuid_to_peer() calculates the distance to each NID using LNetDist() and chooses the one with the least distance, which in this case turns out to be MDS1 itself, and MDS1 does not have a running MGS. Removing MDS1 from mgsnode and adding a different node worked for me. A rough sketch of this selection behaviour follows.
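The following is a minimal, self-contained C sketch of the selection behaviour described above, for illustration only. It is not the actual ptlrpc_uuid_to_peer() implementation; nid_distance() is a hypothetical stand-in for LNetDist(), and the NIDs are the ones from this ticket.

/*
 * Sketch of "pick the MGS NID with the shortest LNet distance".
 * nid_distance() is a hypothetical stand-in for LNetDist(); the real
 * selection lives in Lustre's ptlrpc code and differs in detail.
 */
#include <stdio.h>
#include <string.h>

struct cand { const char *nid; unsigned dist; };

/* Hypothetical distance: 0 for the local NID, 1 for anything remote. */
static unsigned nid_distance(const char *local, const char *nid)
{
        return strcmp(local, nid) == 0 ? 0 : 1;
}

int main(void)
{
        const char *local = "10.10.11.211@tcp1";    /* we are MDS1 */
        struct cand mgs_nids[] = {                  /* from the two --mgsnode options */
                { "10.10.11.210@tcp1", 0 },         /* MDS0, runs the MGS */
                { "10.10.11.211@tcp1", 0 },         /* MDS1, no MGS here  */
        };
        int i, best = 0;

        for (i = 0; i < 2; i++) {
                mgs_nids[i].dist = nid_distance(local, mgs_nids[i].nid);
                if (mgs_nids[i].dist < mgs_nids[best].dist)
                        best = i;
        }
        /* The local NID (distance 0) always wins, so the MGC dials MDS1
         * itself, where no MGS is running, and the mount fails. */
        printf("selected MGS peer: %s\n", mgs_nids[best].nid);
        return 0;
}

With only one --mgsnode (step 2.2) there is no local candidate, so the remote MGS NID is the only choice and the mount succeeds. |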
| Comment by Peter Jones [ 28/Aug/13 ] |
|
Bobijam, could you please comment on this one? Thanks, Peter |
| Comment by Zhenyu Xu [ 29/Aug/13 ] |
|
In Step 2.1, this is MDS1 (which has 10.10.11.211) trying to connect to the MGS service on MDS0 (10.10.11.210), is that right? |
| Comment by Swapnil Pimpale (Inactive) [ 29/Aug/13 ] |
|
Yes. But as you can see, while trying to connect it has incorrectly calculated the peer to be 0@lo. With a single mgsnode we see:
(import.c:480:import_select_connection()) MGC10.10.11.210@tcp: connect to NID 10.10.11.210@tcp last attempt 0
But if two mgsnodes are given (MDS0 and MDS1 in this case) we see the following message:
(import.c:480:import_select_connection()) MGC10.10.11.210@tcp: connect to NID 0@lo last attempt 0 |
| Comment by Zhenyu Xu [ 29/Aug/13 ] |
|
It's a complex configuration which, IIRC, the manual does not mention (a separate MGS and MDS while specifying multiple mgsnode options for the MDT device). The problem is this: when you specify multiple mgsnode options for the MDT, it adds (remote_uuid, remote_nid) pairs; in this case MDS1 has (MGC10.10.11.210@tcp1, 10.10.11.210@tcp1) and (MGC10.10.11.210@tcp1, 10.10.11.211@tcp1). When the MGC tries to start on top of this MDT, it tries to find the correct peer NID (i.e. remote NID) for MGC10.10.11.210, and it has two choices. It always chooses 10.10.11.211@tcp1 as its peer NID (its source NID is always 10.10.11.211@tcp1 or 0@lo), since it considers remote NID 10.10.11.211@tcp1 to have the shortest distance (0), so it tries to connect to the MGS service on 10.10.11.211@tcp1 and cannot succeed. When the import retries, it repeats the above procedure and eventually fails out. |
| Comment by Ashley Pittman (Inactive) [ 29/Aug/13 ] |
|
This is a normal configuration for running two filesystems on an active/active pair of MDS nodes; in addition, running DNE with a single filesystem and two MDTs on an active/active pair of MDS nodes is affected as well. Ideally we'd locate the MGS on entirely separate nodes to allow IR, but that requires additional hardware and is a relatively expensive solution, so whilst doing that would avoid this problem it's not a universal solution. |
| Comment by Zhenyu Xu [ 30/Aug/13 ] |
|
Would you please try this patch: http://review.whamcloud.com/7509 ? The idea is to change the import connection setup so that it does not always set only the closest connection as the import connection; instead it adds all possible connections to the import, giving the import a chance to try the other connections. A rough sketch of the idea follows.
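For illustration only, here is a minimal, self-contained C sketch of that approach; the names (struct import, try_connect, and so on) are hypothetical, and the real change is in Lustre's import connection code in the patch above. The point is that every configured MGS NID stays on the import's connection list, and a failed connect attempt moves on to the next one instead of retrying the closest NID forever.

/*
 * Sketch of "keep all candidate connections and fail over between them".
 * Names are hypothetical; the real code is the import connection
 * handling changed by http://review.whamcloud.com/7509.
 */
#include <stdio.h>
#include <string.h>

struct import {
        const char *conns[4];   /* all configured MGS NIDs, not just the closest */
        int         nconns;
        int         current;    /* index of the connection currently in use */
};

/* Pretend only MDS0 actually runs the MGS. */
static int try_connect(const char *nid)
{
        return strcmp(nid, "10.10.11.210@tcp1") == 0;
}

int main(void)
{
        struct import imp = {
                .conns   = { "10.10.11.211@tcp1",   /* closest (local), no MGS */
                             "10.10.11.210@tcp1" }, /* remote, MGS runs here   */
                .nconns  = 2,
                .current = 0,
        };
        int i;

        /* On each retry, advance to the next connection instead of
         * re-selecting the same (closest) one every time. */
        for (i = 0; i < imp.nconns; i++) {
                const char *nid = imp.conns[imp.current];

                if (try_connect(nid)) {
                        printf("connected to MGS at %s\n", nid);
                        return 0;
                }
                printf("connect to %s failed, trying next connection\n", nid);
                imp.current = (imp.current + 1) % imp.nconns;
        }
        printf("no MGS reachable on any configured NID\n");
        return 1;
}

In the failing case from this ticket, the first (local) connection attempt is refused because no MGS runs on MDS1, and the import then falls back to MDS0's NID instead of failing out. |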
| Comment by Ashley Pittman (Inactive) [ 30/Aug/13 ] |
|
I've applied this patch to the 2.4.0 tag on b2_4 and can confirm that initial checks show the MDTs are starting correctly now. We are now seeing a different issue: I am unable to start a client on a node which is already running as an MDT/OSS.
The log file is here:
This doesn't look directly related to the changeset, however we weren't seeing it before and this changeset should be the only change to the build. I'll keep investigating this second problem; let me know if you want me to file it as a separate issue. |
| Comment by Kalpak Shah (Inactive) [ 30/Aug/13 ] |
|
The failure to load the lov module definitely looks like an unrelated error. |
| Comment by Ashley Pittman (Inactive) [ 04/Sep/13 ] |
|
We found an issue with our build system: whilst I was picking up this patch I also picked up some other changes, which is where the LOV issue came from. This patch fixes the issue for us in all cases we've found; would it be possible to have it in 2.4.1? |
| Comment by Zhenyu Xu [ 05/Sep/13 ] |
|
patch tracking for b2_4 (http://review.whamcloud.com/7553) |
| Comment by Kalpak Shah (Inactive) [ 19/Sep/13 ] |
|
This is with 2.4.1 + patchsets from http://review.whamcloud.com/7553. With patchset 2, we are seeing an issue whereby remote directory operations across MDSes fail. health_check on these MDSes reports something like:
And it remains in this unhealthy state. Clients are unable to perform remote directory operations at all. With patchset 1, we do not see this issue. |
| Comment by James Beal [ 24/Sep/13 ] |
|
Another data point: we are running a pure Lustre 2.4.1 system with a filesystem that we have upgraded from Lustre 1.8, and we have the MGS and MDT on separate nodes. In summary, at least for this filesystem, the MDT has to be mounted at least once collocated with the MGS after a writeconf (including the upgrade from Lustre 1.8 to Lustre 2.4.1).
Initially the MDT was failing to mount with the following messages in the kernel log:
Sep 24 08:31:56 lus03-mds2 kernel: Lustre: Lustre: Build Version: v2_4_1_0--CHANGED-2.6.32-jb23-358.18.1.el6-lustre-2.4.1
I chatted to Sven at DDN, and in the first instance we tried to fix the "rename obd type from mds to mdt" message via a tunefs.lustre --writeconf setting "mdd.quota_type=ug". This did not fix the issue (however, it is worth noting that I didn't use erase-config, so I ended up repeating the writeconf later). We then looked at:
Sep 24 09:51:48 lus03-mds2 kernel: Lustre: Lustre: Build Version: v2_4_1_0--CHANGED-2.6.32-jb23-358.18.1.el6-lustre-2.4.1
We noted that the writeconf had appended the params, so I repeated the process with erase-conf and tried to remount the MDT on its native node, which failed. I then remounted the MDT collocated with the MGS and it succeeded; after this I unmounted the MDT and it mounted successfully on the MDT's native node. |
| Comment by Zhenyu Xu [ 26/Sep/13 ] |
|
Hi Kalpak, can you collect the Lustre debug log of the remote directory operation failure? Thanks. |
| Comment by Peter Jones [ 02/Oct/13 ] |
|
Landed for 2.5.0. Should also land to b2_4 for 2.4.2 |
| Comment by Kurt J. Strosahl (Inactive) [ 03/Mar/16 ] |
|
I'm experiencing the same bug in 2.5.3 when trying to attach an OST while the MGS is on its failover node.
Mar 3 08:50:45 scoss1501 kernel: Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-431.23.3.el6_lustre.x86_64 |