[LU-16722] MGS config log restructuring and redundancy Created: 07/Apr/23  Updated: 09/Dec/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Colin Faber Assignee: Andreas Dilger
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-10360 use Imperative Recovery logs for clie... Open
is related to LU-13308 update changelog_ext_nid to handle IP... Open
is related to LU-13306 allow clients to accept mgs_nidtbl_en... Resolved
is related to LU-16738 Improve mount.lustre with many MGS NIDs Open
Rank (Obsolete): 9223372036854775807

 Description   

Restructure the MGS config management system to better handle modern environments.

Allowing multiple redundant MGS/MGT devices in a filesystem (e.g. one replica running on each of 4 separate MDS nodes) would significantly improve reliability, since clients could still mount in the face of an MGS failure, and imperative recovery would continue to work.
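For context, mount.lustre already accepts multiple colon-separated MGS NIDs for failover today; with redundant MGS instances the same syntax could name each replica. A sketch of such a client mount (hostnames and NIDs are hypothetical):

```shell
# Mount a Lustre client, listing one NID per MGS replica.
# With today's failover semantics the client tries each NID in turn;
# with redundant MGS instances any of them could serve the config logs.
mount -t lustre mds1@tcp:mds2@tcp:mds3@tcp:mds4@tcp:/testfs /mnt/testfs
```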



 Comments   
Comment by Andreas Dilger [ 09/Dec/23 ]

As yet, no investigation has been done in this area. Open questions for discussion and resolution include:

  • do clients only connect to a single MGS, but perform some kind of round-robin selection of the target NID to use at mount, to distribute load among the MGS instances?
  • how are config logs replicated between MGT instances?
    • Using FLR layouts for the files on the MGT would re-use existing infrastructure, and give a clear indication of which is the primary mirror and which mirrors might be "stale" and need to be resync'd from the primary. On the flip side, layering FLR on top of the MGT config files might complicate the replication path.
    • Using an llog consumer on the backup MGTs to read the config logs and store them locally (as the MDT and OST llog copies are handled) would give more independence between the MGTs.
  • are the "backup" MGTs read-only, or could one of them take over in case the primary MGT fails? Or should there be a manual resync process from any of the backup MGTs if the primary is corrupted? There are not very many files on the MGT, so doing a full reformat/resync is relatively simple; a typical MGT holds only a handful of config llog files:
     14  100644 (1)      0      0   12288 25-Nov-2023 00:50 mountdata
     82  100644 (1)      0      0    8192 31-Dec-1969 17:00 nodemap
     85  100644 (1)      0      0   11032 31-Dec-1969 17:00 params
     86  100644 (1)      0      0   22808 31-Dec-1969 17:00 testfs-client
     87  100644 (1)      0      0   28552 31-Dec-1969 17:00 testfs-MDT0000
    154  100644 (1)      0      0   28016 31-Dec-1969 17:00 testfs-MDT0001
    156  100644 (1)      0      0   27480 31-Dec-1969 17:00 testfs-MDT0002
    158  100644 (1)      0      0   26944 31-Dec-1969 17:00 testfs-MDT0003
    160  100644 (1)      0      0   17992 31-Dec-1969 17:00 testfs-OST0000
    166  100644 (1)      0      0   17680 31-Dec-1969 17:00 testfs-OST0001
    167  100644 (1)      0      0   17144 31-Dec-1969 17:00 testfs-OST0002
    168  100644 (1)      0      0   16608 31-Dec-1969 17:00 testfs-OST0003 
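The round-robin mount-time NID selection raised in the first bullet could be as simple as rotating through the configured replica list so successive mounts land on different MGS instances. A minimal sketch of that idea (all NIDs hypothetical, not actual client code):

```python
def select_mgs_nid(nids, attempt):
    """Pick one MGS NID for this mount attempt, rotating through the
    replica list so successive mounts spread load across MGS instances."""
    return nids[attempt % len(nids)]

# Hypothetical NIDs, one per MDS node hosting an MGS replica.
mgs_nids = [
    "10.0.0.1@tcp", "10.0.0.2@tcp", "10.0.0.3@tcp", "10.0.0.4@tcp",
]

# Four clients mounting in sequence each contact a different replica;
# a failed attempt would simply retry with the next index.
chosen = [select_mgs_nid(mgs_nids, i) for i in range(4)]
```

In practice the rotation seed would come from something per-client (e.g. a hash of the client NID) rather than a shared counter, but the load-spreading effect is the same.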
Generated at Sat Feb 10 03:29:27 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.