[LU-10359] remove NIDs from config llogs Created: 08/Dec/17 Updated: 08/Jan/24
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
It would be useful if the MGS did not store NIDs in the configuration records at all, instead of the current static IP addresses, so that servers can have relatively dynamic IP addresses (assigned at boot time, not necessarily changing at runtime).

Clients can already locate the MGS by hostname, since mount.lustre does DNS (or /etc/hosts) name resolution at mount time before initiating the MGC->MGS connection. However, the config llog records currently store only static IP addresses (NIDs, actually), because the config log is processed in the kernel, which had no DNS name resolution capability when this was implemented. Most config records store NIDs in ASCII format (e.g. 192.168.20.1@tcp), though at least one record also stores a binary lnet_nid_t.

It may be relatively straightforward to store an ASCII hostname@net record in place of the numeric NID in the config records, and then do hostname->IP resolution in the kernel during config log processing before passing the NID to the LNet layer. With LU-10360 allowing clients to use IR logs from the MGS to learn server NIDs, it may be possible to remove NIDs from the client config llog entirely and simplify configuration.
| Comments |
| Comment by Andreas Dilger [ 08/Dec/17 ] |
It is possible that only one of LU-10359 and LU-10360 will be needed, but I haven't looked into how much effort each one is to implement. If servers use DHCP and the IP address may change while the server is running, rather than only at mount time, more work may be needed to handle runtime address changes. IR would inform the client that the target NID has changed, but the client may need changes in the ptlrpc/LNet layer to accept that NID as valid for the specific target. In theory, the client and server would treat the NID change, and the resulting loss of network connection, like any other network error, and reconnect to the new NID to perform recovery at the PtlRPC layer.
| Comment by Jinshan Xiong (Inactive) [ 08/Dec/17 ] |
As a first step, we can still write static IP addresses for the MGS, including its failover servers; otherwise it would be difficult for the other servers to determine the failover IP address of the MGS during recovery. The other servers, however, can be configured with hostnames. At startup, those 'regular' servers will report their IP addresses to the MGS, and the MGS will announce the new IP addresses to clients.
| Comment by Andreas Dilger [ 05/Mar/20 ] |
There are two different configuration records that need to be changed to allow this functionality:
1. MGS config llogs: rather than updating the MGS config llogs to store hostnames (which would need to be resolved by the kernel when the client mounts), it would be better to implement LU-10360 so that clients use the MGS Target Status Table, which is generated dynamically as the OSTs and MDTs are mounted, instead of storing any hostnames in the config log at all. That avoids the need to update the config logs when hostnames change, new failover configurations are added, etc.
2. mountdata file: the ldd_params field stores ASCII strings anyway, so it seems straightforward to store the MGS hostname there and resolve it at mount time, rather than having mkfs.lustre resolve the MGS hostname and store the IP address in ldd_params.