Description
In conjunction with the new kfilnd, it would make sense to add a generic mechanism for connecting to new/unknown LNet network types (via a router) by directly specifying the LNet network type and number in the NID, something like "1234@lnet17n5" or similar.
That wouldn't help us today in the case of kfilnd, since it is just as easy to backport patch https://review.whamcloud.com/47830 "LU-15983 lnet: Define KFILND network type" to the older branches as it is to backport whatever patch is added for this ticket.
However, the goal of this ticket is to allow arbitrarily old clients to mount the filesystem without patching them, by specifying the LNet number manually at mount time (to connect to the MGS and/or LNet routers). This would work together with LU-10360, which would have the MGS send the binary server NIDs (including the LNet network number) directly to the client, rather than the client having to parse the NID strings from the config logs itself.
For the "generic" LNet network name, I propose a strawman @lnetNNNn, where NNN is the network type number. It is short enough to be useful but long enough to be unique, and the name does not itself end with a digit, so the network type can be parsed separately from the network number that follows it.
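A minimal sketch of how such a name might be parsed, assuming the network part looks like "lnet17n5" (type 17, network 5). The function name parse_generic_net(), the 16-bit limits on both fields, and treating a missing network number as 0 are assumptions made for illustration; none of this is existing LNet code.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "lnet<type>n<num>" into its two numbers.
 * Returns 0 on success, -1 on any malformed input.
 * Assumption: type and num each fit in 16 bits.
 */
static int parse_generic_net(const char *s, uint32_t *type, uint32_t *num)
{
	char *end;
	unsigned long t, n = 0;

	if (strncmp(s, "lnet", 4) != 0)
		return -1;
	s += 4;

	/* The type must start with a digit (no sign or whitespace). */
	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	t = strtoul(s, &end, 10);
	/* The 'n' separator must directly follow the type number. */
	if (errno != 0 || t > 0xffff || *end != 'n')
		return -1;
	s = end + 1;

	/* Assumption: an omitted network number means network 0. */
	if (*s != '\0') {
		if (!isdigit((unsigned char)*s))
			return -1;
		errno = 0;
		n = strtoul(s, &end, 10);
		if (errno != 0 || n > 0xffff || *end != '\0')
			return -1;
	}

	*type = (uint32_t)t;
	*num = (uint32_t)n;
	return 0;
}
```

Because the name ends in the letter "n" rather than a digit, the type number and the network number cannot run together, which is the property the strawman relies on.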
To allow transparent interoperability with network configurations (e.g. config logs from the MGS), it should also be possible to register an LND "name" for the generic lnetNNNn name, like "tofu" or "kfi" so that NIDs that contain these names can be properly parsed. This would require dynamically registering entries into the libcfs_netstrfns[] array so that the proper name->lnet_nid and lnet_nid->name translation could be done. It may be that the existing /etc/lustre/nid2hostname could be used for part of this, but it is not clear yet.
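The registration idea could be sketched roughly as a small runtime table that maps an LND name to a network type number and back. This is a standalone illustration only: struct net_alias, alias_register(), alias_type(), and alias_name() are hypothetical names, the table size is arbitrary, and the sketch does not touch the real libcfs_netstrfns[] array. The type number used in the usage note is a placeholder, not the reserved value for any LND.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ALIASES 16	/* arbitrary limit for this sketch */
#define ALIAS_LEN   16

/* Hypothetical table entry: LND name <-> network type number. */
struct net_alias {
	char name[ALIAS_LEN];
	uint32_t type;
};

static struct net_alias aliases[MAX_ALIASES];
static size_t alias_count;

/* Register a name for a type; returns 0 on success, -1 on error. */
static int alias_register(const char *name, uint32_t type)
{
	size_t len = strlen(name);
	size_t i;

	if (len == 0 || len >= ALIAS_LEN || alias_count == MAX_ALIASES)
		return -1;
	/* A name ending in a digit would blur into the network number. */
	if (isdigit((unsigned char)name[len - 1]))
		return -1;
	/* Keep both directions of the mapping unambiguous. */
	for (i = 0; i < alias_count; i++)
		if (aliases[i].type == type ||
		    strcmp(aliases[i].name, name) == 0)
			return -1;

	memcpy(aliases[alias_count].name, name, len + 1);
	aliases[alias_count].type = type;
	alias_count++;
	return 0;
}

/* name -> type; returns 0 and fills *type if the name is known. */
static int alias_type(const char *name, uint32_t *type)
{
	size_t i;

	for (i = 0; i < alias_count; i++) {
		if (strcmp(aliases[i].name, name) == 0) {
			*type = aliases[i].type;
			return 0;
		}
	}
	return -1;
}

/* type -> name; returns NULL if no name was registered. */
static const char *alias_name(uint32_t type)
{
	size_t i;

	for (i = 0; i < alias_count; i++)
		if (aliases[i].type == type)
			return aliases[i].name;
	return NULL;
}
```

With this in place, calling alias_register("kfi", 200) (200 being a placeholder type number) would let a parser resolve "kfi" to type 200 and print type 200 back as "kfi", while unregistered types would fall back to the generic lnetNNNn form.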
For the node address, I would propose just trying to parse both a single integer and a dotted-quad, which should work with all networks today except IPv6. If the digit parsing were made long enough (e.g. parsing hex digits into __u32 values and ignoring non-hex characters), it could potentially parse any address (including IPv6 and IB GUIDs, or whatever). That may be a bit hackish, but it is possibly preferable to having to patch/upgrade very old clients, and at least has some chance of working.
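A minimal sketch of the simpler half of this idea, accepting either a single decimal integer or a dotted-quad and producing one 32-bit address. The function name parse_generic_addr() is hypothetical, and packing the dotted-quad with the first octet in the most significant byte is an assumption for illustration. The longer hex-digit variant for IPv6 and IB GUIDs is not attempted here.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "1234" or "10.0.0.1" into a 32-bit address.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_generic_addr(const char *s, uint32_t *addr)
{
	unsigned long long part[4];
	char *end;
	int n = 0;
	int i;

	for (;;) {
		/* Every component must start with a digit. */
		if (!isdigit((unsigned char)*s))
			return -1;
		errno = 0;
		part[n] = strtoull(s, &end, 10);
		if (errno != 0)
			return -1;
		n++;
		if (*end == '\0')
			break;
		/* Only '.' may separate components, and at most four. */
		if (*end != '.' || n == 4)
			return -1;
		s = end + 1;
	}

	if (n == 1) {
		/* Single integer: must fit in 32 bits. */
		if (part[0] > 0xffffffffULL)
			return -1;
		*addr = (uint32_t)part[0];
		return 0;
	}
	if (n != 4)
		return -1;

	/* Dotted-quad: each octet must fit in 8 bits. */
	*addr = 0;
	for (i = 0; i < 4; i++) {
		if (part[i] > 255)
			return -1;
		*addr = (*addr << 8) | (uint32_t)part[i];
	}
	return 0;
}
```

Under these assumptions "1234" yields 1234 and "10.0.0.1" yields 0x0a000001, while inputs such as "1.2.3" or "256.0.0.1" are rejected.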
This would likely only be needed for user tools, the MGS, and LNet router NIDs, if LU-10360 is available to supply all of the server NIDs in binary format.
Issue Links
- is related to
  - LU-10593 Cannot mount client if SSK is setup over IB network (Open)
  - LU-10360 use Imperative Recovery logs for client->MDT/OST connections (Open)
  - LU-16035 Land kfilnd implementation (Resolved)
  - LU-18819 allow direct IB GUID addressing without IPoIB (Open)
  - LU-15983 Reserve KFI LND (Resolved)
  - LU-17706 Reserve TOFULND and EFALND network types (Resolved)