Details
- Technical task
- Resolution: Fixed
- Minor
- Lustre 2.14.0, Lustre 2.15.3
Description
The MGC should try all of the MGS NIDs provided on the command line fairly rapidly the first time after mount (without RPC retry) so that the client can detect the case of the MGS running on a backup node quickly. Otherwise, if the mount command-line has many NIDs (4 is very typical, but may be 8 or even 16 in some cases) then the mount can be stuck for several minutes trying to find the MGS running on the right node.
The MGC should preferably try one NID per node first, then the second NID on each node, etc. This can be done efficiently, assuming the NIDs are provided properly with colon-separated "<MGSNODE>:<MGSNODE>" blocks, and comma-separated "<MGSNID>,<MGSNID>" entries within each block.
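As an illustration of the "one NID per node first" ordering, here is a minimal Python sketch. It is not the MGC implementation: the function name `round_robin_nids` is hypothetical, and the parser assumes plain NIDs with no embedded colons (so IPv6-style NIDs are out of scope for this sketch).

```python
from itertools import zip_longest


def round_robin_nids(mgs_spec: str) -> list[str]:
    """Order NIDs so the first NID of every node is tried before any second NID.

    mgs_spec uses ':' between nodes and ',' between the NIDs of one node.
    Assumption: NIDs themselves contain no ':' characters.
    """
    # One inner list per MGS node, skipping empty blocks and entries.
    nodes = [
        [nid for nid in block.split(",") if nid]
        for block in mgs_spec.split(":")
        if block
    ]
    order = []
    # zip_longest yields the 1st NID of each node, then the 2nd, and so on,
    # padding with None for nodes that have fewer NIDs.
    for rank in zip_longest(*nodes):
        order.extend(nid for nid in rank if nid is not None)
    return order


spec = "10.0.0.1@tcp,10.0.0.1@o2ib:10.0.0.2@tcp,10.0.0.2@o2ib"
print(round_robin_nids(spec))
```

With the two-node example above, both `@tcp` NIDs are tried before either `@o2ib` NID, so a failed-over MGS on the second node is reached on the second attempt rather than the third.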
However, the NIDs are often not listed correctly with ":" separators, so if there are more than 4 MGS NIDs on the command-line, then it would be better to do a "bisection" of the NIDs to best handle the case of 2/4/8 interfaces per node vs. 2/4/8 separate nodes. For example, try NIDs in order 0, nids/2, nids/4, nids*3/4 for 4 NIDs, then nids/8, nids*5/8, nids*3/8, nids*7/8 for 8 NIDs, then nids/16, nids*9/16, nids*5/16, nids*13/16, nids*3/16, nids*11/16, nids*7/16, nids*15/16 for 16 NIDs, and similarly for 32 NIDs (the maximum).
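The bisection order listed above matches a bit-reversal permutation of the NID indices. The following Python sketch shows one way to generate it. The names `reverse_bits` and `bisection_order` are hypothetical. The handling of NID counts that are not a power of two (generate the order for the next power of two and skip out-of-range indices) is an assumption, not something the description specifies.

```python
def reverse_bits(value: int, width: int) -> int:
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def bisection_order(count: int) -> list[int]:
    """Return NID indices in bisection (bit-reversal) order.

    Assumption: for counts that are not a power of two, indices are
    generated for the next power of two and out-of-range ones are skipped.
    """
    if count <= 0:
        return []
    width = (count - 1).bit_length()  # bits needed to index `count` NIDs
    order = []
    for i in range(1 << width):
        index = reverse_bits(i, width)
        if index < count:
            order.append(index)
    return order


for n in (4, 8, 16):
    print(n, bisection_order(n))
```

For 8 NIDs this yields indices 0, 4, 2, 6, 1, 5, 3, 7, which is the sequence 0, nids/2, nids/4, nids*3/4, nids/8, nids*5/8, nids*3/8, nids*7/8 from the description.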
It should be fairly quick to determine if the MGS is not responding on a particular NID, because the client will get a rapid error response (e.g. -ENODEV or -ENOTCONN or -EHOSTUNREACH with a short RPC timeout), so in that case it should try all of the NIDs once quickly. If it gets -ETIMEDOUT, that might mean the node is unavailable or overloaded, or it might mean the MGS is not running yet, so the client should back off and retry the NIDs with a longer timeout after the initial burst.
However, in most cases the MGS should be running on some node, and the client just needs to avoid going into slow "backoff" mode until after it has tried all of the NIDs at least once.
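The two-phase behaviour described above (one quick pass over every NID, then slower retries with backoff) could be sketched as follows. This is only an illustration under stated assumptions: `find_mgs` and the `try_connect` callback are hypothetical, `try_connect(nid, timeout)` is assumed to return 0 on success or a negative errno value, and the timeout, delay, and round-count values are placeholders rather than Lustre defaults.

```python
import errno
import time
from typing import Callable, Optional, Sequence


def find_mgs(
    nids: Sequence[str],
    try_connect: Callable[[str, float], int],  # hypothetical: 0 or -errno
    fast_timeout: float = 5.0,    # placeholder value
    slow_timeout: float = 30.0,   # placeholder value
    max_rounds: int = 3,          # placeholder value
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the first NID that accepts a connection, or None."""
    # Phase 1: try every NID once with a short timeout and no retry, so
    # quick errors (-ENODEV, -ENOTCONN, -EHOSTUNREACH) cost little time.
    for nid in nids:
        if try_connect(nid, fast_timeout) == 0:
            return nid

    # Phase 2: only after the full pass, back off and retry with a
    # longer timeout, doubling the delay between rounds.
    delay = 1.0
    for _ in range(max_rounds):
        sleep(delay)
        for nid in nids:
            if try_connect(nid, slow_timeout) == 0:
                return nid
        delay *= 2
    return None


def fake_connect(nid: str, timeout: float) -> int:
    # Simulated cluster for the example: the MGS runs on the third NID.
    return 0 if nid == "10.0.0.3@tcp" else -errno.ENODEV


print(find_mgs(["10.0.0.1@tcp", "10.0.0.2@tcp", "10.0.0.3@tcp"], fake_connect))
```

Taking `sleep` as a parameter keeps the sketch testable without real delays. A real client would also need to decide whether NIDs that failed quickly should be retried at all in the slow phase, which the description leaves open.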
It would make sense in this case to quiet the "connecting to MGS not running on this node" message for MGS connections so that it doesn't spam the console.
Issue Links
- is related to
  - LU-17906 conf-sanity test_153a: check mount failed (Reopened)
  - LU-17476 lnet: only report mismatched nid in ME if bits match (Resolved)
  - LU-17505 socklnd: return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet on ETIMEDOUT (Resolved)
  - LU-17704 network error in sanity-benchmark (Resolved)
  - LU-16738 Improve mount.lustre with many MGS NIDs (Open)
- is related to
  - LU-17357 Client can use incorrect sec flavor when MGT is relocated (Resolved)
  - LU-18004 import_select_connection() is using CONNECTION_MAX instead of CONNECTION_MIN (Resolved)