[LU-17379] try MGS NIDs more quickly at initial mount - Whamcloud Community JIRA

Details

Type: Technical task
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.16.0
Affects Version/s: Lustre 2.14.0, Lustre 2.15.3
Labels:
None

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

The MGC should try all of the MGS NIDs provided on the command line fairly rapidly the first time after mount (without RPC retry) so that the client can detect the case of the MGS running on a backup node quickly. Otherwise, if the mount command-line has many NIDs (4 is very typical, but may be 8 or even 16 in some cases) then the mount can be stuck for several minutes trying to find the MGS running on the right node.

The MGC should preferably one NID per node first, then the second NID on each node, etc. This can be done efficiently, assuming the NIDs are provided properly with colon-separated <MGSNODE>:<MGSNODE> blocks, and comma-separated "<MGSNID>,<MGSNID>" entries within that.

However, the NIDs are often not listed correctly with ":" separators, so if there are more than 4 MGS NIDs on the command-line, then it would be better to to do a "bisection" of the NIDs to best handle the case of 2/4/8 interfaces per node vs. 2/4/8 separate nodes. For example, try NIDs in order 0, nids/2, nids/4, nids*3/4 for 4 NIDs, then nids/8, nids*5/8, nids*3/8, nids*7/8 for 8 NIDs, then nids/16, nids*9/16, nids*5/16, nids*13/16, nids*3/16, nids*11/16, nids*7/16, nids*15/16 for 16 NIDs, and similarly for 32 NIDs (the maximum).

It should be fairly quick to determine if the MGS is not responding on a particular NID, because the client will get a rapid error response (e.g. -ENODEV or -ENOTCONN or -EHOSTUNREACH with a short RPC timeout) so in that case it should try all of the NIDs once quickly. If it gets ETIMEDOUT that might mean the node is unavailable or overloaded, or it might mean the MGS is not running yet, so the client should back off and retry the NIDs with a longer timeout after the initial burst.

However, in most cases the MGS should be running on some node and it just needs to avoid going into slow "backoff" mode until after it has tried all of the NIDs at least once.

It would make sense in this case to quiet the "connecting to MGS not running on this node" message for MGS connections so that it doesn't spam the console.

Attachments

Issue Links

is related to

LU-17906 conf-sanity test_153a: check mount failed

Resolved

LU-17476 lnet: only report mismatched nid in ME if bits match

Resolved

LU-17505 socklnd: return LNET_MSG_STATUS_NETWORK_TIMEOUT to LNet on ETIMEDOUT

Resolved

LU-17704 network error in sanity-benchmark

Resolved

LU-16738 Improve mount.lustre with many MGS NIDs

In Progress

is related to

LU-17357 Client can use incorrect sec flavor when MGT is relocated

Resolved

LU-18004 import_select_connection() is using CONNECTION_MAX instead of CONNECTION_MIN

Resolved

(2 is related to )

Activity

People

Assignee:: Mikhail Pershin

Reporter:: Andreas Dilger

Votes:: 0 Vote for this issue

Watchers:: 15 Start watching this issue

Dates

Created:: 19/Dec/23 10:57 PM

Updated:: 15/May/25 10:09 PM

Resolved:: 29/Apr/24 1:03 PM