Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
If discovery is disabled locally, the discovery thread will not
modify any peer objects as a result of the discovery process. Thus,
the primary NID of any peer we're asked to discover will not change
as a result of discovery. Therefore, we do not need to actually
perform discovery in LNetPrimaryNID() if discovery is disabled
locally. Since discovery in this routine can result in long client
mount times when a Lustre server is down, we should avoid it when it
is unnecessary.
Issue Links
- is related to LU-14668 LNet: do discovery in the background (Resolved)
We encountered an issue with two LNet routes missing on the server side (OSS): the clients could communicate with the servers, but the servers could not reply.
The clients periodically retried connecting to the servers, which kept the unreachable peers on the discovery list (the_lnet.ln_dc_working). As a consequence, the ll_ostXX_XXX threads waited indefinitely for peer discovery, and because the clients kept sending connection requests, more and more of the available threads became stuck.
The server eventually became unavailable to all clients.
LNet discovery and LNet health are disabled on both the clients and the servers.