Details
Type: Improvement
Resolution: Fixed
Priority: Minor
Description
The kernel_bind() call in lnet_sock_create() may fail because of a problem with either the local port or the local IP address, but the error message currently includes only the port. It would be helpful if the message included both when reporting a fatal error.
Background: We encountered an issue where LNET had picked a virtual IP address (used for non-Lustre services) as its local_ip, and lnet_sock_create() would fail once that IP address was migrated to another node. The error message included only the port, not the IP address, so it took a while to correlate the events. Why LNET chose this particular source address is a separate question we still need to investigate, but as a first step, improving the error message to include all relevant details seems like a good idea.