[LU-14164] libcfs_str2nid() gives undesirable output Created: 01/Dec/20 Updated: 01/Dec/20 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
On my system (for some reason), resolving an unknown hostname returns the loopback IP (127.0.0.1):

># getent hosts mgs.exports.192.168.122.100
::1 mgs.exports.192.168.122.100.localhost

This ends up breaking specific commands, such as:

lctl get_param -n *.MGS*.exports.192.168.122.100@tcp.uuid

which ends up trying to look at:

mgs/mgs/exports/192/168/122/100@tcp999/uuid

The reason is that gethostbyname(), called from clean_path()->libcfs_str2nid(), returns 127.0.0.1 for "exports.192.168.122.100". As a result, the NID is not found in the string, so the IP address is treated as part of the path and split into components.

To be clear, this seems to be an issue with my setup. I am filing this ticket to raise awareness that this could happen. I ran into it while trying to run sanity-sec:test_31 |
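The failure mode described above can be illustrated with a simplified model. This is only a sketch, not the actual clean_path() code: the helper name dots_to_slashes() and the sample strings are hypothetical, and it assumes the NID string is located in the parameter name with a plain substring search.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not the Lustre implementation): replace every '.'
 * in path with '/', except inside the first occurrence of nidstr.
 * If nidstr is NULL, empty, or not present in path, all dots are replaced.
 */
void dots_to_slashes(char *path, const char *nidstr)
{
	char *nid_start = NULL;
	char *nid_end = NULL;
	char *p;

	assert(path != NULL);

	if (nidstr != NULL && *nidstr != '\0') {
		nid_start = strstr(path, nidstr);
		if (nid_start != NULL)
			nid_end = nid_start + strlen(nidstr);
	}

	for (p = path; *p != '\0'; p++) {
		/* inside the NID: keep the dots of the IP address */
		if (nid_start != NULL && p >= nid_start && p < nid_end)
			continue;
		if (*p == '.')
			*p = '/';
	}
}
```

In this model, passing the NID string that matches the parameter name ("192.168.122.100@tcp") keeps the address intact, while passing a NID string derived from a bogus loopback resolution ("127.0.0.1@tcp") finds no match, so the dots in the address are converted as well.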
| Comments |
| Comment by Andreas Dilger [ 01/Dec/20 ] |
|
I think it would be OK to return LNET_NID_ANY in the case of 127.0.0.1, since we already do not allow the TCP loopback interface to be configured for LNet. That is typically an error caused by "$hostname" being aliased to localhost, and using the LNet 0@lo interface is more efficient. |
| Comment by Andreas Dilger [ 01/Dec/20 ] |
|
See, for example, the code in lustre/utils/mkfs_lustre.c that verifies the server NID:
        nid = libcfs_str2nid(s1);
        if (nid == LNET_NID_ANY) {
                fprintf(stderr, "%s: Cannot resolve hostname '%s'.\n",
                        progname, s1);
                free(converted);
                return NULL;
        }
        if (strncmp(libcfs_nid2str(nid), "127.0.0.1",
                    strlen("127.0.0.1")) == 0) {
                fprintf(stderr,
                        "%s: The NID '%s' resolves to the loopback address '%s'. Lustre requires a non-loopback address.\n",
                        progname, s1, libcfs_nid2str(nid));
Returning LNET_NID_ANY for 127.0.0.1 might prevent this message from being printed properly, which I think would be confusing to users, so clean_path() may need a similar check instead:
        nid = libcfs_str2nid(tmp);
        if (nid != LNET_NID_ANY) {
                nidstr = libcfs_nid2str(nid);
                if (!nidstr)
                        return -EINVAL;
+               /* some DNS resolvers return localhost for unknown hosts,
+                * but this can never be a valid NID for any mounted client.
+                */
+               if (strncmp(nidstr, "127.0.0.1", strlen("127.0.0.1")) == 0) {
+                       nidstr = NULL;
+                       continue;
+               }
                break;
        }
Amir, can you see if this solves your problem? |
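One detail worth noting about the strncmp() prefix comparison: it also matches NID strings whose address merely begins with "127.0.0.1", such as "127.0.0.10@tcp". If an exact match is wanted, a stricter test could look like the sketch below. The helper name nidstr_is_tcp_loopback() is hypothetical, and the sketch assumes NID strings have the form "address@net".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not part of libcfs): return true only when the
 * address part of nidstr is exactly 127.0.0.1, i.e. the prefix is
 * followed by the end of the string or by the '@' that starts the
 * network suffix.
 */
bool nidstr_is_tcp_loopback(const char *nidstr)
{
	static const char lo[] = "127.0.0.1";
	const size_t len = sizeof(lo) - 1;

	assert(nidstr != NULL);

	if (strncmp(nidstr, lo, len) != 0)
		return false;

	/* reject longer addresses such as 127.0.0.10 or 127.0.0.100 */
	return nidstr[len] == '\0' || nidstr[len] == '@';
}
```

Whether the extra strictness matters in practice depends on whether addresses in 127.0.0.0/8 other than 127.0.0.1 should also be treated as loopback here.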