[LU-14164] libcfs_str2nid() gives undesirable output Created: 01/Dec/20  Updated: 01/Dec/20

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Amir Shehata (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

On my system (for some reason) resolving an unknown hostname returns the loopback IP (127.0.0.1).

># getent hosts mgs.exports.192.168.122.100
::1 mgs.exports.192.168.122.100.localhost 

This breaks specific commands, such as:

 lctl get_param -n *.MGS*.exports.192.168.122.100@tcp.uuid

The command then tries to look at:

 mgs/mgs/exports/192/168/122/100@tcp999/uuid

The reason is that gethostbyname(), called from clean_path()->libcfs_str2nid(), returns 127.0.0.1 for "exports.192.168.122.100". As a result the NID is not found in the string, which leads to the IP address being included in the path.
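To check whether a given system has this resolver behaviour, a small standalone test along the following lines may help. This is only a sketch: it uses the POSIX getaddrinfo() call rather than the gethostbyname() path described above, it is not Lustre code, and the function name resolves_to_loopback() is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L	/* for getaddrinfo() under strict -std= modes */

#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of Lustre.
 * Returns 1 if any address the resolver reports for 'host' is a loopback
 * address (127.0.0.0/8 or ::1), 0 if none is, and -1 if resolution fails.
 */
static int resolves_to_loopback(const char *host)
{
	struct addrinfo hints, *res, *ai;
	int found = 0;

	assert(host != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* accept IPv4 and IPv6 answers */
	hints.ai_socktype = SOCK_STREAM;	/* avoid one entry per socket type */

	if (getaddrinfo(host, NULL, &hints, &res) != 0)
		return -1;

	for (ai = res; ai != NULL; ai = ai->ai_next) {
		if (ai->ai_family == AF_INET) {
			const struct sockaddr_in *sin =
				(const struct sockaddr_in *)ai->ai_addr;

			/* 127.0.0.0/8: the top octet is 127 */
			if ((ntohl(sin->sin_addr.s_addr) >> 24) == 127)
				found = 1;
		} else if (ai->ai_family == AF_INET6) {
			const struct sockaddr_in6 *sin6 =
				(const struct sockaddr_in6 *)ai->ai_addr;

			if (IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
				found = 1;
		}
	}

	freeaddrinfo(res);
	return found;
}
```

Calling resolves_to_loopback("exports.192.168.122.100") on an affected system would be expected to return 1, whereas a resolver that rejects unknown hostnames would make it return -1.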

To be clear, this seems to be an issue with my setup. I am filing this ticket to raise awareness that this could happen.

I ran into it while trying to run sanity-sec:test_31.



 Comments   
Comment by Andreas Dilger [ 01/Dec/20 ]

I think it would be ok to return LNET_NID_ANY in case of 127.0.0.1, since we already do not allow the TCP loopback interface to be configured for LNet. That is typically an error due to "$hostname" being aliased to localhost, and using the LNet 0@lo interface is more efficient.

Comment by Andreas Dilger [ 01/Dec/20 ]

See, for example, the code in lustre/utils/mkfs_lustre.c that verifies the server NID:

                nid = libcfs_str2nid(s1);
                if (nid == LNET_NID_ANY) {
                        fprintf(stderr, "%s: Cannot resolve hostname '%s'.\n",
                                progname, s1);
                        free(converted);
                        return NULL;
                }
                if (strncmp(libcfs_nid2str(nid), "127.0.0.1",
                            strlen("127.0.0.1")) == 0) {
                        fprintf(stderr,
                                "%s: The NID '%s' resolves to the loopback address '%s'.  Lustre requires a non-loopback address.\n",
                                progname, s1, libcfs_nid2str(nid));

Returning LNET_NID_ANY for 127.0.0.1 might prevent this message from being printed properly, which I think would be confusing to users. It may be that clean_path() needs a similar check instead:

                        nid = libcfs_str2nid(tmp);
                        if (nid != LNET_NID_ANY) {
                                nidstr = libcfs_nid2str(nid);
                                if (!nidstr)
                                        return -EINVAL;
+                               /* Some DNS resolvers return localhost for unknown hosts,
+                                * but this can never be a valid NID for any mounted client.
+                                */
+                               if (strncmp(nidstr, "127.0.0.1", strlen("127.0.0.1")) == 0) {
+                                       nidstr = NULL;
+                                       continue;
+                               }
                                break;
                        }
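For experimenting with the check outside of lctl, the comparison can be pulled into a standalone helper, as in the sketch below. Both function names are hypothetical and neither is existing Lustre code. The first mirrors the strncmp() prefix test from the patch above. The second is a stricter variant, offered only as an assumption about what might be wanted: it parses the address part before the '@' and tests for the whole 127.0.0.0/8 range.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Prefix test as in the patch: true for "127.0.0.1@tcp", but also for
 * e.g. "127.0.0.10@tcp", because only the first nine characters are compared.
 */
static int nidstr_has_loopback_prefix(const char *nidstr)
{
	assert(nidstr != NULL);
	return strncmp(nidstr, "127.0.0.1", strlen("127.0.0.1")) == 0;
}

/*
 * Stricter variant: copy the address part before '@', parse it as a
 * dotted-quad IPv4 address, and test whether it lies in 127.0.0.0/8.
 * Anything that is not a dotted-quad address (e.g. "0@lo") yields 0.
 */
static int nidstr_is_ipv4_loopback(const char *nidstr)
{
	char addr[INET_ADDRSTRLEN];
	struct in_addr in;
	size_t len;

	assert(nidstr != NULL);

	len = strcspn(nidstr, "@");	/* length of the address part */
	if (len == 0 || len >= sizeof(addr))
		return 0;

	memcpy(addr, nidstr, len);
	addr[len] = '\0';

	if (inet_pton(AF_INET, addr, &in) != 1)
		return 0;

	return (ntohl(in.s_addr) >> 24) == 127;
}
```

The two helpers differ on strings such as "127.0.0.2@tcp", which the prefix test lets through and the parsed variant flags as loopback. Both ignore IPv6 loopback addresses.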

Amir, can you see if this solves your problem?
