[LU-14555] lnet_check_route_inconsistency() complains when hops == -1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.16.0
    • Affects Version/s: Lustre 2.14.0
    • Environment: RHEL 8, multi-hop network, hops not set
    • 3
    • 9223372036854775807

    Description

      We have the following configuration:

      2.14_servers == o2ib100 == 2.12_routers == tcp129 == 2.12_routers == o2ib18 == 2.12_clients

      Discovery is disabled and routes are configured statically on all systems.

      This causes LNet to complain vociferously on the console from lnet_check_route_inconsistency():

      LNet: 29144:0:(router.c:384:lnet_check_route_inconsistency()) route o2ib18->172.19.1.54@o2ib100 is detected to be multi-hop but hop count is set to -1

      If LNet is configured so that there is only one route to any given endpoint, even on a multi-hop network, then as far as I can tell there is no value in spending sysadmin time determining and setting the hop counts. Setting hops is also optional according to the Lustre Operations Manual.

      Is hop count actually required in 2.14 due to LU-13029 and LU-13785?
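
To make the complaint concrete, here is a minimal sketch of why an unset hop count can trip a multi-hop check. It is not the Lustre source and not necessarily the fix that was merged: the function names and the HOPS_UNSET macro are hypothetical, and the only facts taken from this ticket are that an unset hop count is stored as -1 and that the test is reported to compare with <= 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name for the sentinel: hop count not set by the admin. */
#define HOPS_UNSET (-1)

/* Check as described in this ticket: on a route detected to be
 * multi-hop, any hop count <= 1 is flagged, which also catches -1. */
static bool warns_as_reported(bool route_is_multi_hop, int hops)
{
	return route_is_multi_hop && hops <= 1;
}

/* Alternative sketch (an assumption, not the merged patch): stay quiet
 * when the admin never set a hop count, and only flag an explicitly
 * configured value that contradicts what was detected. */
static bool warns_only_if_set(bool route_is_multi_hop, int hops)
{
	if (hops == HOPS_UNSET)
		return false;
	return route_is_multi_hop && hops <= 1;
}

/* Small self-check contrasting the two behaviours. */
static void check_examples(void)
{
	assert(warns_as_reported(true, HOPS_UNSET));   /* noisy on unset hops */
	assert(!warns_only_if_set(true, HOPS_UNSET));  /* silent on unset hops */
	assert(warns_only_if_set(true, 1));            /* explicit 1 on multi-hop */
	assert(!warns_only_if_set(true, 2));           /* consistent setting */
}
```

Calling check_examples() from a test program exercises both variants. The second variant keeps the warning useful for a genuinely wrong explicit setting while leaving unset hop counts alone.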

        Activity

          pjones Peter Jones made changes -
          Resolution New: Fixed [ 1 ]
          Status Original: Reopened [ 4 ] New: Resolved [ 5 ]
          defazio Gian-Carlo Defazio made changes -
          Resolution Original: Fixed [ 1 ]
          Status Original: Resolved [ 5 ] New: Reopened [ 4 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.16.0 [ 15190 ]
          Assignee Original: Serguei Smirnov [ ssmirnov ] New: Gian-Carlo Defazio [ defazio ]
          Resolution New: Fixed [ 1 ]
          Status Original: Open [ 1 ] New: Resolved [ 5 ]
          ofaaland Olaf Faaland made changes -
          Description: final question changed from "Is hop count actually required in 2.14 due to LU-13029?" to "Is hop count actually required in 2.14 due to LU-13029 and LU-13785?"; the rest of the text is unchanged.
          pjones Peter Jones made changes -
          Assignee Original: Olaf Faaland [ ofaaland ] New: Serguei Smirnov [ ssmirnov ]
          ofaaland Olaf Faaland made changes -
          Description Original: When routes are created but the hop count is not set, lr_hops defaults to -1 to indicate that it was not set by the admin. This causes LNet to complain on the console in lnet_check_route_inconsistency()
          {noformat}
          CWARN("route %s->%s is detected to be multi-hop but hop count is set to %d\n", {noformat}
          If LNet is configured so that there is only one route to any given endpoint, even on a multi-hop network, there is no value to spending sysadmin time determining and setting the hop counts as far as I can tell.  And setting hops is optional according to the Lustre Operations Manual.

          In addition, the message doesn't match the test.  The message says the route is multi-hop, but it's not checking to see if lr_hops == 1, it's checking for <= 1 which includes -1.
          New: the configuration-based text shown under Description above, with the final question reading "Is hop count actually required in 2.14 due to LU-13029?"
          ofaaland Olaf Faaland made changes -
          Labels Original: llnl llnlfixready New: llnl
          pjones Peter Jones made changes -
          Assignee Original: WC Triage [ wc-triage ] New: Olaf Faaland [ ofaaland ]
          ofaaland Olaf Faaland made changes -
          Description: wording corrected from "it's checking not to see if lr_hops == 1" to "it's not checking to see if lr_hops == 1"; the rest matches the Description Original text in the entry above.
          ofaaland Olaf Faaland made changes -
          Labels Original: llnl New: llnl llnlfixready
          ofaaland Olaf Faaland created issue -

          People

            Assignee: defazio Gian-Carlo Defazio
            Reporter: ofaaland Olaf Faaland
            Votes: 0
            Watchers: 5
