LNet: initial ni status is "up" if starting with link disconnected

    Description

      Adding an NI while the corresponding interface is disconnected results in the NI being listed as "up", whereas it is expected to be "down". For example:

      7: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
          link/ether 52:54:00:a3:1a:44 brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.239/24 brd 192.168.122.255 scope global noprefixroute dynamic bond0
             valid_lft 3264sec preferred_lft 3264sec
          inet6 fe80::5054:ff:fea3:1a44/64 scope link 
             valid_lft forever preferred_lft forever
      # lnetctl net add --net tcp --if bond0
      # lnetctl net show
      net:
          - net type: lo
            local NI(s):
              - nid: 0@lo
                status: up
          - net type: tcp
            local NI(s):
              - nid: 192.168.122.239@tcp
                status: up
                interfaces:
                    0: bond0
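The mismatch in the output above can be checked mechanically. The following is a minimal sketch, assuming the text formats shown in this ticket; the helper names (`link_flags`, `expected_status`, `ni_statuses`) are hypothetical and not part of Lustre or lnetctl. It treats a NO-CARRIER flag as meaning the NI should be "down", which is the expectation stated in the description.

```python
import re

# Sample text copied from the ticket description.
IP_LINE = ("7: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 "
           "qdisc noqueue state DOWN group default qlen 1000")
NET_SHOW = """\
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: tcp
      local NI(s):
        - nid: 192.168.122.239@tcp
          status: up
          interfaces:
              0: bond0
"""


def link_flags(ip_line):
    """Return the set of flags between '<' and '>' in an `ip` header line."""
    match = re.search(r"<([^>]*)>", ip_line)
    return set(match.group(1).split(",")) if match else set()


def expected_status(ip_line):
    """Assumption: an interface without carrier should yield a "down" NI."""
    return "down" if "NO-CARRIER" in link_flags(ip_line) else "up"


def ni_statuses(net_show):
    """Map each NID in `lnetctl net show` text to its reported status."""
    statuses = {}
    nid = None
    for line in net_show.splitlines():
        match = re.match(r"\s*-\s*nid:\s*(\S+)", line)
        if match:
            nid = match.group(1)
            continue
        match = re.match(r"\s*status:\s*(\S+)", line)
        if match and nid is not None:
            statuses[nid] = match.group(1)
            nid = None
    return statuses


if __name__ == "__main__":
    reported = ni_statuses(NET_SHOW)["192.168.122.239@tcp"]
    print("expected:", expected_status(IP_LINE), "reported:", reported)
```

With the sample text from this ticket, the expected and reported values differ for 192.168.122.239@tcp, which is the symptom being reported.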

       

          Activity

            [LU-16836] LNet: initial ni status is "up" if starting with link disconnected
            pjones Peter Jones added a comment -

            Landed for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51057/
            Subject: LU-16836 lnet: ensure dev notification on lnd startup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 09c6e2b872287c847d15620788f6cf50b3a9f30b

            gerrit Gerrit Updater added a comment -

            "Serguei Smirnov <ssmirnov@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51057
            Subject: LU-16836 lnet: ensure dev notification on lnd startup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: fcb6d8de5eb5519d3eafd123719427c21ba0d6da

            ssmirnov Serguei Smirnov added a comment -

            The issue here is that LNet may be able to initialize the LND without receiving a link status update, so it is never notified that the link is down. The new NI is then stuck with the "up" status until an event occurs which refreshes it, e.g. plugging or unplugging the cable.

            Here are the cases when this can happen:

            • Cable is disconnected from the interface (socklnd, o2iblnd)
            • All slave links of a bonded interface are down (the example from the ticket description). The links may be administratively set down, so the bond interface is considered down but still retains its IP address, which allows the LND to be initialized.

            Apart from the bonded slave links case shown above, the "ifdown" and "ip link set dev <if> down" commands don't appear to be useful for simulating this issue.
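The behaviour described in this comment can be illustrated with a toy model. This is only a sketch of the reasoning, not the code from patch 51057: the `ToyNI` class, its `notify_on_startup` parameter, and its `on_link_event` method are all hypothetical names. It assumes the NI status starts as "up" and changes only when a link event is delivered.

```python
class ToyNI:
    """Toy model of an NI whose status changes only on link events.

    Hypothetical illustration; this is not Lustre/LNet code.
    """

    def __init__(self, link_is_up, notify_on_startup):
        # Assumption: status defaults to "up" once the LND has initialized.
        self.status = "up"
        if notify_on_startup:
            # Deliver one notification carrying the current link state.
            self.on_link_event(link_is_up)

    def on_link_event(self, link_is_up):
        """Refresh the status from a link state notification."""
        self.status = "up" if link_is_up else "down"


def demo():
    """Return (status without startup notification, status with it)."""
    stuck = ToyNI(link_is_up=False, notify_on_startup=False)
    notified = ToyNI(link_is_up=False, notify_on_startup=True)
    return stuck.status, notified.status


if __name__ == "__main__":
    print(demo())
```

In this model, an NI created on a dead link without a startup notification keeps reporting "up" until `on_link_event` is called, which mirrors the "stuck until an event refreshes it" behaviour described above.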

            People

              ssmirnov Serguei Smirnov