
Infinite loop in lnet_discover_peer_locked()

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.15.0
    • Lustre 2.15.0
    • None
    • 3

    Description

      The fix from LU-13895 was incomplete: there is still a case where lnet_discover_peer_locked() can enter an infinite loop. The discovery loop needs to check whether the peer NI undergoing discovery has been deleted, and stop if it has.
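
      The failure mode described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the code from lnet/lnet/peer.c: the struct, the function names, the -ENOENT return value, and the iteration cap are all hypothetical and exist only so the example terminates and can be exercised outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a peer NI (not struct lnet_peer_ni). */
struct peer_ni {
	bool deleted;   /* the peer NI was removed while discovery was pending */
	bool uptodate;  /* discovery finished successfully */
};

/* Hypothetical hook that simulates what happens while the caller sleeps. */
typedef void (*wakeup_fn)(struct peer_ni *ni, int iteration);

/*
 * Model of a "wait until discovery completes" loop.
 * Returns 0 when discovery completes, -ENOENT (an arbitrary choice for this
 * sketch) when the peer NI was deleted and check_deleted is set, and
 * -ETIMEDOUT when max_iters wakeups pass without either happening. The cap
 * stands in for the infinite loop so that the sketch always returns.
 */
int discover_peer(struct peer_ni *ni, wakeup_fn on_wakeup,
		  int max_iters, bool check_deleted)
{
	assert(ni != NULL && on_wakeup != NULL);

	for (int i = 0; i < max_iters; i++) {
		if (ni->uptodate)
			return 0;
		/* The additional exit condition: stop on a deleted peer NI. */
		if (check_deleted && ni->deleted)
			return -ENOENT;
		on_wakeup(ni, i);
	}
	return -ETIMEDOUT;
}

/* Simulated event: the peer NI is deleted during the first sleep. */
void delete_on_first_wakeup(struct peer_ni *ni, int iteration)
{
	if (iteration == 0)
		ni->deleted = true;
}

/* Simulated event: discovery completes during the second sleep. */
void complete_on_second_wakeup(struct peer_ni *ni, int iteration)
{
	if (iteration == 1)
		ni->uptodate = true;
}
```

      In this model, a peer NI that is deleted mid-discovery never becomes up to date, so a loop that only tests for completion runs until the cap is reached, while the same loop with the deletion check returns promptly.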

        Activity

          [LU-15512] Infinite loop in lnet_discover_peer_locked()
          pjones Peter Jones added a comment -

          Landed for 2.15


          "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/46429/
          Subject: LU-15512 lnet: Stop discovery on deleted peer NI
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 94f4e1f517d71ffd6662fb4a82e3dee9aa8f6796

          spitzcor Cory Spitz added a comment -

          Raising as a Blocker for 2.15.0.

          hornc Chris Horn added a comment -

          Test notes for LU-15512

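          Execute the test case without the fix (the lnet/lnet/peer.c change is reverted, keeping only the lustre/tests/sanity-lnet.sh change):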

          [hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/29/46429/1 && git checkout FETCH_HEAD
          From https://review.whamcloud.com/fs/lustre-release
           * branch                  refs/changes/29/46429/1 -> FETCH_HEAD
          Note: switching to 'FETCH_HEAD'.
          
          You are in 'detached HEAD' state. You can look around, make experimental
          changes and commit them, and you can discard any commits you make in this
          state without impacting any branches by switching back to a branch.
          
          If you want to create a new branch to retain commits you create, you may
          do so (now or later) by using -c with the switch command. Example:
          
            git switch -c <new-branch-name>
          
          Or undo this operation with:
          
            git switch -
          
          Turn off this advice by setting config variable advice.detachedHead to false
          
          HEAD is now at b2d8c8c10f LU-15512 lnet: Stop discovery on deleted peer NI
          [hornc@ct7-adm lustre-filesystem]$ git reset --soft HEAD^
          [hornc@ct7-adm lustre-filesystem]$ git status
          HEAD detached from FETCH_HEAD
          Changes to be committed:
            (use "git restore --staged <file>..." to unstage)
          	modified:   lnet/lnet/peer.c
          	modified:   lustre/tests/sanity-lnet.sh
          
          Untracked files:
            (use "git add <file>..." to include in what will be committed)
          	lustre/tests/lutf/Makefile.in
          	lustre/tests/lutf/src/Makefile.in
          
          [hornc@ct7-adm lustre-filesystem]$ git reset HEAD lnet/lnet/peer.c
          Unstaged changes after reset:
          M	lnet/lnet/peer.c
          [hornc@ct7-adm lustre-filesystem]$ git checkout lnet/lnet/peer.c
          Updated 1 path from the index
          [hornc@ct7-adm lustre-filesystem]$ make -j 32
          ...
          [root@ct7-mds1 tests]# ./auster -N -v sanity-lnet --only 219
          Started at Wed Feb  2 18:57:04 UTC 2022
          ct7-mds1: executing check_logdir /tmp/test_logs/2022-02-02/185704
          Logging to shared log directory: /tmp/test_logs/2022-02-02/185704
          ct7-mds1: executing yml_node
          IOC_LIBCFS_GET_NI error 22: Invalid argument
          Client: 2.14.56.37
          MDS: 2.14.56.37
          OSS: 2.14.56.37
          running: sanity-lnet ONLY=219
          run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
          -----============= acceptance-small: sanity-lnet ============----- Wed Feb  2 18:57:06 UTC 2022
          Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
          excepting tests:
          opening /dev/obd failed: No such file or directory
          hint: the kernel modules may not be loaded
          Stopping clients: ct7-mds1 /mnt/lustre (opts:-f)
          Stopping clients: ct7-mds1 /mnt/lustre2 (opts:-f)
          modules unloaded.
          ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
          ip netns exec test_ns ip link set test1pg up
          Loading modules from /home/hornc/lustre-filesystem/lustre
          detected 1 online CPUs by sysfs
          libcfs will create CPU partition based on online CPUs
          ../lnet/lnet/lnet options: 'networks=tcp(eth2) accept=all'
          ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
          quota/lquota options: 'hash_lqs_cur_bits=3'
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
          net:
              - net type: lo
                local NI(s):
                  - nid: 0@lo
                    status: up
              - net type: tcp
                local NI(s):
                  - nid: 10.73.20.11@tcp
                    status: up
                    interfaces:
                        0: eth2
          1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
              link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
              inet 127.0.0.1/8 scope host lo
                 valid_lft forever preferred_lft forever
              inet6 ::1/128 scope host
                 valid_lft forever preferred_lft forever
          2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
              inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
                 valid_lft 80909sec preferred_lft 80909sec
              inet6 fe80::5054:ff:fe4d:77d3/64 scope link
                 valid_lft forever preferred_lft forever
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:3d:67:a8 brd ff:ff:ff:ff:ff:ff
              inet 10.73.10.11/24 brd 10.73.10.255 scope global noprefixroute eth1
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe3d:67a8/64 scope link
                 valid_lft forever preferred_lft forever
          4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:00:c6:d1 brd ff:ff:ff:ff:ff:ff
              inet 10.73.20.11/24 brd 10.73.20.255 scope global noprefixroute eth2
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe00:c6d1/64 scope link
                 valid_lft forever preferred_lft forever
          5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:69:cc:a5 brd ff:ff:ff:ff:ff:ff
              inet 10.73.230.11/24 brd 10.73.230.255 scope global noprefixroute eth3
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe69:cca5/64 scope link
                 valid_lft forever preferred_lft forever
          8: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
              link/ether f6:00:25:a7:3e:ba brd ff:ff:ff:ff:ff:ff link-netnsid 0
              inet6 fe80::f400:25ff:fea7:3eba/64 scope link
                 valid_lft forever preferred_lft forever
          Cleaning up LNet
          modules unloaded.
          
          == sanity-lnet test 219: Consolidate peer entries ======== 18:57:11 (1643828231)
          Loading LNet and configuring DLC
          ../lnet/lnet/lnet options: 'networks=tcp(eth2) accept=all'
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net add --net tcp --if eth2
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth2
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl ping 10.73.20.11@tcp
          ping:
              - primary nid: 10.73.20.11@tcp
                Multi-Rail: False
                peer ni:
                  - nid: 10.73.20.11@tcp
                  - nid: 10.73.20.11@tcp1
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl ping 10.73.20.11@tcp1
          ping:
              - primary nid: 10.73.20.11@tcp1
                Multi-Rail: False
                peer ni:
                  - nid: 10.73.20.11@tcp
                  - nid: 10.73.20.11@tcp1
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 10.73.20.11@tcp1
          Connection to 127.0.0.1 closed by remote host.
          Connection to 127.0.0.1 closed.
          

          The test hangs indefinitely until the node is stopped.

          Apply the fix and re-test:

          [hornc@ct7-adm lustre-filesystem]$ git fetch https://review.whamcloud.com/fs/lustre-release refs/changes/29/46429/1 && git checkout FETCH_HEAD
          From https://review.whamcloud.com/fs/lustre-release
           * branch                  refs/changes/29/46429/1 -> FETCH_HEAD
          Previous HEAD position was b79f82c23c LU-15446 lnet: Don't use pref NI for reserved portal
          HEAD is now at b2d8c8c10f LU-15512 lnet: Stop discovery on deleted peer NI
          [hornc@ct7-adm lustre-filesystem]$ make -j 32
          ...
          [root@ct7-mds1 tests]# ./auster -N -v sanity-lnet --only 219
          Started at Wed Feb  2 19:07:03 UTC 2022
          ct7-mds1: executing check_logdir /tmp/test_logs/2022-02-02/190702
          Logging to shared log directory: /tmp/test_logs/2022-02-02/190702
          ct7-mds1: executing yml_node
          IOC_LIBCFS_GET_NI error 22: Invalid argument
          Client: 2.14.56.37
          MDS: 2.14.56.37
          OSS: 2.14.56.37
          running: sanity-lnet ONLY=219
          run_suite sanity-lnet /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
          -----============= acceptance-small: sanity-lnet ============----- Wed Feb  2 19:07:05 UTC 2022
          Running: bash /home/hornc/lustre-filesystem/lustre/tests/sanity-lnet.sh
          excepting tests:
          opening /dev/obd failed: No such file or directory
          hint: the kernel modules may not be loaded
          Stopping clients: ct7-mds1 /mnt/lustre (opts:-f)
          Stopping clients: ct7-mds1 /mnt/lustre2 (opts:-f)
          modules unloaded.
          ip netns exec test_ns ip addr add 10.1.2.3/31 dev test1pg
          ip netns exec test_ns ip link set test1pg up
          Loading modules from /home/hornc/lustre-filesystem/lustre
          detected 1 online CPUs by sysfs
          libcfs will create CPU partition based on online CPUs
          ../lnet/lnet/lnet options: 'networks=tcp(eth2) accept=all'
          ptlrpc/ptlrpc options: 'lbug_on_grant_miscount=1'
          quota/lquota options: 'hash_lqs_cur_bits=3'
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net show
          net:
              - net type: lo
                local NI(s):
                  - nid: 0@lo
                    status: up
              - net type: tcp
                local NI(s):
                  - nid: 10.73.20.11@tcp
                    status: up
                    interfaces:
                        0: eth2
          1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
              link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
              inet 127.0.0.1/8 scope host lo
                 valid_lft forever preferred_lft forever
              inet6 ::1/128 scope host
                 valid_lft forever preferred_lft forever
          2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 52:54:00:4d:77:d3 brd ff:ff:ff:ff:ff:ff
              inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic eth0
                 valid_lft 85931sec preferred_lft 85931sec
              inet6 fe80::5054:ff:fe4d:77d3/64 scope link
                 valid_lft forever preferred_lft forever
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:3d:67:a8 brd ff:ff:ff:ff:ff:ff
              inet 10.73.10.11/24 brd 10.73.10.255 scope global noprefixroute eth1
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe3d:67a8/64 scope link
                 valid_lft forever preferred_lft forever
          4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:00:c6:d1 brd ff:ff:ff:ff:ff:ff
              inet 10.73.20.11/24 brd 10.73.20.255 scope global noprefixroute eth2
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe00:c6d1/64 scope link
                 valid_lft forever preferred_lft forever
          5: eth3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
              link/ether 08:00:27:69:cc:a5 brd ff:ff:ff:ff:ff:ff
              inet 10.73.230.11/24 brd 10.73.230.255 scope global noprefixroute eth3
                 valid_lft forever preferred_lft forever
              inet6 fe80::a00:27ff:fe69:cca5/64 scope link
                 valid_lft forever preferred_lft forever
          6: test1pl@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
              link/ether 8e:64:da:f2:d4:fd brd ff:ff:ff:ff:ff:ff link-netnsid 0
              inet6 fe80::8c64:daff:fef2:d4fd/64 scope link
                 valid_lft forever preferred_lft forever
          Cleaning up LNet
          modules unloaded.
          
          == sanity-lnet test 219: Consolidate peer entries ======== 19:07:10 (1643828830)
          Loading LNet and configuring DLC
          ../lnet/lnet/lnet options: 'networks=tcp(eth2) accept=all'
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl lnet configure
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net add --net tcp --if eth2
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl net add --net tcp1 --if eth2
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl ping 10.73.20.11@tcp
          ping:
              - primary nid: 10.73.20.11@tcp
                Multi-Rail: False
                peer ni:
                  - nid: 10.73.20.11@tcp
                  - nid: 10.73.20.11@tcp1
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl ping 10.73.20.11@tcp1
          ping:
              - primary nid: 10.73.20.11@tcp1
                Multi-Rail: False
                peer ni:
                  - nid: 10.73.20.11@tcp
                  - nid: 10.73.20.11@tcp1
          /home/hornc/lustre-filesystem/lustre/../lnet/utils/lnetctl discover 10.73.20.11@tcp1
          discover:
              - primary nid: 10.73.20.11@tcp
                Multi-Rail: True
                peer ni:
                  - nid: 10.73.20.11@tcp
                  - nid: 10.73.20.11@tcp1
          PASS 219 (2s)
          == sanity-lnet test complete, duration 7 sec ============= 19:07:12 (1643828832)
          Cleaning up LNet
          opening /dev/obd failed: No such file or directory
          hint: the kernel modules may not be loaded
          modules unloaded.
          sanity-lnet returned 0
          Finished at Wed Feb  2 19:07:14 UTC 2022 in 12s
          ./auster: completed with rc 0
          [root@ct7-mds1 tests]#
          

          "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/46429
          Subject: LU-15512 lnet: Stop discovery on deleted peer NI
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: b2d8c8c10f560426864407d8bf0f1e84aa431aef


          People

            hornc Chris Horn
            hornc Chris Horn