Lustre / LU-16283

o2iblnd.c:3049:kiblnd_shutdown() <NID>: waiting for <N> peers to disconnect

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.15.1
    • None
    • 4.18.0-372.32.1.1toss.t4.x86_64
      lustre-2.15.1_7.llnl-2.t4.x86_64
    • 3
    • 9223372036854775807

    Description

      Observed on a Lustre router node while the servers and some of the clients were up and connected. The Lustre router node has Omni-Path on the client side and IB on the Lustre server side.

      lnetctl lnet unconfigure 

      hangs with the following stack:

      [<0>] kiblnd_shutdown+0x347/0x4e0 [ko2iblnd]
      [<0>] lnet_shutdown_lndni+0x2b6/0x4c0 [lnet]
      [<0>] lnet_shutdown_lndnet+0x6c/0xb0 [lnet]
      [<0>] lnet_shutdown_lndnets+0x11e/0x300 [lnet]
      [<0>] LNetNIFini+0xb7/0x130 [lnet]
      [<0>] lnet_ioctl+0x220/0x260 [lnet]
      [<0>] notifier_call_chain+0x47/0x70
      [<0>] blocking_notifier_call_chain+0x42/0x60
      [<0>] libcfs_psdev_ioctl+0x346/0x590 [libcfs]
      [<0>] do_vfs_ioctl+0xa5/0x740
      [<0>] ksys_ioctl+0x64/0xa0
      [<0>] __x64_sys_ioctl+0x16/0x20
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 

      The debug log shows it is still waiting for 3 peers, even after more than 3700 seconds:

      00000800:00000200:1.0:1667256015.359743:0:35023:0:(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: waiting for 3 peers to disconnect 
      ...
      00000800:00000200:3.0:1667259799.039743:0:35023:0:(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: waiting for 3 peers to disconnect

      Before the shutdown there were 38 peers, all reported as "up".
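      For reference, a rough sketch of how the peer state and debug messages above can be gathered on the router (the NID is the one from this ticket; the exact commands behind the attachments are approximate):

      lnetctl peer show > lnetctl.peer.show.mutt4    # all 38 peers reported "up" before shutdown
      lctl set_param debug=+net                      # make sure D_NET messages are collected
      lctl dk > dk.mutt4                             # dump the debug log, which contains e.g.
      #   (o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: waiting for 3 peers to disconnect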

      For patch stack, see https://github.com/LLNL/lustre/releases/tag/2.15.1_7.llnl

      For my reference, my local ticket is TOSS5826

      Attachments

        1. dk.mutt4.1.gz
          33 kB
        2. dk.mutt4.2.gz
          256 kB
        3. dk.mutt4.3.gz
          57 kB
        4. dmesg.mutt4.1667256190.gz
          32 kB
        5. dmesg.mutt4.1667259716.gz
          0.6 kB
        6. lnetctl.peer.show.mutt4.1.gz
          1 kB


          Activity

            [LU-16283] o2iblnd.c:3049:kiblnd_shutdown() <NID>: waiting for <N> peers to disconnect

            ssmirnov Serguei Smirnov added a comment -

            Hi Olaf,

            I haven't been able to conclusively identify the problem yet. I believe it has to do with some sort of race on LNet shutdown, but that much is kind of obvious. The workaround you applied should be good for most cases; the only scenario it doesn't cover is probably when active router NIs are being brought down/up dynamically.

            Thanks,

            Serguei.

            ofaaland Olaf Faaland added a comment -

            Hi Serguei,
            I've added "lnetctl set routing 0" to our lnet service file. Have you had any success identifying the problem? Thanks

            ofaaland Olaf Faaland added a comment -

            Yes, we are adding "lnetctl set routing 0" to the shutdown tasks in our lnet service file after the holiday break.


            ssmirnov Serguei Smirnov added a comment -

            Hi Olaf,

            Sorry, not yet. It doesn't address the root cause, but, for lack of better ideas, I was considering changing the shutdown procedure to include "lnetctl set routing 0"; I haven't submitted the patch yet.

            Thanks,

            Serguei.


            ofaaland Olaf Faaland added a comment -

            Hi Serguei,

            Do you have any update on this issue?

            Thanks

            ofaaland Olaf Faaland added a comment -

            > I experimented with executing "lnetctl set routing 0" on the router node

            Good idea.   Doing this before "lnetctl net unconfigure" prevents the hang in kiblnd_shutdown(), thanks.
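            For the record, the shutdown order that avoids the hang for us is roughly the following (standard lnetctl / lustre_rmmod commands; the actual contents of our lnet service file are not reproduced here):

            lnetctl set routing 0        # stop acting as a router first
            lnetctl lnet unconfigure     # no longer blocks in kiblnd_shutdown()
            lustre_rmmod                 # then remove the Lustre/LNet modules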


            ssmirnov Serguei Smirnov added a comment -

            Hi Olaf,

            It looks like I'm able to reproduce the issue using a similar setup. I used two routers routing between IB and TCP networks, and lnet_selftest to generate traffic between the IB server and the TCP client.
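            Roughly, the lnet_selftest part of the reproducer looks like the sketch below (the NIDs are placeholders, and the lnet_selftest module needs to be loaded on the nodes involved):

            export LST_SESSION=$$
            lst new_session rtr_repro
            lst add_group servers 192.168.1.10@o2ib      # placeholder: IB server NID
            lst add_group clients 10.10.0.20@tcp         # placeholder: TCP client NID
            lst add_batch bulk
            lst add_test --batch bulk --from clients --to servers brw write size=1M
            lst run bulk
            # ... shut down LNet on one of the routers while traffic is flowing ...
            lst stop bulk
            lst end_session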

            I should be able to use this to look further into fixing this properly. In the meantime though, I experimented with executing "lnetctl set routing 0" on the router node before running "lustre_rmmod" on it, which seems to prevent it from getting stuck. I wonder if you can give this extra step a try to see if it helps in your case, too, as a kind of temporary workaround.

            Thanks,

            Serguei.


            ofaaland Olaf Faaland added a comment -

            Hi Serguei,

            I performed a test with https://review.whamcloud.com/46711 applied, and still see "waiting for 1 peers to disconnect".

            My reproducer:
            1. Start a lustre file system on garter[1-8], on o2ib100 (mlx)
            2. Start LNet on 4 routers, mutt[1-4], on o2ib100 and o2ib44 (opa)
            3. Mount the file system on 64 clients on o2ib44, which reach garter through mutt[1-4]
            4. Start a 64-node 512-task IOR on the clients, writing to all the OSTs
            5. Run "systemctl stop lnet" on mutt3
            6. I observe "lnetctl lnet unconfigure" is hung as originally reported, and the stack is the same. The console log for mutt3 shows "waiting for 1 peers to disconnect" repeatedly
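            On mutt3, the hang in step 6 can be confirmed with roughly (commands approximate):

            cat /proc/$(pidof lnetctl)/stack                   # shows kiblnd_shutdown+0x347/0x4e0 ...
            dmesg | grep 'waiting for .* peers to disconnect'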

            Just to be sure, note that we are not using MR.

            thanks,
            Olaf


            ofaaland Olaf Faaland added a comment -

            Thanks, Serguei. I hope to test it tomorrow.


            ssmirnov Serguei Smirnov added a comment -

            Hi Olaf,

            On my local setup, using b2_15 and the steps-to-reproduce from the earlier comment, it appears that https://review.whamcloud.com/46711 is able to fix the issue with getting stuck on shutdown.

            On the other hand, on the master branch, checking out the commit immediately before this fix causes the issue to appear.

            Even though my reproducer is different, I think it is a good candidate to try in your environment. 

            Thanks,

            Serguei.

            ofaaland Olaf Faaland added a comment -

            Hi Serguei,

            Here are the rest of the sysctls:

            net.ipv4.conf.all.arp_announce = 0
            net.ipv4.conf.all.arp_filter = 1
            net.ipv4.conf.all.arp_ignore = 0
            net.ipv4.conf.all.rp_filter = 1
            net.ipv4.conf.default.arp_announce = 0
            net.ipv4.conf.default.arp_filter = 0
            net.ipv4.conf.default.arp_ignore = 0
            net.ipv4.conf.default.rp_filter = 1
            

            In my case, I have only one LNet NI per network. Each router node has 2 OPA links (called hsi[01], one not configured in LNet) and one IB link (called san0). In case it helps:

            [root@mutt4:~]# lnetctl net show
            net:
                - net type: lo
                  local NI(s):
                    - nid: 0@lo
                      status: up
                - net type: o2ib44
                  local NI(s):
                    - nid: 192.168.128.4@o2ib44
                      status: up
                      interfaces:
                          0: hsi0
                - net type: o2ib100
                  local NI(s):
                    - nid: 172.19.1.108@o2ib100
                      status: up
                      interfaces:
                          0: san0
            
            [root@mutt4:~]# ibstat | grep -w -e CA -e State -e Physical -e Firmware
            CA 'hfi1_0'
                    CA type: 
                    Firmware version: 1.27.0
                            State: Active
                            Physical state: LinkUp
            CA 'hfi1_1'
                    CA type: 
                    Firmware version: 1.27.0
                            State: Active
                            Physical state: LinkUp
            CA 'mlx5_0'
                    CA type: MT4123
                    Firmware version: 20.32.2004
                            State: Active
                            Physical state: LinkUp
            CA 'mlx5_bond_0'
                    CA type: MT4125
                    Firmware version: 22.32.2004
                            State: Active
                            Physical state: LinkUp
            

            People

              Assignee: ssmirnov Serguei Smirnov
              Reporter: ofaaland Olaf Faaland
              Votes: 0
              Watchers: 5
