Lustre / LU-16283

o2iblnd.c:3049:kiblnd_shutdown() <NID>: waiting for <N> peers to disconnect


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • None
    • Affects Version: Lustre 2.15.1
    • None
    • Environment: 4.18.0-372.32.1.1toss.t4.x86_64
      lustre-2.15.1_7.llnl-2.t4.x86_64
    • 3

      Observed on a Lustre router node while the servers and some of the clients were up and connected. The router node has Omni-Path on the client side and IB on the Lustre server side.

      lnetctl lnet unconfigure 

      hangs with the following stack:

      [<0>] kiblnd_shutdown+0x347/0x4e0 [ko2iblnd]
      [<0>] lnet_shutdown_lndni+0x2b6/0x4c0 [lnet]
      [<0>] lnet_shutdown_lndnet+0x6c/0xb0 [lnet]
      [<0>] lnet_shutdown_lndnets+0x11e/0x300 [lnet]
      [<0>] LNetNIFini+0xb7/0x130 [lnet]
      [<0>] lnet_ioctl+0x220/0x260 [lnet]
      [<0>] notifier_call_chain+0x47/0x70
      [<0>] blocking_notifier_call_chain+0x42/0x60
      [<0>] libcfs_psdev_ioctl+0x346/0x590 [libcfs]
      [<0>] do_vfs_ioctl+0xa5/0x740
      [<0>] ksys_ioctl+0x64/0xa0
      [<0>] __x64_sys_ioctl+0x16/0x20
      [<0>] do_syscall_64+0x5b/0x1b0
      [<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6 
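
      For anyone comparing this trace against other hung nodes, the following is a minimal sketch that splits each frame into symbol, offset, function size, and module. It assumes the usual kernel convention that `symbol+0xOFF/0xSIZE [module]` means an offset into a function of the given size; the names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of Lustre or lnetctl.

```python
import re

# One frame looks like: "[<0>] kiblnd_shutdown+0x347/0x4e0 [ko2iblnd]".
# The trailing "[module]" is absent for symbols built into the kernel image.
FRAME_RE = re.compile(
    r"^\[<0>\]\s+(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)

def parse_frames(text):
    """Return a list of (symbol, offset, size, module_or_None), innermost frame first."""
    frames = []
    for raw in text.splitlines():
        m = FRAME_RE.match(raw.strip())
        if m:
            frames.append(
                (m["symbol"], int(m["offset"], 16), int(m["size"], 16), m["module"])
            )
    return frames

# The stack exactly as quoted in this ticket.
STACK = """\
[<0>] kiblnd_shutdown+0x347/0x4e0 [ko2iblnd]
[<0>] lnet_shutdown_lndni+0x2b6/0x4c0 [lnet]
[<0>] lnet_shutdown_lndnet+0x6c/0xb0 [lnet]
[<0>] lnet_shutdown_lndnets+0x11e/0x300 [lnet]
[<0>] LNetNIFini+0xb7/0x130 [lnet]
[<0>] lnet_ioctl+0x220/0x260 [lnet]
[<0>] notifier_call_chain+0x47/0x70
[<0>] blocking_notifier_call_chain+0x42/0x60
[<0>] libcfs_psdev_ioctl+0x346/0x590 [libcfs]
[<0>] do_vfs_ioctl+0xa5/0x740
[<0>] ksys_ioctl+0x64/0xa0
[<0>] __x64_sys_ioctl+0x16/0x20
[<0>] do_syscall_64+0x5b/0x1b0
[<0>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
"""

frames = parse_frames(STACK)

# Modules in the order they first appear, skipping built-in kernel symbols.
modules = []
for _symbol, _offset, _size, module in frames:
    if module is not None and module not in modules:
        modules.append(module)

symbol, offset, size, module = frames[0]
print(f"blocked in {symbol} [{module}] at {offset:#x} of {size:#x} bytes")
print("modules on the path:", ", ".join(modules))
```

      The innermost frame is the one the ioctl is blocked in; the remaining frames show the path from the `lnetctl` ioctl down through libcfs and lnet into ko2iblnd.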

      The debug log shows kiblnd_shutdown() still waiting for 3 peers to disconnect after more than 3700 seconds (the two messages below are about 3784 seconds apart):

      00000800:00000200:1.0:1667256015.359743:0:35023:0:(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: waiting for 3 peers to disconnect 
      ...
      00000800:00000200:3.0:1667259799.039743:0:35023:0:(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: waiting for 3 peers to disconnect
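
      The elapsed time between the two messages can be checked with a minimal sketch like the one below. The field layout in `LINE_RE` is an assumption inferred only from the two lines quoted above (timestamp in the fourth colon-separated field, PID in the sixth); `LINE_RE` and `parse` are hypothetical names, not Lustre tooling.

```python
import re
from decimal import Decimal

# Assumed layout, inferred from the two quoted lines:
#   subsys:mask:cpu:timestamp:?:pid:?:(file:line:func()) NID: waiting for N peers to disconnect
LINE_RE = re.compile(
    r"^(?P<subsys>[0-9a-f]{8}):(?P<mask>[0-9a-f]{8}):(?P<cpu>[\d.]+):"
    r"(?P<ts>\d+\.\d+):\d+:(?P<pid>\d+):\d+:"
    r"\((?P<file>[^:]+):(?P<line>\d+):(?P<func>\w+)\(\)\)\s+"
    r"(?P<nid>\S+): waiting for (?P<peers>\d+) peers to disconnect\s*$"
)

def parse(line):
    """Return (timestamp, pid, nid, peers) for one 'waiting for N peers' line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized debug-log line: {line!r}")
    # Decimal keeps the microsecond timestamps exact when subtracting.
    return Decimal(m["ts"]), int(m["pid"]), m["nid"], int(m["peers"])

# The first and last messages exactly as quoted in this ticket.
first = ("00000800:00000200:1.0:1667256015.359743:0:35023:0:"
         "(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: "
         "waiting for 3 peers to disconnect")
last = ("00000800:00000200:3.0:1667259799.039743:0:35023:0:"
        "(o2iblnd.c:3049:kiblnd_shutdown()) 172.19.1.108@o2ib100: "
        "waiting for 3 peers to disconnect")

t0, pid0, nid0, peers0 = parse(first)
t1, pid1, nid1, peers1 = parse(last)
elapsed = t1 - t0

print(f"pid {pid1} on {nid1}: still waiting for {peers1} peers after {elapsed} s")
```

      Both messages come from the same PID and NID, and the peer count does not change between them, which is consistent with the shutdown making no progress rather than draining slowly.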

      Before the shutdown there were 38 peers, all reported as "up".

      For the patch stack, see https://github.com/LLNL/lustre/releases/tag/2.15.1_7.llnl

      For my reference, my local ticket is TOSS5826.

      Attachments:
        1. dk.mutt4.1.gz (33 kB)
        2. dk.mutt4.2.gz (256 kB)
        3. dk.mutt4.3.gz (57 kB)
        4. dmesg.mutt4.1667256190.gz (32 kB)
        5. dmesg.mutt4.1667259716.gz (0.6 kB)
        6. lnetctl.peer.show.mutt4.1.gz (1 kB)

            Assignee: Serguei Smirnov (ssmirnov)
            Reporter: Olaf Faaland (ofaaland)
            Votes: 0
            Watchers: 6
