LU-16986 (Lustre)

LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) ASSERTION( conn->ibc_nsends_posted == 0 ) failed


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.15.3
    • Labels: None
    • Environment: CentOS 7.9 3.10.0-1160.90.1.el7_lustre.pl1.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      With Lustre 2.15.3 on both clients (Sherlock) and servers (Fir), we hit this assertion on a Lustre OSS (fir-io2-s1) while redeploying our LNet routers from 2.15.2 to 2.15.3. We upgraded the routers slowly, but I believe the redeployment still put some stress on LNet and triggered this condition. I have a vmcore available upon request.

      [1927361.489487] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) ASSERTION( conn->ibc_nsends_posted == 0 ) failed:
      [1927361.500446] LNetError: 26687:0:(o2iblnd.c:992:kiblnd_destroy_conn()) LBUG
      [1927361.507427] Pid: 26687, comm: kiblnd_connd 3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1 SMP Tue Jun 20 15:47:49 PDT 2023
      [1927361.518344] Call Trace:
      [1927361.520991] [<0>] libcfs_call_trace+0x90/0xf0 [libcfs]
      [1927361.526319] [<0>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [1927361.531308] [<0>] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd]
      [1927361.537253] [<0>] kiblnd_connd+0xfa/0xcb0 [ko2iblnd]
      [1927361.542393] [<0>] kthread+0xd1/0xe0
      [1927361.546067] [<0>] ret_from_fork_nospec_begin+0x7/0x21
      [1927361.551328] [<0>] 0xfffffffffffffffe
      [1927361.555087] Kernel panic - not syncing: LBUG
      [1927361.559528] CPU: 59 PID: 26687 Comm: kiblnd_connd Kdump: loaded Tainted: G           OE  ------------   3.10.0-1160.90.1.el7_lustre.pl1.x86_64 #1
      [1927361.572724] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.11.3 02/24/2023
      [1927361.580548] Call Trace:
      [1927361.583167]  [<ffffffff985b1bec>] dump_stack+0x19/0x1f
      [1927361.588478]  [<ffffffff985ab708>] panic+0xe8/0x21f
      [1927361.593445]  [<ffffffffc092f5eb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [1927361.599796]  [<ffffffffc0a0ca36>] kiblnd_destroy_conn+0x476/0x650 [ko2iblnd]
      [1927361.607016]  [<ffffffffc0a1e72a>] kiblnd_connd+0xfa/0xcb0 [ko2iblnd]
      [1927361.613541]  [<ffffffff97ecc790>] ? wake_up_atomic_t+0x40/0x40
      [1927361.619546]  [<ffffffffc0a1e630>] ? kiblnd_cm_callback+0x2140/0x2140 [ko2iblnd]
      [1927361.627023]  [<ffffffff97ecb621>] kthread+0xd1/0xe0
      [1927361.632074]  [<ffffffff97ecb550>] ? insert_kthread_work+0x40/0x40
      [1927361.638774]  [<ffffffff985c51dd>] ret_from_fork_nospec_begin+0x7/0x21
      [1927361.645387]  [<ffffffff97ecb550>] ? insert_kthread_work+0x40/0x40
      

      Attaching vmcore-dmesg.txt as fir-io2-s1-20230724-vmcore-dmesg.txt
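      For context, the failed check asserts that conn->ibc_nsends_posted, which counts sends posted on the connection that have not yet completed, is zero when kiblnd_destroy_conn() tears the connection down. The sketch below is a hypothetical toy model of that invariant only, not ko2iblnd code: struct toy_conn and the toy_* functions are invented for illustration, and returning false stands in for the LBUG.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT ko2iblnd code. nsends_posted loosely mirrors
 * conn->ibc_nsends_posted: sends posted but not yet completed. */
struct toy_conn {
    int nsends_posted;
};

/* A send is handed to the (simulated) hardware queue. */
static void toy_post_send(struct toy_conn *conn)
{
    conn->nsends_posted++;
}

/* The completion for a previously posted send arrives. */
static void toy_send_complete(struct toy_conn *conn)
{
    assert(conn->nsends_posted > 0);
    conn->nsends_posted--;
}

/* Models the check in kiblnd_destroy_conn(): teardown is only safe once
 * every posted send has completed. Returns false where the real code
 * would hit ASSERTION( conn->ibc_nsends_posted == 0 ) and LBUG. */
static bool toy_conn_destroy_ok(const struct toy_conn *conn)
{
    return conn->nsends_posted == 0;
}
```

      In this model a connection may only be destroyed once every posted send has been matched by a completion; the panic above means kiblnd_connd reached teardown while that count was still non-zero.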

          People

            Assignee: WC Triage (wc-triage)
            Reporter: Stephane Thiell (sthiell)