Lustre / LU-12906

LBUG ASSERTION( rspt->rspt_cpt == cpt ) failed

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.12.3
    • Labels: None
    • Environment: CentOS 7.6
    • Severity: 2

    Description

      Using Lustre 2.12.3 on both servers and clients, we hit this LBUG twice: once on an OSS and once on a client:

      [197803.220678] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) ASSERTION( rspt->rspt_cpt == cpt ) failed: 
      [197803.220682] LNetError: 34981:0:(lib-move.c:2729:lnet_detach_rsp_tracker()) LBUG
      [197803.220684] Pid: 34981, comm: kiblnd_sd_01_00 3.10.0-957.27.2.el7.x86_64 #1 SMP Mon Jul 29 17:46:05 UTC 2019
      [197803.220684] Call Trace:
      [197803.220706]  [<ffffffffc0e777cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [197803.220712]  [<ffffffffc0e7787c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [197803.220730]  [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
      [197803.220739]  [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
      [197803.220748]  [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
      [197803.220757]  [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
      [197803.220765]  [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
      [197803.220772]  [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
      [197803.220783]  [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [197803.220789]  [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
      [197803.220793]  [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
      [197803.220797]  [<ffffffff8db76c37>] ret_from_fork_nospec_end+0x0/0x39
      [197803.220822]  [<ffffffffffffffff>] 0xffffffffffffffff
      [197803.220823] Kernel panic - not syncing: LBUG
      [197803.220826] CPU: 37 PID: 34981 Comm: kiblnd_sd_01_00 Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.27.2.el7.x86_64 #1
      [197803.220827] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.9.1 12/04/2018
      [197803.220828] Call Trace:
      [197803.220833]  [<ffffffff8db64147>] dump_stack+0x19/0x1b
      [197803.220835]  [<ffffffff8db5d850>] panic+0xe8/0x21f
      [197803.220842]  [<ffffffffc0e778cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [197803.220851]  [<ffffffffc0f1849b>] lnet_detach_rsp_tracker+0x5b/0x60 [lnet]
      [197803.220860]  [<ffffffffc0f08d3a>] lnet_finalize+0x72a/0x9a0 [lnet]
      [197803.220863]  [<ffffffff8d502372>] ? ktime_get_ts64+0x52/0xf0
      [197803.220872]  [<ffffffffc0f12a51>] lnet_post_send_locked+0x751/0x9c0 [lnet]
      [197803.220880]  [<ffffffffc0f149a8>] lnet_return_tx_credits_locked+0x2a8/0x490 [lnet]
      [197803.220888]  [<ffffffffc0f075ec>] lnet_msg_decommit+0xec/0x700 [lnet]
      [197803.220896]  [<ffffffffc0f089b7>] lnet_finalize+0x3a7/0x9a0 [lnet]
      [197803.220901]  [<ffffffffc14c861d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [197803.220906]  [<ffffffffc14d3b0d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
      [197803.220908]  [<ffffffff8d4e220e>] ? dequeue_task_fair+0x41e/0x660
      [197803.220911]  [<ffffffff8d42a59e>] ? __switch_to+0xce/0x580
      [197803.220913]  [<ffffffff8d4d7c40>] ? wake_up_state+0x20/0x20
      [197803.220919]  [<ffffffffc14d3270>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
      [197803.220920]  [<ffffffff8d4c2e81>] kthread+0xd1/0xe0
      [197803.220923]  [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
      [197803.220925]  [<ffffffff8db76c37>] ret_from_fork_nospec_begin+0x21/0x21
      [197803.220927]  [<ffffffff8d4c2db0>] ? insert_kthread_work+0x40/0x40
      

      We're using Mellanox OFED 4.7 on the servers, the routers, and this client (not all clients have been upgraded yet).

      We can provide crash dumps upon request. This seems to be a new problem, either with Lustre 2.12.3 (vs. 2.12.0) or with Mellanox OFED 4.7 (vs. 4.5).

      Thanks!
      Stephane

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: sthiell Stephane Thiell