  1. Lustre
  2. LU-12628

LNetError: 44030:0:(lib-msg.c:735:lnet_health_check()) ASSERTION( msg->msg_tx_committed ) failed:


Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.13.0, Lustre 2.12.3
    • None
    • 3
    • 9223372036854775807

    Description

      It's possible for a message with msg_rx_committed set to reach the resend block in lnet_health_check() and trip this assert. The assert should be changed to an if-statement that simply returns -1, so that the message is finalized instead of being resent.
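
The proposed change can be sketched in user space as follows. This is only an illustration, not the actual patch: `struct mock_lnet_msg` and `mock_resend_check()` are hypothetical stand-ins, and only the two flags named in this ticket are modeled. The real `struct lnet_msg` and the real resend block in `lnet_health_check()` contain considerably more state.

```c
/* Hypothetical stand-in for struct lnet_msg. Only the two flags mentioned
 * in this ticket are modeled; the real structure has many more fields. */
struct mock_lnet_msg {
	unsigned int msg_tx_committed:1;
	unsigned int msg_rx_committed:1;
};

/* Hypothetical helper modeling entry to the resend block.
 * Returns 0 if the message may proceed to resend handling,
 * or -1 if the caller should finalize the message instead. */
static int mock_resend_check(const struct mock_lnet_msg *msg)
{
	/* Before: an assertion on msg->msg_tx_committed, which fails
	 * (and panics the node via LBUG) for a message that is only
	 * RX-committed.
	 * After: treat the same condition as "do not resend". */
	if (!msg->msg_tx_committed)
		return -1;

	return 0;
}
```

With this shape, a message that has only `msg_rx_committed` set yields -1 and is finalized, while a TX-committed message continues into resend handling as before.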

      [846578.191198] LustreError: 65272:0:(brw_test.c:389:brw_server_rpc_done()) Skipped 12 previous similar messages
      [846578.191789] LNetError: 44030:0:(lib-msg.c:735:lnet_health_check()) ASSERTION( msg->msg_tx_committed ) failed: 
      [846578.191793] LNetError: 44030:0:(lib-msg.c:735:lnet_health_check()) LBUG
      [846578.191795] Pid: 44030, comm: kiblnd_sd_01_00 3.10.0-693.21.1.x3.2.152.x86_64 #1 SMP Mon Feb 25 06:44:43 PST 2019
      [846578.191795] Call Trace:
      [846578.191824]  [<ffffffff8103a212>] save_stack_trace_tsk+0x22/0x40
      [846578.191856]  [<ffffffffc0a3f7cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [846578.191868]  [<ffffffffc0a3f87c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [846578.191915]  [<ffffffffc0ad2c7e>] lnet_health_check+0x9ae/0x9e0 [lnet]
      [846578.191930]  [<ffffffffc0ad2dc5>] lnet_finalize+0x115/0x9c0 [lnet]
      [846578.191949]  [<ffffffffc0b8278d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [846578.191958]  [<ffffffffc0b8dc5d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
      [846578.191965]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [846578.191972]  [<ffffffff816c455d>] ret_from_fork+0x5d/0xb0
      [846578.192031]  [<ffffffffffffffff>] 0xffffffffffffffff
      [846578.192032] Kernel panic - not syncing: LBUG
      [846578.192037] CPU: 13 PID: 44030 Comm: kiblnd_sd_01_00 Tainted: P           OE  ------------   3.10.0-693.21.1.x3.2.152.x86_64 #1
      [846578.192038] Hardware name: Seagate SATI-TL/Type2 - Board Product Sati2, BIOS SATI-TL.v0046.0002 01/13/2015
      [846578.192040] Call Trace:
      [846578.192053]  [<ffffffff816b17c8>] dump_stack+0x19/0x1b
      [846578.192057]  [<ffffffff816ab634>] panic+0xe8/0x21f
      [846578.192072]  [<ffffffffc0a3f8cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
      [846578.192087]  [<ffffffffc0ad2c7e>] lnet_health_check+0x9ae/0x9e0 [lnet]
      [846578.192094]  [<ffffffff810eced2>] ? ktime_get_ts64+0x52/0xf0
      [846578.192110]  [<ffffffffc0ad2dc5>] lnet_finalize+0x115/0x9c0 [lnet]
      [846578.192119]  [<ffffffffc0b78b52>] ? kiblnd_pool_free_node+0x82/0x170 [ko2iblnd]
      [846578.192126]  [<ffffffffc0b8278d>] kiblnd_tx_done+0x10d/0x3e0 [ko2iblnd]
      [846578.192135]  [<ffffffffc0b8dc5d>] kiblnd_scheduler+0x89d/0x1180 [ko2iblnd]
      [846578.192142]  [<ffffffff810cb0c5>] ? sched_clock_cpu+0x85/0xc0
      [846578.192146]  [<ffffffff8102954d>] ? __switch_to+0xcd/0x500
      [846578.192149]  [<ffffffff810c7c80>] ? wake_up_state+0x20/0x20
      [846578.192156]  [<ffffffffc0b8d3c0>] ? kiblnd_cq_event+0x90/0x90 [ko2iblnd]
      [846578.192159]  [<ffffffff810b4031>] kthread+0xd1/0xe0
      [846578.192161]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      [846578.192164]  [<ffffffff816c455d>] ret_from_fork+0x5d/0xb0
      [846578.192167]  [<ffffffff810b3f60>] ? insert_kthread_work+0x40/0x40
      

          People

            hornc Chris Horn