LU-19310

Interop crash in sanity-lnet test 256: lnet_return_rx_credits_locked() ASSERTION( msg->msg_kiov != ((void *)0) ) failed


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium
    • Affects Version/s: Lustre 2.17.0, Lustre 2.16.1
    • Severity: 3

    Description

      It looks like recent LNet landings (I think LU-15135 and LU-18555) broke interop.

      Assertion failure in sanity-lnet test 256, right after routing is disabled with lnetctl set routing 0:

      [23676.200412] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Disable routing on trevis-107vm11
      [23676.516990] Lustre: DEBUG MARKER: Disable routing on trevis-107vm11
      [23676.752477] Lustre: DEBUG MARKER: /usr/sbin/lnetctl set routing 0
      [23676.935020] LNetError: 817662:0:(lib-move.c:1199:lnet_return_rx_credits_locked()) ASSERTION( msg->msg_kiov != ((void *)0) ) failed:
      [23676.935070] LNetError: 817662:0:(lib-move.c:1199:lnet_return_rx_credits_locked()) LBUG
      [23676.935079] CPU: 0 PID: 817662 Comm: lnetctl Kdump: loaded Tainted: G           OE     -------  ---  5.14.0-427.31.1_lustre.el9.x86_64 #1
      [23676.935082] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [23676.935087] Call Trace:
      [23676.935100]  <TASK>
      [23676.935103]  dump_stack_lvl+0x34/0x48
      [23676.935154]  lbug_with_loc.cold+0x5/0x58 [libcfs]
      [23676.935202]  lnet_return_rx_credits_locked+0x437/0x600 [lnet]
      [23676.935260]  lnet_msg_decommit_rx+0xcb/0x2c0 [lnet]
      [23676.935299]  lnet_msg_decommit+0x79/0x200 [lnet]
      [23676.935336]  lnet_complete_msg_locked+0x33/0x240 [lnet]
      [23676.935372]  lnet_finalize+0xff/0x260 [lnet]
      [23676.935409]  lnet_drop_routed_msgs_locked+0xa6/0xd0 [lnet]
      [23676.935447]  lnet_rtrpool_free_bufs+0xaa/0x160 [lnet]
      [23676.935492]  lnet_rtrpools_free.part.0+0x46/0x90 [lnet]
      [23676.935531]  LNetCtl+0xc23/0x1d00 [lnet]
      [23676.935570]  ? mutex_lock+0xe/0x30
      [23676.935599]  ? LNetNIInit+0x237/0x600 [lnet]
      [23676.935632]  lnet_ioctl+0x23e/0x300 [lnet]
      [23676.935671]  ? lnet_ioctl_getdata+0x147/0x790 [lnet]
      [23676.935709]  lnet_psdev_ioctl+0x353/0x4c0 [lnet]
      [23676.935748]  __x64_sys_ioctl+0x8a/0xc0
      [23676.935780]  do_syscall_64+0x5c/0x90
      [23676.935790]  ? exc_page_fault+0x62/0x150
      [23676.935793]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
      [23676.935804] RIP: 0033:0x7f6a1490357b
      [23676.935862] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 68 0f 00 f7 d8 64 89 01 48
      [23676.935864] RSP: 002b:00007ffd92712068 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
      [23676.935877] RAX: ffffffffffffffda RBX: 00007f6a14bf2fc0 RCX: 00007f6a1490357b
      [23676.935878] RDX: 00007ffd92712120 RSI: 00000000c0b86557 RDI: 0000000000000003
      [23676.935879] RBP: 00000000c0b86557 R08: 00007ffd92712120 R09: 0000000000000000
      [23676.935880] R10: 00007f6a14811d78 R11: 0000000000000202 R12: 00007ffd92712120
      [23676.935881] R13: 00007f6a14bd136e R14: 00007ffd927120a0 R15: 00007f6a14c31000
      [23676.935884]  </TASK>

      https://testing.whamcloud.com/test_sets/2a64bccf-e4b9-43f5-b384-c17ef711e31e

      https://testing.whamcloud.com/test_sets/080a66f5-c061-40ad-8ac7-59cdece7416b

      https://testing.whamcloud.com/test_sets/20637603-a12a-4e3c-a5ab-9baa1a02f02a

       

      A similar crash in 2.15 sanity-lnet test 257:

       [ 2484.224447] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Disable routing on onyx-150vm12
      [ 2484.758406] Lustre: DEBUG MARKER: Disable routing on onyx-150vm12
      [ 2485.065750] Lustre: DEBUG MARKER: /usr/sbin/lnetctl set routing 0
      [ 2485.270108] LNetError: 255122:0:(lib-move.c:1205:lnet_return_rx_credits_locked()) ASSERTION( msg->msg_kiov != ((void *)0) ) failed:
      [ 2485.272442] LNetError: 255122:0:(lib-move.c:1205:lnet_return_rx_credits_locked()) LBUG
      [ 2485.273980] Pid: 255122, comm: lnetctl 4.18.0-553.53.1.el8_lustre.x86_64 #1 SMP Tue Jun 10 20:38:47 UTC 2025
      [ 2485.275744] Call Trace TBD:
      [ 2485.276401] [<0>] libcfs_call_trace+0x63/0x90 [libcfs]
      [ 2485.277346] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
      [ 2485.278204] [<0>] lnet_return_rx_credits_locked+0x3c7/0x570 [lnet]
      [ 2485.279372] [<0>] lnet_msg_decommit+0x288/0x760 [lnet]
      [ 2485.280309] [<0>] lnet_finalize+0x3b2/0xa70 [lnet]
      [ 2485.281182] [<0>] lnet_drop_routed_msgs_locked+0xac/0xe0 [lnet]
      [ 2485.282236] [<0>] lnet_rtrpool_free_bufs+0xaa/0x160 [lnet]
      [ 2485.283227] [<0>] lnet_rtrpools_free+0x52/0x90 [lnet]
      [ 2485.284141] [<0>] LNetCtl+0x13d1/0x1bb0 [lnet]
      [ 2485.284963] [<0>] lnet_ioctl+0xa8/0x260 [lnet]
      [ 2485.285796] [<0>] notifier_call_chain+0x47/0x70
      [ 2485.286652] [<0>] blocking_notifier_call_chain+0x42/0x60
      [ 2485.287615] [<0>] libcfs_psdev_ioctl+0x34a/0x590 [libcfs]
      [ 2485.288579] [<0>] do_vfs_ioctl+0xa4/0x690
      [ 2485.289329] [<0>] ksys_ioctl+0x64/0xa0
      [ 2485.290023] [<0>] __x64_sys_ioctl+0x16/0x20
      [ 2485.290791] [<0>] do_syscall_64+0x5b/0x1a0
      [ 2485.291551] [<0>] entry_SYSCALL_64_after_hwframe+0x66/0xcb
      [ 2485.292547] Kernel panic - not syncing: LBUG

      https://testing.whamcloud.com/test_sets/c34290ae-f8b9-40c4-a3d0-1bf43b57ece6
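
      To compare the two traces above side by side, a small script can pull the assertion site and the [lnet] frames out of a console log. This is only a triage sketch, not part of the Lustre test framework: the names parse_lbug and lnet_chain are hypothetical, and the regular expressions assume the two log layouts shown in this ticket (plain frames and frames prefixed with [<0>]).

```python
import re

# Leading "[23676.935202]" / "[ 2485.278204]" kernel timestamps.
TIMESTAMP = re.compile(r"^\s*\[\s*\d+\.\d+\]\s*")

# "(lib-move.c:1199:lnet_return_rx_credits_locked()) ASSERTION( expr ) failed"
ASSERTION = re.compile(
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+) \) failed"
)

# "sym+0xoff/0xsize [module]", optionally prefixed by "[<0>]" or "? ".
FRAME = re.compile(
    r"^(?:\[<0>\]\s+)?(?P<guess>\?\s+)?(?P<sym>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(?P<mod>\w+)\])?\s*$"
)


def parse_lbug(log_text):
    """Return (assertion_site, frames) for the first LBUG in log_text.

    assertion_site is (file, line, function, expression) or None.
    frames is a list of (symbol, module) tuples; unreliable frames
    (those the kernel marks with "?") are skipped.
    """
    site = None
    frames = []
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw)
        if site is None:
            m = ASSERTION.search(line)
            if m:
                site = (m["file"], int(m["line"]), m["func"], m["expr"])
                continue
        m = FRAME.match(line)
        if m and not m["guess"]:
            frames.append((m["sym"], m["mod"]))
    return site, frames


def lnet_chain(frames):
    """Keep only [lnet] frames and drop compiler suffixes such as
    ".part.0" or ".cold" so traces from different builds line up."""
    return [sym.split(".", 1)[0] for sym, mod in frames if mod == "lnet"]
```

      Running parse_lbug on each console log and comparing the lnet_chain results makes it easier to see that both crashes go through lnet_rtrpool_free_bufs and lnet_drop_routed_msgs_locked before reaching the assertion, even though the intermediate frames differ between the two kernels.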

       

       


    People

      Assignee: WC Triage (wc-triage)
      Reporter: Oleg Drokin (green)