Lustre / LU-17496

LNet teardown could retry cleanup before asserting


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.16.0

    Description

      LNet teardown could retry cleanup before asserting.

      We see this assertion show up in sanity-lnet/220.

      Excerpted from https://testing.whamcloud.com/test_logs/087d6d3d-deca-4831-9337-30fae7338f25/show_text

      [17841.535068] Lustre: DEBUG MARKER: == sanity-lnet test 220: Add routes w/default options - check aliveness ========================================================== 23:19:06 (1706570346)
      [17841.835785] Lustre: DEBUG MARKER: /usr/sbin/lustre_rmmod
      [17842.279424] Key type lgssc unregistered
      [17842.319629] LNetError: 6049:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed: 
      [17842.320935] LNetError: 6049:0:(lib-md.c:281:lnet_assert_handler_unused()) LBUG
      [17842.321757] Pid: 6049, comm: lnet_discovery 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023
      [17842.322978] Call Trace TBD:
      [17842.323365] Kernel panic - not syncing: LBUG
      [17842.323894] CPU: 0 PID: 6049 Comm: lnet_discovery Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
      [17842.325176] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [17842.325838] Call Trace:
      [17842.326178]  <TASK>
      [17842.326492]  dump_stack_lvl+0x34/0x48
      [17842.326997]  panic+0xf4/0x2c6
      [17842.327399]  ? lnet_discovery_event_reply+0xbc0/0xbc0 [lnet]
      [17842.328223]  lbug_with_loc.cold+0x18/0x18 [libcfs]
      [17842.328869]  lnet_assert_handler_unused+0x9c/0xd0 [lnet]
      [17842.329506]  lnet_peer_discovery+0x997/0xaf0 [lnet]
      [17842.330111]  ? cpuacct_percpu_seq_show+0x10/0x10
      [17842.330680]  ? lnet_peer_data_present+0x580/0x580 [lnet]
      [17842.331323]  kthread+0xd9/0x100
      [17842.331734]  ? kthread_complete_and_exit+0x20/0x20
      [17842.332298]  ret_from_fork+0x22/0x30
      [17842.332769]  </TASK>
      

      We could attempt to retry the cleanup pass a couple of times before finally asserting.
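
      The retry idea above can be sketched as follows. This is a minimal, userspace stand-in, not the real LNet code: `mds_in_use`, `cleanup_pass()`, and `assert_handler_unused_with_retry()` are hypothetical names, and the retry bound is arbitrary; in LNet the check would be against `md->md_handler` under the resource lock in `lnet_assert_handler_unused()`.

      ```c
      #include <assert.h>
      #include <stdio.h>

      /* Stand-in for the number of MDs still bound to the handler; in LNet
       * this corresponds to the md->md_handler != handler condition. */
      static int mds_in_use = 2;

      /* Hypothetical cleanup pass: one sweep that lets in-flight events
       * drain so a handler reference can be dropped. */
      static void cleanup_pass(void)
      {
              if (mds_in_use > 0)
                      mds_in_use--;
      }

      /* Instead of asserting on the first sweep, retry the cleanup a few
       * times and only LBUG/assert if references still remain. */
      static void assert_handler_unused_with_retry(void)
      {
              int retries = 3;        /* arbitrary bound for the sketch */

              while (mds_in_use > 0 && retries-- > 0)
                      cleanup_pass();

              assert(mds_in_use == 0);
      }

      int main(void)
      {
              assert_handler_unused_with_retry();
              printf("handler released after retries\n");
              return 0;
      }
      ```

      In the real fix the assertion would still fire if the handler remains in use after the bounded retries, so a genuine leak is not hidden, only a transient teardown race is tolerated.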

      Attachments

        Activity

          People

            Assignee: Shaun Tancheff (stancheff)
            Reporter: Shaun Tancheff (stancheff)
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved: