[LU-14810] sanity-lnet test_212: lnet_assert_handler_unused() ASSERTION(md->md_handler != handler) failed

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • Lustre 2.17.0
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/84ed5e32-bda8-4880-a254-dbd96cb6a478

      test_212 failed with the following error:

      trevis-79vm3 crashed during sanity-lnet test_212
      
      LNetError: 944562:0:(lib-md.c:288:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed: 
      Pid: 944562, comm: lnet_discovery 4.18.0-240.22.1.el8_3.aarch64 #1 SMP Thu Apr 8 19:01:45 UTC 2021
      Call Trace:
       libcfs_call_trace+0xb8/0x118 [libcfs]
       lbug_with_loc+0x60/0xa0 [libcfs]
       lnet_assert_handler_unused+0xb8/0xe0 [lnet]
       lnet_peer_discovery+0x16c4/0x1cb0 [lnet]
       kthread+0x130/0x138
      

      This test was recently landed via patch https://review.whamcloud.com/43418 "LU-14627 lnet: Ensure ref taken when queueing for discovery"
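
Going only by the assertion text, the check appears to require that no active memory descriptor (MD) still references the handler being retired. The following is a toy Python model of that invariant, not LNet code: the `MemoryDescriptor` and `MdRegistry` classes, the "ping" and "push" labels, and the stand-in handler function are all hypothetical names introduced for illustration.

```python
# Toy model only: these classes and names are hypothetical, not Lustre code.
class MemoryDescriptor:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler


class MdRegistry:
    """Tracks active descriptors, loosely standing in for a list of live MDs."""

    def __init__(self):
        self.active = []

    def attach(self, md):
        self.active.append(md)

    def unlink(self, md):
        self.active.remove(md)

    def assert_handler_unused(self, handler):
        # Same shape as the failed check: every active MD must use a
        # different handler than the one being retired.
        for md in self.active:
            assert md.handler is not handler, (
                f"ASSERTION( md->md_handler != handler ) failed: {md.name}"
            )


def discovery_event_handler(event):
    """Stand-in for a discovery event callback."""
    return event


registry = MdRegistry()
ping_md = MemoryDescriptor("ping", discovery_event_handler)
push_md = MemoryDescriptor("push", discovery_event_handler)
registry.attach(ping_md)
registry.attach(push_md)

# Only one of the two descriptors is unlinked before the check runs.
registry.unlink(ping_md)
try:
    registry.assert_handler_unused(discovery_event_handler)
    tripped = False
except AssertionError:
    tripped = True  # push_md is still active, so the check trips

# Once every descriptor using the handler is gone, the check passes.
registry.unlink(push_md)
registry.assert_handler_unused(discovery_event_handler)
```

In this model the check trips whenever any descriptor that shares the handler outlives the code path that retires it. Whether that matches the actual failure here is an assumption the logs alone do not confirm.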

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_212 - trevis-79vm3 crashed during sanity-lnet test_212

          Activity

            gerrit Gerrit Updater added a comment -

            "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/59815
            Subject: LU-14810 lnet: Avoid multiple PUSH to same peer
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 107592f4c9c15e83bfa589fd035055a71eb242b9

            hornc Chris Horn added a comment - +1 on master https://testing.whamcloud.com/test_sets/ec596cc8-2bd5-41a8-a098-821566f270e4
            ys Yang Sheng added a comment -

            +1. https://testing.whamcloud.com/test_sets/5b314583-911e-4f74-abba-3590e5e66bd0

            [18631.568718] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Wait for 1080568
            [18631.791373] Lustre: DEBUG MARKER: Wait for 1080568
            [18633.346663] LNet: There was an unexpected network error while writing to 10.240.25.244: rc = -22
            [18633.596010] Lustre: DEBUG MARKER: /usr/sbin/lctl mark Finished wait on 1080568
            [18633.871079] Lustre: DEBUG MARKER: Finished wait on 1080568
            [18633.891459] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
            lctl dl | grep ' ST ' || true
            [18633.922010] LNetError: 1081165:0:(lib-md.c:259:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed: 
            [18633.922014] LNetError: 1081165:0:(lib-md.c:259:lnet_assert_handler_unused()) LBUG
            [18633.922027] CPU: 0 PID: 1081165 Comm: lnetctl Kdump: loaded Tainted: G        W  OE     -------  ---  5.14.0-362.24.1.el9_3.x86_64 #1
            [18633.922030] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [18633.922034] Call Trace:
            [18633.922053]  <TASK>
            [18633.922057]  dump_stack_lvl+0x34/0x48
            [18633.922111]  lbug_with_loc.cold+0x5/0x58 [libcfs]
            [18633.922131]  lnet_assert_handler_unused+0x9c/0xd0 [lnet]
            [18633.922195]  ? __pfx_lnet_discovery_event_handler+0x10/0x10 [lnet]
            [18633.922231]  LNetNIFini+0x9f/0x150 [lnet]
            [18633.922261]  lnet_unconfigure+0x66/0x80 [lnet]
            [18633.922297]  genl_family_rcv_msg_doit.isra.0+0xcb/0x120
            [18633.922330]  genl_family_rcv_msg+0x14c/0x220
            [18633.922333]  ? __pfx_lnet_net_conf_cmd+0x10/0x10 [lnet]
            [18633.922357]  genl_rcv_msg+0x47/0xa0
            [18633.922361]  ? __pfx_genl_rcv_msg+0x10/0x10
            [18633.922363]  netlink_rcv_skb+0x57/0x100
            [18633.922367]  genl_rcv+0x24/0x40
            [18633.922370]  netlink_unicast+0x23e/0x360
            [18633.922372]  netlink_sendmsg+0x238/0x480
            [18633.922374]  ? __check_object_size.part.0+0x35/0xd0
            [18633.922401]  sock_sendmsg+0x62/0x70
            [18633.922422]  ____sys_sendmsg+0x230/0x270
            [18633.922424]  ? copy_msghdr_from_user+0x6d/0xa0
            [18633.922427]  ___sys_sendmsg+0x88/0xd0
            [18633.922433]  ? ___sys_recvmsg+0x88/0xd0
            [18633.922436]  __sys_sendmsg+0x59/0xa0
            [18633.922438]  do_syscall_64+0x5c/0x90
            [18633.922463]  ? syscall_exit_work+0x103/0x130
            [18633.922481]  ? syscall_exit_to_user_mode+0x22/0x40
            [18633.922484]  ? do_syscall_64+0x69/0x90
            [18633.922485]  ? do_syscall_64+0x69/0x90
            [18633.922487]  ? exc_page_fault+0x62/0x150
            [18633.922489]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
            [18633.922505] RIP: 0033:0x7fc79b94f787
            [18633.922558] Code: 0a 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 89 54 24 1c 48 89 74 24 10
            [18633.922560] RSP: 002b:00007ffe57304698 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
            [18633.922566] RAX: ffffffffffffffda RBX: 0000556d5c0e3390 RCX: 00007fc79b94f787
            [18633.922567] RDX: 0000000000000000 RSI: 00007ffe573046d0 RDI: 0000000000000003
            [18633.922569] RBP: 0000556d5c0e32a0 R08: 0000000000000003 R09: 0000000000000000
            [18633.922569] R10: 0000000000000010 R11: 0000000000000246 R12: 0000556d5c111ce0
            [18633.922571] R13: 00007ffe573046d0 R14: 0000000000000004 R15: 0000556d5c0f4915
            [18633.922573]  </TASK>
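
When comparing traces like the one above against the original report, it can help to pull out just the frames belonging to one module. The following is a small Python sketch, assuming the `symbol+0xoff/0xlen [module]` frame layout seen in these logs; the `module_frames` helper and its regular expression are illustrative, not part of any Lustre or kernel tool. Frames prefixed with `?` are skipped because the kernel marks them as unreliable.

```python
import re

# Matches frames such as "lnet_unconfigure+0x66/0x80 [lnet]", with an optional
# dmesg timestamp and an optional "? " marker for unreliable frames.
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s+)?(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def module_frames(trace_text, module):
    """Return reliable frame symbols that belong to the given kernel module."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group("module") == module and not m.group("unreliable"):
            frames.append(m.group("symbol"))
    return frames


# A few frames copied from the trace above.
SAMPLE = """\
[18633.922111]  lbug_with_loc.cold+0x5/0x58 [libcfs]
[18633.922131]  lnet_assert_handler_unused+0x9c/0xd0 [lnet]
[18633.922195]  ? __pfx_lnet_discovery_event_handler+0x10/0x10 [lnet]
[18633.922231]  LNetNIFini+0x9f/0x150 [lnet]
[18633.922261]  lnet_unconfigure+0x66/0x80 [lnet]
[18633.922297]  genl_family_rcv_msg_doit.isra.0+0xcb/0x120
"""

print(module_frames(SAMPLE, "lnet"))
```

Run on the sample, this lists `lnet_assert_handler_unused`, `LNetNIFini`, and `lnet_unconfigure`, which makes it easier to see that this crash was reached through `LNetNIFini` rather than `lnet_peer_discovery` as in the original report.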
            
            
            adilger Andreas Dilger added a comment - edited

            This crashed again on the tip of master (patch based on v2_15_65-5-g018c4e8f25): https://testing.whamcloud.com/test_sets/964597eb-199c-47fc-9609-b23c1f47f949

            There have been 30 other crashes of sanity-lnet test_212 since this patch landed: https://testing.whamcloud.com/search?horizon=2332800&status%5B%5D=CRASH&test_set_script_id=a2b1c4b2-b449-11e9-b88c-52540065bddc&sub_test_script_id=058541e7-0f00-4f82-a6e5-8316ea99a160&source=sub_tests#redirect

            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55559/
            Subject: LU-14810 lnet: Do not issue multiple PUSHes
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 72726a311814bc0c0eefb22a769c9ebf7912839e

            gerrit Gerrit Updater added a comment -

            "Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55559
            Subject: LU-14810 lnet: Do not issue multiple PUSHes
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d33780a2496bcf10c179fff9bb25e7084b2baf0b

            People

              cbordage Cyril Bordage
              maloo Maloo