sanity-lnet test_205/test_220: LNetError: (lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler )

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.16.0
    • None
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for Arshad <arshad.hussain@aeoncomputing.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/1432600f-ba6e-45a0-a998-c2b871760b9b

      test_205 failed with the following error:

      trevis-33vm2 crashed during sanity-lnet test_205
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/100460 - 4.18.0-477.27.1.el8_8.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/100460 - 4.18.0-477.27.1.el8_lustre.x86_64

      Client Output:

       

      =================================== 16:17:04 \(1700756224\)
      [21692.122539] Lustre: DEBUG MARKER: == sanity-lnet test 204: Check no health or resends for single-rail local failures ========================================================== 16:17:04 (1700756224)
      [21692.403638] LNet: 1467339:0:(lib-ptl.c:956:lnet_clear_lazy_portal()) Active lazy portal 0 on exit
      [21692.405821] LNet: Removed LNI 10.1.2.3@tcp
      [21692.413980] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
      [21692.413980] lctl dl | grep ' ST ' || true
      [21692.511140] Key type .llcrypt unregistered
      [21692.512020] Key type ._llcrypt unregistered
      [21692.800163] Key type ._llcrypt registered
      [21692.801024] Key type .llcrypt registered
      [21692.832028] libcfs: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 2
      [21692.836702] alg: No test for adler32 (adler32-zlib)
      [21693.611267] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure --all
      [21693.617623] LNet: Added LNI 10.240.39.5@tcp [8/256/0/180]
      [21693.618922] LNet: Accept all, port 7988
      [21694.553419] Lustre: DEBUG MARKER: /usr/sbin/lnetctl discover 10.240.38.123@tcp
      [21694.846378] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21694.850290] LNet: There was an unexpected network error while writing to 10.240.38.123: rc = -22
      [21694.896561] LNet: 1467564:0:(api-ni.c:358:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [21694.911244] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21694.917894] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21694.971823] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21695.052546] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21695.059301] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21695.111848] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21695.199325] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21695.206117] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21695.258286] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21695.382743] LNet: There was an unexpected network error while writing to 10.240.38.123: rc = -22
      [21695.384355] LNet: Skipped 2 previous similar messages
      [21695.428743] LNet: 1467750:0:(api-ni.c:358:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [21695.430652] LNet: 1467750:0:(api-ni.c:358:recovery_interval_set()) Skipped 2 previous similar messages
      [21695.445043] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21695.451861] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21695.506046] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21695.607926] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21695.614630] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21695.667799] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21695.808409] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21695.815303] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21695.828815] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
      [21695.828815] lctl dl | grep ' ST ' || true
      [21695.860860] LNet: 1467904:0:(lib-ptl.c:956:lnet_clear_lazy_portal()) Active lazy portal 0 on exit
      [21695.924073] LNetError: 1467470:0:(socklnd.c:1261:ksocknal_create_conn()) Not creating conn 12345-10.240.38.123@tcp type 2: peer_ni/conn_cb removed
      [21695.926432] LNetError: 11e-e: Unexpected error -116 connecting to 10.240.38.123@tcp at host 10.240.38.123:7988
      [21696.874054] LNet: Removed LNI 10.240.39.5@tcp
      [21696.938956] Key type .llcrypt unregistered
      [21696.939795] Key type ._llcrypt unregistered
      [21705.904334] Key type ._llcrypt registered
      [21705.905266] Key type .llcrypt registered
      [21706.693491] Key type .llcrypt unregistered
      [21706.694372] Key type ._llcrypt unregistered
      [21706.974304] Key type ._llcrypt registered
      [21706.978361] Key type .llcrypt registered
      [21707.011233] libcfs: HW NUMA nodes: 1, HW CPU cores: 2, npartitions: 2
      [21707.015533] alg: No test for adler32 (adler32-zlib)
      [21707.790659] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure --all
      [21707.797247] LNet: Added LNI 10.240.39.5@tcp [8/256/0/180]
      [21707.798822] LNet: Accept all, port 7988
      [21708.368775] Lustre: DEBUG MARKER: /usr/sbin/lnetctl discover 10.240.38.123@tcp
      [21715.241659] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure
      [21715.248115] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net add --net tcp1 --if eth0
      [21715.252231] LNet: Added LNI 10.240.39.5@tcp1 [8/256/0/180]
      [21715.441585] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21715.445581] LNet: There was an unexpected network error while writing to 10.240.38.123: rc = -22
      [21715.447305] LNet: 1 local NIs in recovery (showing 1): 10.240.39.5@tcp
      [21715.637913] LNet: 1469390:0:(api-ni.c:358:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [21715.652679] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21715.659414] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21715.729247] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21715.949323] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21715.956120] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21716.026111] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21716.029989] LNet: There was an unexpected network error while writing to 10.240.38.123: rc = -22
      [21716.031631] LNet: Skipped 5 previous similar messages
      [21716.231601] LNet: 1469526:0:(api-ni.c:358:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [21716.233446] LNet: 1469526:0:(api-ni.c:358:recovery_interval_set()) Skipped 1 previous similar message
      [21716.248327] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21716.255340] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21716.326242] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21716.628909] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21716.636214] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21716.707624] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21717.005150] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21717.011880] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21717.079281] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.38.123@tcp
      [21717.083320] LNet: There was an unexpected network error while writing to 10.240.38.123: rc = -22
      [21717.084987] LNet: Skipped 9 previous similar messages
      [21717.142283] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [21717.149327] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [21720.087147] Lustre: DEBUG MARKER: lsmod | grep lnet > /dev/null &&
      [21720.087147] lctl dl | grep ' ST ' || true
      [21720.119727] LNetError: 1469074:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed:
      [21720.121839] LNetError: 1469074:0:(lib-md.c:281:lnet_assert_handler_unused()) LBUG
      [21720.123252] Pid: 1469074, comm: lnet_discovery 4.18.0-477.27.1.el8_8.x86_64 #1 SMP Thu Aug 31 10:29:22 EDT 2023
      [21720.125123] Call Trace TBD:
      [21720.125763] [<0>] libcfs_call_trace+0x63/0x90 [libcfs]
      [21720.126772] [<0>] lbug_with_loc+0x3f/0x70 [libcfs]
      [21720.127754] [<0>] lnet_assert_handler_unused+0xa0/0xd0 [lnet]
      [21720.128903] [<0>] lnet_peer_discovery+0x14fb/0x1730 [lnet]
      [21720.129985] [<0>] kthread+0x134/0x150
      [21720.130761] [<0>] ret_from_fork+0x35/0x40
      [21720.131608] Kernel panic - not syncing: LBUG
      [21720.132444] CPU: 0 PID: 1469074 Comm: lnet_discovery Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-477.27.1.el8_8.x86_64 #1
      [21720.134714] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      [21720.134720] LNetError: 1469071:0:(lib-move.c:4933:lnet_parse()) 10.240.38.123@tcp, src 10.240.38.123@tcp: Dropping ACK (error -108 looking up sender)
      [21720.135832] Call Trace:
      [21720.135865]  dump_stack+0x41/0x60
      [21720.139358]  panic+0xe7/0x2ac
      [21720.139973]  ? ret_from_fork+0x35/0x40
      [21720.140690]  ? lnet_discovery_event_reply+0xb00/0xb00 [lnet]
      [21720.141763]  lbug_with_loc.cold.8+0x18/0x18 [libcfs]
      [21720.142713]  lnet_assert_handler_unused+0xa0/0xd0 [lnet]
      [21720.143717]  lnet_peer_discovery+0x14fb/0x1730 [lnet]
      [21720.144692]  ? finish_wait+0x80/0x80
      [21720.145393]  ? lnet_peer_merge_data+0x1110/0x1110 [lnet]
      [21720.146400]  kthread+0x134/0x150
      [21720.147027]  ? set_kthread_struct+0x50/0x50
      [21720.147820]  ret_from_fork+0x35/0x40
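When comparing the "+1" reports on this ticket, it can help to pull the failing assertion out of each console log mechanically. The following is a minimal sketch, assuming the `LNetError: <pid>:<n>:(<file>:<line>:<func>()) ASSERTION( <expr> ) failed` layout seen in the log above; the regular expression and the `find_assertions` helper are hypothetical and not part of Lustre or Maloo.

```python
import re
from typing import Dict, List

# Matches lines such as:
#   LNetError: 1469074:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( ... ) failed:
# The layout is inferred from the console output in this ticket only.
ASSERTION_RE = re.compile(
    r"LNetError: (?P<pid>\d+):\d+:"
    r"\((?P<file>[\w.-]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"ASSERTION\( (?P<expr>.+?) \) failed"
)


def find_assertions(console_log: str) -> List[Dict[str, str]]:
    """Return one dict per failed assertion found in the console log text."""
    return [m.groupdict() for m in ASSERTION_RE.finditer(console_log)]


if __name__ == "__main__":
    sample = (
        "[21720.119727] LNetError: 1469074:0:(lib-md.c:281:"
        "lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed: \n"
        "[21720.121839] LNetError: 1469074:0:(lib-md.c:281:"
        "lnet_assert_handler_unused()) LBUG\n"
    )
    for hit in find_assertions(sample):
        print(hit["file"], hit["line"], hit["func"], hit["expr"])
```

Grouping the results by `func` and `expr` gives a quick check that a new crash matches the signature in this ticket's title before adding another "+1".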
      

       

      Sanity-lnet 204/205 run output

       

      == sanity-lnet test 204: Check no health or resends for single-rail local failures ========================================================== 16:17:04 (1700756224)
      Cleaning up LNet
      CMD: trevis-33vm2.trevis.whamcloud.com lsmod | grep lnet > /dev/null &&
      lctl dl | grep ' ST ' || true
      Writer error: failed to resolve Netlink family id
      modules unloaded.
      Loading modules from /usr/lib64/lustre
      detected 2 online CPUs by sysfs
      Force libcfs to create 2 CPU partitions
      ../libcfs/libcfs/libcfs options: 'cpu_npartitions=2'
      ../lnet/lnet/lnet options: 'accept=all'
      /usr/sbin/lnetctl lnet configure --all
      CMD: trevis-26vm4 /usr/sbin/lctl list_nids
      /usr/sbin/lnetctl discover 10.240.38.123@tcp
      discover:
      - primary nid: 10.240.38.123@tcp
        Multi-Rail: true
        peer_ni:
        - nid: 10.240.38.123@tcp
      net:
      -     net type: lo
            local NI(s):
            -     nid: 0@lo
                  status: up
      -     net type: tcp
            local NI(s):
            -     nid: 10.240.39.5@tcp
                  status: up
                  interfaces:
                        0: eth0
      -     primary nid: 10.240.38.123@tcp
            -     nid: 10.240.38.123@tcp
                  health stats:
                        health value: 1000
      debug=+net
      Simulate local_interrupt
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 1000
      Post local health: 1000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      Simulate local_dropped
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      Simulate local_aborted
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      Simulate local_no_route
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      Simulate local_timeout
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      Simulate local_error
      Added drop rule 10.240.39.5@tcp->10.240.38.123@tcp (1/1)
      Added drop rule 10.240.39.5@tcp->10.240.39.5@tcp (1/1)
      /usr/sbin/lnetctl ping 10.240.38.123@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.38.123@tcp: Input/output error
                       
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 2 drop rules
      Check that no resends took place
      Check that local NI health is unchanged
      CMD: trevis-33vm2.trevis.whamcloud.com lsmod | grep lnet > /dev/null &&
      lctl dl | grep ' ST ' || true
      Writer error: failed to resolve Netlink family id
      modules unloaded.
      pdsh@trevis-33vm2: trevis-33vm2: ssh exited with exit code 2
      pdsh@trevis-33vm2: trevis-33vm2: ssh exited with exit code 2
      PASS 204 (10s)
      CMD: trevis-26vm4 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-33vm2.trevis.whamcloud.com sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-33vm3 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-67vm7 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-79vm7 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-26vm4 sysctl --values kernel/kptr_restrict
      CMD: trevis-26vm4 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-33vm2.trevis.whamcloud.com sysctl --values kernel/kptr_restrict
      CMD: trevis-33vm2.trevis.whamcloud.com sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-33vm3 sysctl --values kernel/kptr_restrict
      CMD: trevis-33vm3 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-67vm7 sysctl --values kernel/kptr_restrict
      CMD: trevis-67vm7 sysctl -wq kernel/kptr_restrict=1
      CMD: trevis-79vm7 sysctl --values kernel/kptr_restrict
      CMD: trevis-79vm7 sysctl -wq kernel/kptr_restrict=1
      == sanity-lnet test 205: Check health and resends for multi-rail local failures ========================================================== 16:17:18 (1700756238)
      Cleaning up LNet
      CMD: trevis-33vm2.trevis.whamcloud.com lsmod | grep lnet > /dev/null &&
      lctl dl | grep ' ST ' || true
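Each "Simulate ..." step in the test 204 output above prints Pre/Post values for resends, local health, and remote health, followed by the "Check that ..." messages. As a rough illustration of that comparison, here is a minimal sketch that parses one such block and reports which counters were unchanged. It only mirrors the printed output; the actual checks in the sanity-lnet script are not shown in this ticket, and `unchanged_counters` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches lines like "Pre resends: 0" or "Post local health: 2000".
PAIR_RE = re.compile(
    r"^\s*(Pre|Post) (resends|local health|remote health): (\d+)\s*$",
    re.MULTILINE,
)


def unchanged_counters(block: str) -> Dict[str, bool]:
    """Map each counter name to True if its Pre and Post values are equal."""
    values: Dict[str, Dict[str, int]] = {}
    for when, name, number in PAIR_RE.findall(block):
        values.setdefault(name, {})[when] = int(number)
    return {
        name: seen["Pre"] == seen["Post"]
        for name, seen in values.items()
        if "Pre" in seen and "Post" in seen
    }


if __name__ == "__main__":
    block = """
      Pre resends: 0
      Post resends: 0
      Resends delta: 0
      Pre local health: 2000
      Post local health: 2000
      Pre remote health: 1000
      Post remote health: 1000
    """
    print(unchanged_counters(block))
```

Counters that appear with only a Pre or only a Post value are skipped rather than reported as changed.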
      

       

      Nothing alarming or useful in the MDS/OST output.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_205 - trevis-33vm2 crashed during sanity-lnet test_205

          Activity

            [LU-17313] sanity-lnet test_205/test_220: LNetError: (lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler )
            aboyko Alexander Boyko added a comment - +1 on master + patch https://testing.whamcloud.com/test_sets/2564a247-8354-4689-a585-51df08cf51d7
            bzzz Alex Zhuravlev added a comment - +1 on master: https://testing.whamcloud.com/test_sets/2c86f0ad-ea95-4ed3-bb05-f84497d6b435
            qian_wc Qian Yingjin added a comment - +1 on master: https://testing.whamcloud.com/test_sessions/8813bec6-5541-46c2-953e-42e087ee73d5

            adilger Andreas Dilger added a comment - This looks like a duplicate of LU-14810 , which is already assigned to Cyril.
            adilger Andreas Dilger added a comment - +1 on master test_205: https://testing.whamcloud.com/test_sets/3cf7e1b8-0fa3-43e0-a2af-06a1c4a538eb
            arshad512 Arshad Hussain added a comment - - edited

            +1 on master (for aarch64): https://testing.whamcloud.com/test_sessions/ab07dba8-8b72-4468-9c0b-65182aafc48a

            On MDS:

            [25868.167715] Lustre: DEBUG MARKER: == sanity-lnet test 220: Add routes w/default options - check aliveness ========================================================== 21:21:22 (1707168082)
            [25868.709079] Lustre: DEBUG MARKER: /usr/sbin/lustre_rmmod
            [25869.289481] Key type lgssc unregistered
            [25869.351068] LNetError: 11254:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed:@
            [25869.352915] LNetError: 11254:0:(lib-md.c:281:lnet_assert_handler_unused()) LBUG 
            [25869.354122] CPU: 0 PID: 11254 Comm: lnet_discovery Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-477.27.1.el8_lustre.x86_64 #1
            [25869.356169] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [25869.357115] Call Trace:
            [25869.357602]  ? lnet_discovery_event_reply+0xb00/0xb00 [lnet]
            [25869.358667]  dump_stack+0x41/0x60
            [25869.359294]  lbug_with_loc.cold.8+0x5/0x43 [libcfs]
            [25869.360176]  lnet_assert_handler_unused+0xa0/0xd0 [lnet]
            [25869.361101]  lnet_peer_discovery+0x13e6/0x1620 [lnet]
            [25869.361978]  ? finish_wait+0x80/0x80
            [25869.362617]  ? lnet_peer_merge_data+0x1110/0x1110 [lnet]
            [25869.363522]  kthread+0x134/0x150
            [25869.364110]  ? set_kthread_struct+0x50/0x50
            [25869.364827]  ret_from_fork+0x35/0x40
            [25869.365473] Kernel panic - not syncing: LBUG 
            [25869.366204] CPU: 0 PID: 11254 Comm: lnet_discovery Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-477.27.1.el8_lustre.x86_64 #1
            [25869.368229] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [25869.369173] Call Trace:
            [25869.369619]  dump_stack+0x41/0x60
            [25869.370194]  panic+0xe7/0x2ac
            [25869.370741]  ? __switch_to_asm+0x51/0x80
            [25869.371412]  ? __switch_to_asm+0x51/0x80
            [25869.372087]  ? lnet_discovery_event_reply+0xb00/0xb00 [lnet]
            [25869.373044]  lbug_with_loc.cold.8+0x1a/0x43 [libcfs]
            [25869.373882]  lnet_assert_handler_unused+0xa0/0xd0 [lnet]
            [25869.374787]  lnet_peer_discovery+0x13e6/0x1620 [lnet]
            [25869.375658]  ? finish_wait+0x80/0x80
            [25869.376281]  ? lnet_peer_merge_data+0x1110/0x1110 [lnet]
            [25869.377185]  kthread+0x134/0x150
            [25869.377750]  ? set_kthread_struct+0x50/0x50
            [25869.378457]  ret_from_fork+0x35/0x40
            adilger Andreas Dilger added a comment - +9 crashes of sanity-lnet test_220 this week: https://testing.whamcloud.com/search?status%5B%5D=CRASH&test_set_script_id=a2b1c4b2-b449-11e9-b88c-52540065bddc&sub_test_script_id=03c13a24-99ad-4246-9749-8279e6e898a7&start_date=2024-01-26&end_date=2024-02-01&source=sub_tests#redirect
            arshad512 Arshad Hussain added a comment - - edited

            +1 on master: https://testing.whamcloud.com/test_sets/3f6810cc-7c8e-4ee4-9397-e09bad232d54

            Client:
            Lustre version: 2.15.60.13
            Kernel version: 5.14.0-284.30.1.el9_2.x86_64

            Server:
            Lustre version: 2.15.60.13
            Kernel version: 4.18.0-477.27.1.el8_lustre.x86_64

            This time it is seen in sanity-lnet test_220 (panic on the client):

            [21725.920801] Lustre: DEBUG MARKER: == sanity-lnet test 220: Add routes w/default options - check aliveness ========================================================== 16:47:56 (1705337276)
            [21726.373169] Lustre: DEBUG MARKER: /usr/sbin/lustre_rmmod
            [21726.999708] Key type lgssc unregistered
            [21727.051580] LNetError: 6094:0:(lib-md.c:281:lnet_assert_handler_unused()) ASSERTION( md->md_handler != handler ) failed: 
            [21727.053408] LNetError: 6094:0:(lib-md.c:281:lnet_assert_handler_unused()) LBUG
            [21727.054422] Pid: 6094, comm: lnet_discovery 5.14.0-284.30.1.el9_2.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Aug 25 09:13:12 EDT 2023
            [21727.055793] Call Trace TBD:
            [21727.056218] Kernel panic - not syncing: LBUG
            [21727.056813] CPU: 1 PID: 6094 Comm: lnet_discovery Kdump: loaded Tainted: G           OE    --------  ---  5.14.0-284.30.1.el9_2.x86_64 #1
            [21727.058258] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
            [21727.059002] Call Trace:
            [21727.059418]  <TASK>
            [21727.059756]  dump_stack_lvl+0x34/0x48
            [21727.060365]  panic+0xf4/0x2c6
            [21727.060810]  ? lnet_discovery_event_reply+0xbc0/0xbc0 [lnet]
            [21727.062285]  lbug_with_loc.cold+0x18/0x18 [libcfs]
            [21727.063335]  lnet_assert_handler_unused+0x9c/0xd0 [lnet]
            [21727.064169]  lnet_peer_discovery+0x997/0xaf0 [lnet]
            [21727.064850]  ? cpuacct_percpu_seq_show+0x10/0x10
            [21727.065492]  ? lnet_peer_data_present+0x580/0x580 [lnet]
            [21727.066187]  kthread+0xd9/0x100
            [21727.066651]  ? kthread_complete_and_exit+0x20/0x20
            [21727.067276]  ret_from_fork+0x22/0x30
            [21727.067792]  </TASK>
            adilger Andreas Dilger added a comment - +1 on master: https://testing.whamcloud.com/sub_tests/121f699a-8b7c-40dc-be4e-99f534b874b4
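The assertion reported throughout this ticket, `md->md_handler != handler`, reads as "no memory descriptor (MD) may still reference this handler at the point of the check". The toy model below illustrates only that reading. It assumes the check walks a collection of active MDs; the class and function names are hypothetical, it is not the LNet implementation, and it makes no claim about the root cause tracked in LU-14810.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Handler = Callable[[str], None]


@dataclass
class MemoryDescriptor:
    """Hypothetical stand-in for an MD: a name plus its event handler."""
    name: str
    handler: Handler


@dataclass
class ActiveMDs:
    """Hypothetical registry of MDs that are still attached."""
    mds: List[MemoryDescriptor] = field(default_factory=list)

    def attach(self, md: MemoryDescriptor) -> None:
        self.mds.append(md)

    def unlink(self, md: MemoryDescriptor) -> None:
        self.mds.remove(md)

    def assert_handler_unused(self, handler: Handler) -> None:
        """Raise if any attached MD still points at the given handler."""
        for md in self.mds:
            if md.handler is handler:
                raise AssertionError(
                    f"ASSERTION( md->md_handler != handler ) failed: {md.name}"
                )


def discovery_event_handler(event: str) -> None:
    """Placeholder handler; does nothing in this model."""


if __name__ == "__main__":
    registry = ActiveMDs()
    pending = MemoryDescriptor("pending-discovery-md", discovery_event_handler)
    registry.attach(pending)
    try:
        registry.assert_handler_unused(discovery_event_handler)
    except AssertionError as exc:
        print("check fails while an MD is still attached:", exc)
    registry.unlink(pending)
    registry.assert_handler_unused(discovery_event_handler)
    print("check passes once the MD is unlinked")
```

In this model the check fails whenever teardown reaches it before every MD using the handler has been unlinked, which is consistent with the crashes above occurring during module cleanup.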

            People

              cbordage Cyril Bordage
              maloo Maloo