Lustre / LU-17101

sanity-lnet test_220: timeout - route goes down

Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      This issue was created by maloo for eaujames <eaujames@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/a5dba607-7817-42ca-9075-4c7880f9082c

      test_220 failed with the following error:

      Timeout occurred after 369 minutes, last suite running was sanity-lnet
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/97846 - 4.18.0-425.10.1.el8_7.aarch64
      servers: https://build.whamcloud.com/job/lustre-reviews/97846 - 4.18.0-477.15.1.el8_lustre.x86_64

      The route to tcp2 goes down during lnet_selftest, and the bulk transfer then fails with -113 (EHOSTUNREACH):

      [Thu Sep  7 20:51:51 2023] Lustre: DEBUG MARKER: Start LST rw
      [Thu Sep  7 20:51:51 2023] LNet: 1042010:0:(rpc.c:641:srpc_service_add_buffers()) waiting for adding buffer
      [Thu Sep  7 20:51:51 2023] LNet: 943043:0:(rpc.c:641:srpc_service_add_buffers()) waiting for adding buffer
      [Thu Sep  7 20:52:05 2023] LNetError: 1068901:0:(lib-lnet.h:1305:lnet_set_route_aliveness()) route to tcp2 through 10.240.44.207@tcp1 has gone from up to down
      [Thu Sep  7 20:52:05 2023] LNetError: 1068901:0:(lib-lnet.h:1305:lnet_set_route_aliveness()) Skipped 1 previous similar message
      [Thu Sep  7 20:52:06 2023] LNetError: 943043:0:(lib-move.c:2341:lnet_handle_find_routed_path()) no route to 10.240.45.24@tcp2 from <?>
      [Thu Sep  7 20:52:06 2023] Lustre: DEBUG MARKER: lst stop brw_rw
      [Thu Sep  7 20:52:07 2023] Lustre: DEBUG MARKER: lst stop brw_rw
      [Thu Sep  7 20:52:07 2023] Lustre: DEBUG MARKER: Stop LST rw
      [Thu Sep  7 20:52:07 2023] LNetError: 1042010:0:(lib-move.c:2341:lnet_handle_find_routed_path()) no route to 10.240.45.24@tcp2 from 10.240.44.206@tcp1
      [Thu Sep  7 20:52:07 2023] LNetError: 1042010:0:(lib-move.c:2341:lnet_handle_find_routed_path()) Skipped 1 previous similar message
      [Thu Sep  7 20:52:07 2023] LustreError: 1042010:0:(brw_test.c:388:brw_server_rpc_done()) Bulk transfer from 12345-10.240.45.24@tcp2 has failed: -113
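
A minimal sketch for triaging similar console logs, assuming the message formats match the excerpt above exactly. It extracts route state transitions and "no route" errors. The names `scan`, `ROUTE_RE`, and `NO_ROUTE_RE` are hypothetical helpers, not part of Lustre or the test framework.

```python
import re

# Sample lines copied from the console log in this ticket.
LOG = """\
[Thu Sep  7 20:52:05 2023] LNetError: 1068901:0:(lib-lnet.h:1305:lnet_set_route_aliveness()) route to tcp2 through 10.240.44.207@tcp1 has gone from up to down
[Thu Sep  7 20:52:06 2023] LNetError: 943043:0:(lib-move.c:2341:lnet_handle_find_routed_path()) no route to 10.240.45.24@tcp2 from <?>
[Thu Sep  7 20:52:07 2023] LustreError: 1042010:0:(brw_test.c:388:brw_server_rpc_done()) Bulk transfer from 12345-10.240.45.24@tcp2 has failed: -113
"""

# Assumed format: "route to <net> through <gateway> has gone from <old> to <new>"
ROUTE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*route to (?P<net>\S+) through (?P<gw>\S+) "
    r"has gone from (?P<old>\w+) to (?P<new>\w+)"
)
# Assumed format: "no route to <dst> from <src>"
NO_ROUTE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\].*no route to (?P<dst>\S+) from (?P<src>\S+)"
)


def scan(text):
    """Return (route transitions, no-route errors) found in a console log."""
    transitions, no_route = [], []
    for line in text.splitlines():
        m = ROUTE_RE.search(line)
        if m:
            transitions.append(m.groupdict())
            continue
        m = NO_ROUTE_RE.search(line)
        if m:
            no_route.append(m.groupdict())
    return transitions, no_route


if __name__ == "__main__":
    transitions, no_route = scan(LOG)
    for t in transitions:
        print(t["ts"], t["net"], "via", t["gw"], t["old"], "->", t["new"])
    for e in no_route:
        print(e["ts"], "no route to", e["dst"], "from", e["src"])
```

Comparing the timestamp of the first up-to-down transition with the `Start LST rw` marker shows how long the selftest ran before the route was lost.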
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_220 - Timeout occurred after 369 minutes, last suite running was sanity-lnet

          People

            wc-triage WC Triage
            maloo Maloo