Lustre / LU-17083

sanity-lnet test_205: Expected 2 resends found x

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This issue was created by maloo for eaujames <eaujames@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/fd95d551-a9e8-4c26-8f57-98940f6e6532

      test_205 failed with the following error:

      Expected 2 resends found 0
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/97552 - 5.14.0-284.25.1.el9_2.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/97552 - 4.18.0-477.15.1.el8_lustre.x86_64

      Pre resends: 6
      Post resends: 8
      Resends delta: 2
      Pre local health: 3000
      Post local health: 2700
      Pre remote health: 2000
      Post remote health: 2000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 8 drop rules
      Check that 2 resends took place
      Check that local NI health has been changed
      Simulate local_timeout
      Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp (1/1)
      Added drop rule 10.240.29.136@tcp->10.240.29.56@tcp1 (1/1)
      Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp (1/1)
      Added drop rule 10.240.29.136@tcp->10.240.29.136@tcp1 (1/1)
      Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp (1/1)
      Added drop rule 10.240.29.136@tcp1->10.240.29.56@tcp1 (1/1)
      Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp (1/1)
      Added drop rule 10.240.29.136@tcp1->10.240.29.136@tcp1 (1/1)
      /usr/sbin/lnetctl ping 10.240.29.56@tcp
      manage:
          - ping:
                errno: -1
                descr: failed to ping 10.240.29.56@tcp: Operation canceled
                       
      Pre resends: 8
      Post resends: 8
      Resends delta: 0
      Pre local health: 3000
      Post local health: 2800
      Pre remote health: 2000
      Post remote health: 2000
      /usr/sbin/lnetctl peer set --health 1000 --all
      /usr/sbin/lnetctl net set --health 1000 --all
      Removed 8 drop rules
      Check that 2 resends took place
       sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0 
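
The pass/fail decision in the output above comes down to the difference between the "Pre resends" and "Post resends" counters. The following is a minimal sketch of that arithmetic, not the actual sanity-lnet code: the function names `resend_deltas` and `failing_runs` are hypothetical, and the expected value of 2 is taken from the "Check that 2 resends took place" line.

```python
import re

# Counter lines copied from the test output above (two consecutive sub-runs).
LOG = """\
Pre resends: 6
Post resends: 8
Resends delta: 2
Pre resends: 8
Post resends: 8
Resends delta: 0
"""


def resend_deltas(text):
    """Return one (pre, post, delta) tuple per Pre/Post pair found in text."""
    pres = [int(n) for n in re.findall(r"Pre resends:\s*(\d+)", text)]
    posts = [int(n) for n in re.findall(r"Post resends:\s*(\d+)", text)]
    return [(pre, post, post - pre) for pre, post in zip(pres, posts)]


def failing_runs(text, expected=2):
    """Return the indices of sub-runs whose resend delta differs from expected."""
    return [i for i, (_, _, delta) in enumerate(resend_deltas(text))
            if delta != expected]


if __name__ == "__main__":
    for i, (pre, post, delta) in enumerate(resend_deltas(LOG)):
        print(f"run {i}: pre={pre} post={post} delta={delta}")
    print("failing runs:", failing_runs(LOG))
```

On the counters quoted here, only the second sub-run (the simulated local_timeout) has a delta other than 2, which matches the reported failure.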
      
      [13097.969039] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure --all
      [13097.973832] LNet: Added LNI 10.240.29.136@tcp [8/256/0/180]
      [13097.974649] LNet: Accept all, port 7988
      [13098.315173] Lustre: DEBUG MARKER: /usr/sbin/lnetctl discover 10.240.29.56@tcp
      [13103.133331] Lustre: DEBUG MARKER: /usr/sbin/lnetctl lnet configure
      [13103.139318] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net add --net tcp1 --if eth0
      [13103.142951] LNet: Added LNI 10.240.29.136@tcp1 [8/256/0/180]
      [13103.343360] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
      [13103.346783] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
      [13103.347984] LNet: 1 local NIs in recovery (showing 1): 10.240.29.136@tcp
      [13103.409422] LNet: 1003942:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [13103.423442] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [13103.429448] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [13103.506732] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
      [13103.596613] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [13103.602640] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [13103.681700] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
      [13103.757408] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [13103.763403] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [13103.844551] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
      [13103.850275] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
      [13103.851360] LNet: Skipped 8 previous similar messages
      [13103.920628] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [13103.926682] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [13104.007477] Lustre: DEBUG MARKER: /usr/sbin/lnetctl ping 10.240.29.56@tcp
      [13106.921400] LNetError: 1003626:0:(lib-move.c:3441:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.240.29.56@tcp1: -125
      [13106.974460] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) 'lnet_recovery_interval' has been deprecated
      [13106.975673] LNet: 1004214:0:(api-ni.c:357:recovery_interval_set()) Skipped 3 previous similar messages
      [13106.988985] Lustre: DEBUG MARKER: /usr/sbin/lnetctl peer set --health 1000 --all
      [13106.994939] Lustre: DEBUG MARKER: /usr/sbin/lnetctl net set --health 1000 --all
      [13107.267145] Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0 
      [13107.478131] Lustre: DEBUG MARKER: sanity-lnet test_205: @@@@@@ FAIL: Expected 2 resends found 0
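
The console messages above report two negative return codes, -22 and -125. As a reading aid, the sketch below extracts them and maps them to their Linux errno names. The mapping is hardcoded for Linux as an assumption about the test hosts, and `decode_return_codes` is a hypothetical helper, not part of Lustre. On Linux, 125 is ECANCELED, which is consistent with the "Operation canceled" text in the lnetctl ping output.

```python
import re

# Linux errno values only; hardcoded so the sketch does not depend on the host OS.
LINUX_ERRNO = {22: "EINVAL", 125: "ECANCELED"}

# Two lines copied from the console log above.
DMESG = """\
[13103.346783] LNet: There was an unexpected network error while writing to 10.240.29.56: rc = -22
[13106.921400] LNetError: 1003626:0:(lib-move.c:3441:lnet_resend_pending_msgs_locked()) Error sending GET to 12345-10.240.29.56@tcp1: -125
"""


def decode_return_codes(text):
    """Return (timestamp, code, name) for each line ending in a negative code."""
    results = []
    for line in text.splitlines():
        # Timestamp in brackets at the start, negative integer at the very end.
        match = re.match(r"\[\s*(\d+\.\d+)\].*?(-\d+)\s*$", line)
        if match:
            code = int(match.group(2))
            results.append((match.group(1), code, LINUX_ERRNO.get(-code, "unknown")))
    return results


if __name__ == "__main__":
    for timestamp, code, name in decode_return_codes(DMESG):
        print(f"{timestamp}: rc={code} ({name})")
```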
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-lnet test_205 - Expected 2 resends found 0

          People

            wc-triage WC Triage
            maloo Maloo