
LNet Health: Decrement health value on response timeout

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.12.0
    • Affects Version/s: Lustre 2.12.0
    • Severity: 3

    Description

      When a response times out, we want to decrement the health value of the immediate next-hop peer NI, so that we don't use that interface if others are available.
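
The idea in the description can be sketched roughly as follows. This is an illustrative model only, not the LNet implementation: the `PeerNI` class, the `MAX_HEALTH` and `SENSITIVITY` constants, and both function names are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List

# Both constants are assumptions for this sketch, not values taken from LNet.
MAX_HEALTH = 1000   # health of a fully healthy interface
SENSITIVITY = 100   # amount subtracted on each response timeout


@dataclass
class PeerNI:
    """Hypothetical stand-in for an immediate next-hop peer interface."""
    name: str
    health: int = MAX_HEALTH


def on_response_timeout(ni: PeerNI) -> None:
    """Lower the health of the next-hop interface, never below zero."""
    ni.health = max(0, ni.health - SENSITIVITY)


def select_ni(candidates: List[PeerNI]) -> PeerNI:
    """Prefer the healthiest interface; ties go to the first one listed."""
    return max(candidates, key=lambda ni: ni.health)


if __name__ == "__main__":
    nis = [PeerNI("ni0"), PeerNI("ni1")]
    print(select_ni(nis).name)      # both healthy, first one wins
    on_response_timeout(nis[0])     # a response through ni0 timed out
    print(select_ni(nis).name)      # ni1 is now preferred
```

With a single interface the selection still returns it, which matches the "if there are others available" condition: a degraded interface is only avoided when a healthier alternative exists.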

        Activity

          [LU-11472] LNet Health: Decrement health value on response timeout
          pjones Peter Jones added a comment -

          Landed for 2.12


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33308/
          Subject: LU-11472 lnet: Decrement health on timeout
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 139d69141b73d427490f39d3096b2187e979eaea

          gerrit Gerrit Updater added a comment -

          Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33308
          Subject: LU-11472 lnet: Decrement health on timeout
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 56b5ef7e5c7a1d0c7aca23504acb2b2bc5862199

          gerrit Gerrit Updater added a comment -

          Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33295
          Subject: LU-11472 lnet: Decrement health on timeout
          Project: fs/lustre-release
          Branch: multi-rail
          Current Patch Set: 1
          Commit: 9df50755373be42b64f640e664f1e05690f37531

          ashehata Amir Shehata (Inactive) added a comment -

          I have a patch which I'll commit shortly. Although this is going to work for directly connected peers, the behavior might be an issue for routing. Suppose a router is servicing multiple peers through the same route: if one of the final destinations has a problem and doesn't respond, the router interface will be dinged. It'll be put on the recovery queue and will recover, but during that period of time the route will be down.
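
The routing concern raised in this comment can be illustrated with a small model. It is a sketch under stated assumptions rather than LNet code: `Gateway`, `RecoveryQueue`, the constants, and the rule that a route counts as up only at full health are all hypothetical simplifications chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Both constants are assumptions for this sketch, not values taken from LNet.
MAX_HEALTH = 1000
SENSITIVITY = 100


@dataclass(eq=False)  # compare gateways by identity, not by field values
class Gateway:
    """Hypothetical router interface shared by several final destinations."""
    name: str
    health: int = MAX_HEALTH

    @property
    def route_up(self) -> bool:
        # Simplifying assumption: the route is usable only at full health.
        return self.health == MAX_HEALTH


@dataclass
class RecoveryQueue:
    """Hypothetical queue of interfaces waiting to regain full health."""
    pending: List[Gateway] = field(default_factory=list)

    def add(self, gw: Gateway) -> None:
        if gw not in self.pending:
            self.pending.append(gw)

    def recover_step(self, increment: int = SENSITIVITY) -> None:
        """One recovery pass: every queued gateway regains some health."""
        for gw in list(self.pending):
            gw.health = min(MAX_HEALTH, gw.health + increment)
            if gw.health == MAX_HEALTH:
                self.pending.remove(gw)


def on_response_timeout(gw: Gateway, queue: RecoveryQueue) -> None:
    """A final destination did not answer; the next hop (the router) is dinged."""
    gw.health = max(0, gw.health - SENSITIVITY)
    queue.add(gw)


if __name__ == "__main__":
    router = Gateway("router0")   # serves destinations A and B
    queue = RecoveryQueue()
    on_response_timeout(router, queue)   # B is unresponsive, A is fine
    print(router.route_up)               # route is down for A as well
    queue.recover_step()
    print(router.route_up)               # route is back after recovery
```

The model makes the side effect visible: the timeout was caused by one destination, but the penalty lands on the shared router interface, so traffic to every destination behind it is affected until recovery completes.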

          People

            Assignee: ashehata Amir Shehata (Inactive)
            Reporter: ashehata Amir Shehata (Inactive)
            Votes: 0
            Watchers: 4
