[LU-11476] Account for -ECONNRESET in ksocknal_txlist_done()

    Description

      ksocknal_txlist_done() does not account for the -ECONNRESET error. It should be handled as a remote failure case. The current code is:

       

              if (tx->tx_hstatus == LNET_MSG_STATUS_OK) {
                      if (error == -ETIMEDOUT)
                              tx->tx_hstatus =
                                LNET_MSG_STATUS_LOCAL_TIMEOUT;
                      else if (error == -ENETDOWN ||
                               error == -EHOSTUNREACH ||
                               error == -ENETUNREACH)
                              tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_DROPPED;
                      /*
                       * for all other errors we don't want to
                       * retransmit
                       */
                      else if (error)
                              tx->tx_hstatus = LNET_MSG_STATUS_LOCAL_ERROR;
              }
      

       
      Because of this, when an interface is brought down on a node that is configured as a Multi-Rail (MR) peer on another node, an lnetctl ping to the down interface fails. Ideally, with the health feature, the down interface should not be used and the message should go to the other interface, which is still up.

      Accounting for ECONNRESET and updating the tx health status to LNET_MSG_STATUS_REMOTE_DROPPED corrects this behaviour.
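
      A minimal user-space sketch of the proposed mapping, for illustration only. It assumes -ECONNRESET is checked before the generic error fallback. The enum sketch_hstatus and the helper sketch_hstatus_from_error() are hypothetical stand-ins that reuse the status names from this ticket with placeholder values; this is not the merged kernel patch, which may differ.

```c
#include <errno.h>

/*
 * Stand-in for the LNet health status values named in this ticket.
 * The numeric values are placeholders, not the kernel's definitions.
 */
enum sketch_hstatus {
	LNET_MSG_STATUS_OK = 0,
	LNET_MSG_STATUS_LOCAL_TIMEOUT,
	LNET_MSG_STATUS_LOCAL_DROPPED,
	LNET_MSG_STATUS_LOCAL_ERROR,
	LNET_MSG_STATUS_REMOTE_DROPPED,
};

/*
 * Hypothetical helper: given the current health status and a negative
 * errno value, return the health status the tx should end up with.
 */
static enum sketch_hstatus
sketch_hstatus_from_error(enum sketch_hstatus cur, int error)
{
	/* A status that was already set is left untouched. */
	if (cur != LNET_MSG_STATUS_OK)
		return cur;

	if (error == -ETIMEDOUT)
		return LNET_MSG_STATUS_LOCAL_TIMEOUT;

	if (error == -ENETDOWN ||
	    error == -EHOSTUNREACH ||
	    error == -ENETUNREACH)
		return LNET_MSG_STATUS_LOCAL_DROPPED;

	/* Assumed addition: a reset connection is a remote failure. */
	if (error == -ECONNRESET)
		return LNET_MSG_STATUS_REMOTE_DROPPED;

	/* For all other errors we don't want to retransmit. */
	if (error)
		return LNET_MSG_STATUS_LOCAL_ERROR;

	return LNET_MSG_STATUS_OK;
}
```

      With this ordering, -ECONNRESET no longer falls through to LNET_MSG_STATUS_LOCAL_ERROR, which per the comment in the original code is the case that is not retransmitted.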

        Activity

          pjones Peter Jones added a comment -

          Landed for 2.12


          Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33307/
          Subject: LU-11476 lnet: set the health status correctly
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 5d77f0d8dc74c752032e449687090ff1360cd32e


          Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33307
          Subject: LU-11476 lnet: set the health status correctly
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: ada66907566fd8e595720e3c7b721886dc84833d


          Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33294
          Subject: LU-11476 lnet: set the health status correctly
          Project: fs/lustre-release
          Branch: multi-rail
          Current Patch Set: 1
          Commit: 77bca66ffe3e4f30ee39d32b6b8c2c129aa6a550

          gerrit Gerrit Updater added a comment - edited

          Sonia Sharma (sharmaso@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33289
          Subject: LU-11476 lnd: Update health status for ECONNRESET
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 59b78dd6e89077f6f73751ffb45ea2e5f0ca35af

           

          Patch abandoned.


          People

            sharmaso Sonia Sharma (Inactive)
            sharmaso Sonia Sharma (Inactive)