Lustre / LU-14540

Connection failure does not cause peer NI health to decrement


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Severity: 3

    Description

      The connection fails because of ARP flux. However, the peer NI health is never decremented, because the failure is classified as a local one:

      00000800:00020000:1.0:1615483922.587888:0:5629:0:(o2iblnd_cb.c:2933:kiblnd_rejected()) 10.12.2.4@o2ib41 rejected: consumer defined fatal error
      00000800:00000200:1.0:1615483922.587890:0:5629:0:(o2iblnd_cb.c:2313:kiblnd_connreq_done()) 10.12.2.4@o2ib41: active(1), version(12), status(-111)
      00000800:00000200:1.0:1615483922.587892:0:5629:0:(o2iblnd.c:420:kiblnd_unlink_peer_locked()) peer_ni[ffff8953de6a8600] -> 10.12.2.4@o2ib41 (2)--
      00000400:00000200:1.0:1615483922.587894:0:5629:0:(router.c:1720:lnet_notify()) 10.12.2.53@o2ib41 notifying 10.12.2.4@o2ib41: down
      00000800:00000100:1.0:1615483922.587896:0:5629:0:(o2iblnd_cb.c:2294:kiblnd_peer_connect_failed()) Deleting messages for 10.12.2.4@o2ib41: connection failed
      00000400:00000200:1.0:1615483922.587898:0:5629:0:(lib-msg.c:1011:lnet_is_health_check()) health check = 1, status = -111, hstatus = 2
      00000400:00000200:1.0:1615483922.587899:0:5629:0:(lib-msg.c:860:lnet_health_check()) health check: 10.12.2.53@o2ib41->10.12.2.4@o2ib41: GET: LOCAL_DROPPED
      00000400:00000200:1.0:1615483922.587901:0:5629:0:(lib-msg.c:479:lnet_handle_local_failure()) ni 10.12.2.53@o2ib41 added to recovery queue. Health = 900
      00000400:00000200:1.0:1615483922.587903:0:5629:0:(lib-msg.c:641:lnet_resend_msg_locked()) 10.12.2.53@o2ib41->10.12.2.4@o2ib41:GET:LOCAL_DROPPED - queuing msg (ffff895f4c9171d8) for resend
      

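      It would be better to classify this failure as REMOTE_DROPPED, so that the peer NI health is decremented instead of the local NI health (which, as the log shows, drops to 900 and puts the local NI on the recovery queue).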

      This issue was observed with Lustre version 2.12.4.3_cray_44_g2942581.

      People

        Assignee: Chris Horn
        Reporter: Chris Horn