Lustre / LU-14943

LNet health recovery of peer NIs on remote networks does not work correctly

Details

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Minor

    Description

      The following debug log shows the reply to a recovery ping:

      00000400:00000200:0.0:1628809321.584402:0:1949:0:(lib-move.c:3854:lnet_mt_event_handler()) Received event: 5 status: 0
      00000400:00000200:0.0:1628809321.584404:0:1949:0:(lib-move.c:3869:lnet_mt_event_handler()) 192.168.2.35@tcp1 recovery message sent successfully:0
      00000400:00000200:0.0:1628809321.585814:0:1949:0:(lib-move.c:4434:lnet_parse()) TRACE: 192.168.2.39@tcp2(192.168.2.39@tcp2) <- 192.168.2.35@tcp1 : REPLY - for me
      00000400:00000200:0.0:1628809321.585821:0:1949:0:(lib-move.c:4199:lnet_parse_reply()) 192.168.2.39@tcp2: Reply from 12345-192.168.2.35@tcp1 of length 64/64 into md 0x25
      00000400:00000200:0.0:1628809321.585827:0:1949:0:(lib-msg.c:1062:lnet_is_health_check()) health check = 1, status = 0, hstatus = 0
      00000400:00000200:0.0:1628809321.585830:0:1949:0:(lib-msg.c:836:lnet_health_check()) health check: 192.168.2.39@tcp2->192.168.2.33@tcp2: REPLY: OK
      

      Note that the reply comes from 192.168.2.35@tcp1, but lnet_health_check() examines 192.168.2.33@tcp2, the NID of the router that forwarded the message, rather than the NID of the remote peer NI that sent the reply.
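
      To spot this pattern in a larger debug log, something like the sketch below may help. It is not part of LNet or any Lustre tool: the function name find_mismatches and both regular expressions are assumptions based only on the two line shapes quoted above, and it assumes the health check line for a reply follows the matching lnet_parse_reply() line. Since this ticket was resolved as Not a Bug, a reported pair is not necessarily an error; the script only shows which NID the health check was applied to.

```python
import re

# Patterns are assumptions derived from the log lines quoted in this ticket.
REPLY_RE = re.compile(
    r"lnet_parse_reply\(\)\) (\S+): Reply from \d+-(\S+) of length")
HEALTH_RE = re.compile(
    r"lnet_health_check\(\)\) health check: (\S+)->(\S+): (\w+): (\w+)")


def find_mismatches(lines):
    """Yield (reply_source_nid, health_checked_nid) pairs that differ."""
    last_reply_src = None
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            # Remember which NID the REPLY actually came from.
            last_reply_src = m.group(2)
            continue
        m = HEALTH_RE.search(line)
        if m and m.group(3) == "REPLY" and last_reply_src is not None:
            checked_nid = m.group(2)
            if checked_nid != last_reply_src:
                yield last_reply_src, checked_nid
            last_reply_src = None


# Two lines copied from the log above (timestamp prefix omitted).
SAMPLE = [
    "(lib-move.c:4199:lnet_parse_reply()) 192.168.2.39@tcp2: "
    "Reply from 12345-192.168.2.35@tcp1 of length 64/64 into md 0x25",
    "(lib-msg.c:836:lnet_health_check()) health check: "
    "192.168.2.39@tcp2->192.168.2.33@tcp2: REPLY: OK",
]

if __name__ == "__main__":
    for src, checked in find_mismatches(SAMPLE):
        print(f"reply from {src}, health check applied to {checked}")
```

      For the sample above, the reply source is 192.168.2.35@tcp1 and the health-checked NID is 192.168.2.33@tcp2, which matches the observation in the description.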

          People

            Assignee: Chris Horn (hornc)
            Reporter: Chris Horn (hornc)