  1. Lustre
  2. LU-13777

Discovery push needs to account for fixed src nid when selecting destination nid

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor

    Description

      The patch https://review.whamcloud.com/38320:

      71ca66bcd9 LU-13471 lnet: use the same src nid for discovery
      

      fixes the source NID that is used for a discovery push, but destination NID selection does not yet account for that fixed source. As a result, we may choose a destination NID on a different net than the source NID.
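
The constraint described above can be sketched as follows. This is a minimal illustration in Python, not the LNet implementation: `net_of` and `select_dest_nid` are hypothetical helpers, and picking the first qualifying candidate is an assumption that stands in for the real selection criteria.

```python
def net_of(nid):
    """Return the net part of a NID string such as '624@gni99'."""
    return nid.split("@", 1)[1]


def select_dest_nid(peer_nids, src_nid=None):
    """Pick a destination NID, honoring a fixed source NID if one is given.

    Hypothetical helper: it simply takes the first candidate that
    qualifies, which is an assumption made for illustration only.
    """
    if src_nid is None:
        # Source ANY: every peer NID is a candidate.
        candidates = list(peer_nids)
    else:
        # Source specified: only peer NIDs on the source's net qualify.
        candidates = [n for n in peer_nids if net_of(n) == net_of(src_nid)]
    return candidates[0] if candidates else None


peer_nids = ["624@gni", "624@gni99"]
print(select_dest_nid(peer_nids, src_nid="593@gni99"))  # prints 624@gni99
```

With the source fixed to 593@gni99, the sketch skips 624@gni and returns 624@gni99, so source and destination stay on the same net.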

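      For example, in the log below the GET is sent from 593@gni99 to 624@gni99, but the subsequent PUT, sent with source 593@gni99 specified, goes to 624@gni: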

      00000400:00000080:10.0:1594492733.653168:0:18193:0:(module.c:90:libcfs_ioctl()) libcfs ioctl cmd 3221775693
      00000400:00000200:10.0:1594492733.653174:0:18193:0:(peer.c:1927:lnet_peer_queue_for_discovery()) Queue peer 624@gni99: 0
      00000400:00000200:13.0:1594492733.653222:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x6051
      00000400:00000200:13.0:1594492733.653231:0:3629:0:(lib-move.c:5100:LNetGet()) LNetGet -> 12345-624@gni
      00000400:00000200:13.0:1594492733.653237:0:3629:0:(lib-move.c:2658:lnet_handle_send_case_locked()) Source ANY to MR:  624@gni local destination
      00000400:00000200:13.0:1594492733.653256:0:3629:0:(lib-move.c:1436:lnet_select_peer_ni()) sd_best_lpni = 624@gni99
      00000400:00000200:13.0:1594492733.653270:0:3629:0:(lib-move.c:1858:lnet_handle_send()) rspt_next_hop_nid = 624@gni99
      00000400:00000200:13.0:1594492733.653271:0:3629:0:(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:<?>) -> 624@gni99(624@gni:624@gni99) <?> : GET try# 0
      
      00000400:00000200:13.0:1594492733.653286:0:3629:0:(peer.c:3053:lnet_peer_send_ping()) peer 624@gni99
      00000400:00000200:13.0:1594492733.653287:0:3629:0:(peer.c:3433:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4251 rc 0
      00000400:00000200:13.0:1594492733.653565:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x40d1
      00000400:00000200:13.0:1594492733.653568:0:3629:0:(peer.c:2683:lnet_peer_merge_data()) cur lpni 624@gni status 363512030
      00000400:00000200:13.0:1594492733.653569:0:3629:0:(peer.c:2683:lnet_peer_merge_data()) cur lpni 624@gni99 status 363512030
      00000400:00000200:13.0:1594492733.653571:0:3629:0:(peer.c:2750:lnet_peer_merge_data()) peer 624@gni99 (ffff880835bdca00): 0
      00000400:00000200:13.0:1594492733.653572:0:3629:0:(peer.c:2945:lnet_peer_data_present()) peer 624@gni99(ffff880835bdca00): 0. state = 0x4151
      00000400:00000200:13.0:1594492733.653573:0:3629:0:(peer.c:3433:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4151 rc 1
      00000400:00000200:13.0:1594492733.653574:0:3629:0:(peer.c:3414:lnet_peer_discovery()) peer 624@gni99(ffff880835bdca00) state 0x4151
      00000400:00000200:13.0:1594492733.653577:0:3629:0:(lib-move.c:4879:LNetPut()) LNetPut -> 12345-624@gni
      00000400:00000200:13.0:1594492733.653579:0:3629:0:(lib-move.c:2658:lnet_handle_send_case_locked()) Source Specified: 593@gni99 to MR:  624@gni local destination
      00000400:00000200:13.0:1594492733.653581:0:3629:0:(lib-move.c:1858:lnet_handle_send()) rspt_next_hop_nid = 624@gni
      00000400:00000200:13.0:1594492733.653583:0:3629:0:(lib-move.c:1873:lnet_handle_send()) TRACE: 593@gni99(593@gni99:593@gni99) -> 624@gni(624@gni:624@gni) <?> : PUT try# 0
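
A mismatch like the one in the last TRACE line can be spotted mechanically. The following is a rough sketch, assuming the `TRACE: src(...) -> dst(...) ... : TYPE try#` layout seen in the log above and a directly connected peer; `find_net_mismatches` is a hypothetical helper, and the regular expression is derived only from these two sample lines.

```python
import re

# Captures the source NID, destination NID and message type of a TRACE line.
TRACE_RE = re.compile(r"TRACE: (\S+?)\(.*?\) -> (\S+?)\(.*?\).*?: (\w+) try#")


def net_of(nid):
    """Return the net part of a NID string such as '624@gni99'."""
    return nid.split("@", 1)[1]


def find_net_mismatches(lines):
    """Return (msg_type, src_nid, dst_nid) for sends whose nets differ."""
    mismatches = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None:
            continue  # not a TRACE line
        src, dst, msg_type = m.groups()
        if net_of(src) != net_of(dst):
            mismatches.append((msg_type, src, dst))
    return mismatches


log = [
    "TRACE: 593@gni99(593@gni99:<?>) -> 624@gni99(624@gni:624@gni99) <?> : GET try# 0",
    "TRACE: 593@gni99(593@gni99:593@gni99) -> 624@gni(624@gni:624@gni) <?> : PUT try# 0",
]
print(find_net_mismatches(log))  # prints [('PUT', '593@gni99', '624@gni')]
```

Only the PUT is reported, because its source is on gni99 while its destination is on gni.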
      

          People

            ashehata Amir Shehata (Inactive)
            hornc Chris Horn