Lustre / LU-12955

AST replies are dropped when servers are non-MR, clients and routers are MR

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.13.0, Lustre 2.14.0
    • Labels: None
    • Severity: 3

    Description

      I found this issue while testing Cray's 2.12. Based on code inspection, I believe it also exists in 2.13/master (and possibly 2.10, 2.11 and 2.12).

      Servers are all non-MR (lnet_peer_discovery_disabled=1), with a single NID on o2ib40.

      Clients are MR, with NIDs on gni4 and gni99:

      nid00110:~ # lctl list_nids
      110@gni99
      110@gni4
      nid00110:~ #
      

      Routers are MR, with NIDs on gni4, gni99 and o2ib40:

      nid00485:~ # lctl list_nids
      485@gni99
      485@gni4
      10.12.0.1@o2ib40
      nid00485:~ #
      

      The non-MR server sends a blocking AST (BL_AST) to client 110@gni4:

      00000400:00000200:3.0:1573329689.897722:0:11351:0:(lib-move.c:2429:lnet_send()) TRACE: 10.12.0.52@o2ib40(<?>:10.12.0.52@o2ib40) ->(<?>)-> 110@gni4(110@gni4:10.12.0.3@o2ib40) : PUT try# 0
      

      The router receives the message:

      00000400:00000200:12.0:1573329689.898682:0:8631:0:(lib-move.c:3904:lnet_parse()) TRACE: 110@gni4(10.12.0.3@o2ib40) <- 10.12.0.52@o2ib40 : PUT - routed
      

      Because the router and client are both MR, the router is free to choose a different destination NID. Its round-robin selection of the local NI lands on gni99, so the message is delivered to 110@gni99 instead of 110@gni4:

      00000400:00000200:12.0:1573329689.898693:0:8631:0:(lib-move.c:1673:lnet_get_best_ni()) compare ni 93@gni99 [c:2048, d:10, s:22285] with best_ni not seleced [c:-2147483648, d:-1, s:0]
      00000400:00000200:12.0:1573329689.898695:0:8631:0:(lib-move.c:1716:lnet_get_best_ni()) selected best_ni 93@gni99
      00000400:00000200:12.0:1573329689.898695:0:8631:0:(lib-move.c:1673:lnet_get_best_ni()) compare ni 93@gni4 [c:2048, d:10, s:22285] with best_ni 93@gni99 [c:2048, d:10, s:22285]
      00000400:00000200:12.0:1573329689.898697:0:8631:0:(lib-move.c:1716:lnet_get_best_ni()) selected best_ni 93@gni99
      00000400:00000200:12.0:1573329689.898701:0:8631:0:(lib-move.c:1441:lnet_select_peer_ni()) Selected 110@gni99 h:[1000] p:[n] c:[16], s:[4704]
      00000400:00000200:12.0:1573329689.898705:0:8631:0:(lib-move.c:2429:lnet_send()) TRACE: 10.12.0.52@o2ib40(<?>:93@gni99) ->(<?>)-> 110@gni99(110@gni4:110@gni99) : PUT try# 0
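
The tie seen in the trace above can be illustrated with a small Python model. This is a simplified sketch, not the real `lnet_get_best_ni()`: the `NI` tuple and `pick_best_ni()` are hypothetical names, and reading the log fields `c`, `d` and `s` as credits, distance and sequence number, along with the comparison order used here, is an assumption. The point is only that when every field ties, the first candidate examined stays selected.

```python
from typing import NamedTuple


class NI(NamedTuple):
    """Toy stand-in for a local network interface (not an LNet structure)."""
    nid: str
    credits: int   # assumed meaning of "c:" in the log
    distance: int  # assumed meaning of "d:" in the log
    seq: int       # assumed meaning of "s:" in the log


def pick_best_ni(candidates):
    """Pick an NI: shorter distance, then more credits, then lower seq.

    On a complete tie the earlier candidate is kept, mirroring the log
    where 93@gni99 remains best_ni after being compared with 93@gni4.
    """
    best = None
    for ni in candidates:
        if best is None:
            best = ni
            continue
        if ni.distance != best.distance:
            if ni.distance < best.distance:
                best = ni
            continue
        if ni.credits != best.credits:
            if ni.credits > best.credits:
                best = ni
            continue
        if ni.seq < best.seq:
            best = ni
    return best


# Values copied from the router trace above.
candidates = [
    NI("93@gni99", credits=2048, distance=10, seq=22285),
    NI("93@gni4", credits=2048, distance=10, seq=22285),
]
print(pick_best_ni(candidates).nid)  # 93@gni99
```

With the local NI on gni99 chosen, the router then picks the client's NID on the same network, which is why the PUT goes out to 110@gni99.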
      

      The client receives this message and passes it to ptlrpc. PtlRPC sends the reply using 110@gni99 as the source NI (see ptlrpc_send_reply()):

      00000100:00000040:16.0:1573329689.898297:0:11036:0:(lustre_net.h:2496:ptlrpc_rqphase_move()) @@@ move req "New" -> "Interpret"  req@ffff880f9682a040 x1649753532140528/t0(0) o104->LOV_OSC_UUID@10.12.0.52@o2ib40:224/0 lens 296/0 e 0 to 0 dl 1573329744 ref 1 fl New:/0/ffffffff rc 0/-1 job:''
      00000100:00100000:16.0:1573329689.898302:0:11036:0:(service.c:2227:ptlrpc_server_handle_request()) Handling RPC req@ffff880f9682a040 pname:cluuid+ref:pid:xid:nid:opc:job ldlm_cb01_001:LOV_OSC_UUID+4:11351:x1649753532140528:12345-10.12.0.52@o2ib40:104:
      00000100:00000200:16.0:1573329689.898305:0:11036:0:(service.c:2232:ptlrpc_server_handle_request()) got req 1649753532140528
      00000100:00000040:16.0:1573329689.898316:0:11036:0:(connection.c:132:ptlrpc_connection_addref()) conn=ffff880f95fd7780 refcount 10 to 10.12.0.52@o2ib40
      00000100:00000040:16.0:1573329689.898318:0:11036:0:(niobuf.c:57:ptl_send_buf()) peer_id 12345-10.12.0.52@o2ib40
      00000100:00000200:16.0:1573329689.898321:0:11036:0:(niobuf.c:85:ptl_send_buf()) Sending 192 bytes to portal 16, xid 1649753532140528, offset 192
      00000400:00000200:16.0:1573329689.898323:0:11036:0:(lib-move.c:4412:LNetPut()) LNetPut -> 12345-10.12.0.52@o2ib40
      00000400:00000200:16.0:1573329689.898373:0:11036:0:(lib-move.c:2429:lnet_send()) TRACE: 110@gni99(110@gni99:110@gni99) ->(<?>)-> 10.12.0.52@o2ib40(10.12.0.52@o2ib40:93@gni99) : PUT try# 0
      

      When this PUT arrives at the server, it is dropped because the server has no knowledge of the client's gni99 NID:

      00000400:00000200:19.0:1573329689.898530:0:10540:0:(lib-move.c:3904:lnet_parse()) TRACE: 10.12.0.52@o2ib40(10.12.0.52@o2ib40) <- 110@gni99 : PUT - for me
      00000400:00000200:19.0:1573329689.898533:0:10540:0:(lib-ptl.c:571:lnet_ptl_match_md()) Request from 12345-110@gni99 of length 192 into portal 16 MB=0x5dc712d400bf0
      00000400:00000100:19.0:1573329689.898535:0:10540:0:(lib-move.c:3542:lnet_parse_put()) Dropping PUT from 12345-110@gni99 portal 16 match 1649753532140528 offset 192 length 192: 4
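
The whole exchange can be summarized with a toy Python model. This is a sketch under stated assumptions, not LNet or PtlRPC code: the `Node` class and `router_forward()` are hypothetical, and the acceptance rule (a non-MR node only accepts a reply from the exact NID it addressed) is a simplification inferred from the logs above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for an LNet node; not a real LNet structure."""
    name: str
    multi_rail: bool
    # NIDs this node has sent a request to and expects a reply from
    # (hypothetical bookkeeping for the sketch).
    pending: set = field(default_factory=set)

    def send_request(self, dst_nid):
        # A non-MR sender only knows the single NID it addressed.
        self.pending.add(dst_nid)

    def accept_reply(self, src_nid):
        # Assumed rule: the reply must come from the NID the request went to.
        return src_nid in self.pending


def router_forward(dst_nid, peer_nids, peer_is_mr):
    """Return the NID the router actually delivers to.

    For an MR peer the router may use any of the peer's NIDs; the sketch
    takes the first one listed. For a non-MR peer it keeps dst_nid.
    """
    return peer_nids[0] if peer_is_mr else dst_nid


server = Node("server", multi_rail=False)
client_nids = ["110@gni99", "110@gni4"]  # order from `lctl list_nids` above

server.send_request("110@gni4")
delivered_to = router_forward("110@gni4", client_nids, peer_is_mr=True)
reply_src = delivered_to  # the client replies from the NI that got the request

print(delivered_to)                     # 110@gni99
print(server.accept_reply(reply_src))   # False
```

In this model the reply would be accepted only if the router preserved the original destination (110@gni4) or the client replied from it.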
      

      The issue is easy to reproduce: perform I/O that causes an AST to be sent, then watch the server logs for the "Dropping PUT" message:

      saturn-p2:~ # ssh nid00110 'dd if=/dev/zero of=/lus/snx11922/hornc/test.txt bs=1024k count=1 oflag=direct'
      1+0 records in
      1+0 records out
      1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0161523 s, 64.9 MB/s
      saturn-p2:~ # dd if=/dev/zero of=/lus/snx11922/hornc/test.txt bs=1024k count=1 oflag=direct
      <command hangs>
      
      Nov  9 15:28:15 snx11922n005 kernel: LNet: 29024:0:(lib-move.c:3542:lnet_parse_put()) Dropping PUT from 12345-110@gni99 portal 16 match 1649753595060784 offset 192 length 192: 4
      Nov  9 15:28:15 snx11922n005 kernel: LNet: 29024:0:(lib-move.c:3542:lnet_parse_put()) Skipped 1 previous similar message
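
When checking many servers, a short script can pull the relevant fields out of the "Dropping PUT" lines. The sketch below is a hypothetical helper, not part of Lustre: `find_dropped_puts()` and the regular expression are assumptions based solely on the message format shown above, and they may need adjusting for other Lustre versions.

```python
import re

# Pattern modeled on the "Dropping PUT" message shown in the logs above.
DROP_RE = re.compile(
    r"Dropping PUT from (?P<pid>\d+)-(?P<nid>\S+) portal (?P<portal>\d+) "
    r"match (?P<match>\d+) offset (?P<offset>\d+) length (?P<length>\d+)"
)


def find_dropped_puts(lines):
    """Return one dict per 'Dropping PUT' line: source NID, portal, match bits."""
    drops = []
    for line in lines:
        m = DROP_RE.search(line)
        if m:
            drops.append({
                "nid": m.group("nid"),
                "portal": int(m.group("portal")),
                "match": int(m.group("match")),
            })
    return drops


# Sample input copied from the server log above.
sample = [
    "Nov  9 15:28:15 snx11922n005 kernel: LNet: 29024:0:(lib-move.c:3542:"
    "lnet_parse_put()) Dropping PUT from 12345-110@gni99 portal 16 match "
    "1649753595060784 offset 192 length 192: 4",
    "Nov  9 15:28:15 snx11922n005 kernel: LNet: 29024:0:(lib-move.c:3542:"
    "lnet_parse_put()) Skipped 1 previous similar message",
]

for drop in find_dropped_puts(sample):
    print(drop["nid"], drop["portal"], drop["match"])
```

A source NID on a network the server is not configured for (gni99 here) is the signature of this bug.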
      

      I will try to reproduce this on master, and I'll update the affects version field as appropriate.

      Lastly, I'll note that I was running with this patch https://review.whamcloud.com/#/c/36512/ because it is necessary to correctly classify the MR capabilities of peers.

            People

              Assignee: ashehata Amir Shehata (Inactive)
              Reporter: hornc Chris Horn