[LU-13883] LNetPrimaryNID assumes lpni for a particular NID will not change through discovery but lnet_peer_add_nid() may allocate a new one for it. Created: 06/Aug/20 Updated: 26/Aug/22 Resolved: 11/Mar/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Chris Horn | Assignee: | Serguei Smirnov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet, multi-rail | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
After server reboot (failover/failback test) client connnects to server using non-primary NID 192.168.2.38@tcp2. Server calls LNetPrimaryNID to setup export. New peer is setup and queued for discovery: 00000020:00000040:0.0:1596142526.904195:0:3881:0:(genops.c:1009:class_export_get()) GETting export ffff9efa080c4000 : new refcount 6 00000100:00000040:0.0:1596142526.904198:0:3881:0:(service.c:1054:ptlrpc_request_change_export()) RPC GETting export ffff9efa080c4000 : new rpc_count 1 00000400:00000200:0.0:1596142526.904210:0:3881:0:(peer.c:285:lnet_peer_alloc()) ffff9efa0c071c00 nid 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.904215:0:3881:0:(peer.c:220:lnet_peer_net_alloc()) ffff9efa22389a40 net tcp2 00000400:00000200:0.0:1596142526.904222:0:3881:0:(peer.c:202:lnet_peer_ni_alloc()) ffff9efa0c550000 nid 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.904228:0:3881:0:(peer.c:1329:lnet_peer_attach_peer_ni()) peer 192.168.2.38@tcp2 NID 192.168.2.38@tcp2 flags 0x0 00000400:00000200:0.0:1596142526.904233:0:3881:0:(lib-lnet.h:96:lnet_peer_set_state()) Peer 192.168.2.38@tcp2(ffff9efa0c071c00) 8192 state 0x2000 00000400:00000200:0.0:1596142526.904236:0:3881:0:(lib-lnet.h:96:lnet_peer_set_state()) Peer 192.168.2.38@tcp2(ffff9efa0c071c00) 16384 state 0x6000 00000400:00000200:0.0:1596142526.904241:0:3881:0:(lib-lnet.h:96:lnet_peer_set_state()) Peer 192.168.2.38@tcp2(ffff9efa0c071c00) 64 state 0x6040 00000400:00000200:0.0:1596142526.904249:0:3881:0:(peer.c:1931:lnet_peer_queue_for_discovery()) Queue peer 192.168.2.38@tcp2: 0 00000400:00000200:0.0:1596142526.904251:0:3881:0:(peer.c:2244:lnet_discover_peer_locked()) Discovery attempt # 1 Discovery ping is sent: 00000400:00000200:0.0:1596142526.904351:0:2860:0:(peer.c:3342:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:0.0:1596142526.904355:0:2860:0:(peer.c:3457:lnet_peer_discovery()) peer 192.168.2.38@tcp2(ffff9efa0c071c00) state 0x6040 00000400:00000200:0.0:1596142526.904359:0:2860:0:(peer.c:3057:lnet_peer_send_ping()) peer 192.168.2.38@tcp2(ffff9efa0c071c00) 00000400:00000200:0.0:1596142526.904362:0:2860:0:(lib-lnet.h:96:lnet_peer_set_state()) Peer 192.168.2.38@tcp2(ffff9efa0c071c00) 512 state 0x6240 00000400:00000200:0.0:1596142526.904365:0:2860:0:(lib-lnet.h:104:lnet_peer_clear_state()) Peer 192.168.2.38@tcp2(ffff9efa0c071c00) 8192 state 0x4240 00000400:00000200:0.0:1596142526.904369:0:2860:0:(peer.c:3008:lnet_peer_select_nid()) peer 192.168.2.38@tcp2(ffff9efa0c071c00) 00000400:00000200:0.0:1596142526.904381:0:2860:0:(lib-move.c:5126:LNetGet()) LNetGet -> 12345-192.168.2.38@tcp2 Ping reply is processed. 00000400:00000200:0.0:1596142526.905282:0:2860:0:(peer.c:3342:lnet_peer_discovery_wait_for_work()) woken: 0 00000400:00000200:0.0:1596142526.905287:0:2860:0:(peer.c:3457:lnet_peer_discovery()) peer 192.168.2.38@tcp2(ffff9efa0c071c00) state 0x40c1 Discovery thread looks up the primary nid in the ping buffer and it finds an existing lpni (some other request from the same client arrived from the client's 192.168.2.38@tcp99 NID). 00000400:00000200:0.0:1596142526.905291:0:2860:0:(peer.c:2851:lnet_peer_data_present()) peer 192.168.2.38@tcp2(ffff9efa0c071c00) 00000400:00000200:0.0:1596142526.905308:0:2860:0:(peer.c:2771:lnet_peer_set_primary_data()) peer 192.168.2.38@tcp99(ffff9ef9de8dce00) 00000400:00000200:0.0:1596142526.905311:0:2860:0:(peer.c:1931:lnet_peer_queue_for_discovery()) Queue peer 192.168.2.38@tcp99: -114 After reconciling the two peers in lnet_peer_set_primary_data(), we merge the info in pbuf with the peer object for primary NID 192.168.2.38@tcp99. 00000400:00000200:0.0:1596142526.905323:0:2860:0:(peer.c:2622:lnet_peer_merge_data()) peer 192.168.2.38@tcp99(ffff9ef9de8dce00) When we call lnet_peer_add_nid() for the 192.168.2.38@tcp2, that function sees the existing lpni for that NID, and sees that the NID is "primary" for that peer object so it deletes the peer, and the peer NI, and creates a new lpni for the 192.168.2.38@tcp2 NID. 00000400:00000200:0.0:1596142526.905333:0:2860:0:(peer.c:457:lnet_peer_del_locked()) peer 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.905337:0:2860:0:(peer.c:377:lnet_peer_detach_peer_ni_locked()) peer 192.168.2.38@tcp2 NID 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.905341:0:2860:0:(peer.c:202:lnet_peer_ni_alloc()) ffff9efa0c550800 nid 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.905344:0:2860:0:(peer.c:220:lnet_peer_net_alloc()) ffff9efa0d340d40 net tcp2 But, LNetPrimaryNID still has reference on the original lpni for 192.168.2.38@tcp2 NID, and it is going to use that lpni to lookup the primary NID of the peer when discovery completes. Below we can see that after discovery completes, LNetPrimaryNID releases its reference on the lpni which allows the peer NI, peer net and peer objects to be freed. LNetPrimaryNID then returns the wrong nid for this peer. ... 00000400:00000200:0.0:1596142526.905414:0:2860:0:(peer.c:1949:lnet_peer_discovery_complete()) Discovery complete. Dequeue peer 192.168.2.38@tcp2 ... 00000400:00000200:0.0:1596142526.905518:0:3881:0:(peer.c:2292:lnet_discover_peer_locked()) peer 192.168.2.38@tcp2 NID 192.168.2.38@tcp2: -113. discovery complete 00000400:00000200:0.0:1596142526.905522:0:3881:0:(peer.c:1721:lnet_destroy_peer_ni_locked()) ffff9efa0c550000 nid 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.905526:0:3881:0:(peer.c:230:lnet_destroy_peer_net_locked()) ffff9efa22389a40 net tcp2 00000400:00000200:0.0:1596142526.905529:0:3881:0:(peer.c:293:lnet_destroy_peer_locked()) ffff9efa0c071c00 nid 192.168.2.38@tcp2 00000400:00000200:0.0:1596142526.905535:0:3881:0:(peer.c:1232:LNetPrimaryNID()) NID 192.168.2.38@tcp2 primary NID 192.168.2.38@tcp2 rc -113 Here's that code in LNetPrimaryNID(): rc = lnet_discover_peer_locked(lpni, cpt, true);
if (rc)
goto out_decref;
lp = lpni->lpni_peer_net->lpn_peer;
/* Only try once if discovery is disabled */
if (lnet_is_discovery_disabled(lp))
break;
}
primary_nid = lp->lp_primary_nid;
out_decref:
lnet_peer_ni_decref_locked(lpni);
out_unlock:
lnet_net_unlock(cpt);
CDEBUG(D_NET, "NID %s primary NID %s rc %d\n", libcfs_nid2str(nid),
libcfs_nid2str(primary_nid), rc);
return primary_nid;
I can think of a couple solutions: 2. Modify LNetPrimaryNID() (and probably lnet_discover_peer_locked() [Edit: It looks like all callers of lnet_discover_peer_locked() would also need to be adjusted]) so that they do not assume lpni is persistent across discovery. |
| Comments |
| Comment by Chris Horn [ 07/Aug/20 ] |
|
ssmirnov I've been working on a series of patches related to this issue. Do you have any code in progress for it already or just started looking at it now? |
| Comment by Gerrit Updater [ 07/Aug/20 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39606 |
| Comment by Chris Horn [ 07/Aug/20 ] |
|
The patch implements the idea in #2 of the description, but only for LNetPrimaryNID() and lnet_discover(). I don't think we need to adjust lnet_initiate_peer_discovery()... but I could be mistaken. |
| Comment by Chris Horn [ 07/Aug/20 ] |
|
Also note that there are other issues with this particular use-case that can still cause LNetPrimaryNID to return the wrong NID. The patch in this ticket is part of series which addresses those other issues. |
| Comment by Gerrit Updater [ 11/Aug/20 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39650 |
| Comment by Serguei Smirnov [ 11/Aug/20 ] |
|
Hi Chris, I only got around to this now, so I don't have anything that would be in conflict with your proposal. Thanks, Serguei. |
| Comment by Etienne Aujames [ 27/Aug/20 ] |
|
Hi, Am I right to assume that this LU is related to LNet's multirail feature? |
| Comment by Gerrit Updater [ 27/Aug/20 ] |
|
Chris Horn (chris.horn@hpe.com) uploaded a new patch: https://review.whamcloud.com/39747 |
| Comment by Gerrit Updater [ 10/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/39747/ |
| Comment by Peter Jones [ 11/Mar/21 ] |
|
Landed for 2.15 |
| Comment by Gerrit Updater [ 22/Apr/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/43413 |