[LU-13923] LNet: lnetctl "lnet unconfigure" or "net del" hangs if executed on a gateway Created: 25/Aug/20  Updated: 24/Mar/22  Resolved: 24/Mar/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Serguei Smirnov Assignee: Serguei Smirnov
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-13896 Reference leak in check for mr_forwar... Resolved
Epic/Theme: lnet
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

Steps to reproduce:

Configure 3 nodes: PeerA (tcp), PeerB (tcp1) and GW1 (tcp, tcp1).

Configure GW1 to act as a router and add the corresponding routes on PeerA and PeerB.

Verify connectivity between PeerA and PeerB by executing "lnetctl ping" back and forth.

Execute "lnetctl lnet unconfigure" on GW1. The command hangs.

The issue affects tag 2.13.55.

 Comments   
Comment by Serguei Smirnov [ 25/Aug/20 ]

The issue was introduced by LU-13606:

https://review.whamcloud.com/38798

The change added a call to lnet_nid2peerni_locked, which takes a reference on a peer_ni object that is never released. As a result, LNet cannot clean up properly on an "unconfigure" or "net del" operation, and lnetctl hangs.
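
A minimal sketch of the leak pattern described above, as a self-contained toy model rather than the actual LNet code. Every toy_* name is hypothetical: toy_lookup stands in for a lookup that returns the object with an extra reference held (as this comment describes for lnet_nid2peerni_locked), and toy_decref stands in for the missing release. The shutdown condition is a simplifying assumption, not the real LNet teardown logic.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a peer_ni object; NOT the real struct lnet_peer_ni. */
struct toy_peer_ni {
	int refcount;	/* 1 == only the (hypothetical) peer table holds it */
};

/* Models a lookup that hands the object back with an extra reference. */
static struct toy_peer_ni *toy_lookup(struct toy_peer_ni *entry)
{
	entry->refcount++;
	return entry;
}

/* Models dropping a reference previously taken by toy_lookup(). */
static void toy_decref(struct toy_peer_ni *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Leaky check: takes a reference through the lookup and never drops it. */
static int toy_check_leaky(struct toy_peer_ni *entry)
{
	struct toy_peer_ni *p = toy_lookup(entry);

	return p != NULL;
}

/* Fixed check: same lookup, but the reference is released before return. */
static int toy_check_fixed(struct toy_peer_ni *entry)
{
	struct toy_peer_ni *p = toy_lookup(entry);
	int ok = (p != NULL);

	toy_decref(p);
	return ok;
}

/*
 * Assumption for this sketch: teardown can complete only once the table's
 * own reference is the last one left. A leaked reference keeps this false
 * forever, which is the toy analogue of the hang.
 */
static int toy_can_shutdown(const struct toy_peer_ni *entry)
{
	return entry->refcount == 1;
}
```

In this model, one call to toy_check_leaky on an object that starts with refcount 1 leaves it at 2, so toy_can_shutdown never becomes true; toy_check_fixed returns the count to 1.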

Comment by Gerrit Updater [ 25/Aug/20 ]

Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/39731
Subject: LU-13923 lnet: Add missing lnet_peer_ni_decref call
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 8564e47bc2d37102e742f8e688d87291ff39780a

Comment by Chris Horn [ 24/Mar/22 ]

This ticket is a duplicate of LU-13896.
