[LU-13807] LNet: Net delete dead lock deadlock Created: 20/Jul/20 Updated: 07/Aug/20 |
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
While running an LUTF script that deletes a net dynamically via lnetctl import --del config.yaml, we ran into a deadlock. It appears that a local NI is in recovery and a message is about to be sent to this NI. We call lnet_nid2peerni_locked(), which locks the mutex; however, this mutex is already held by the process trying to delete the net, which leads to the deadlock. The code should first clean up the recovery queue, i.e. remove all instances of this NI from the recovery queue before deleting the NI. Similar processing is done in LNetNIFini().
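A minimal sketch of the proposed ordering, in C, under a deliberately simplified model: the recovery queue is a singly linked list guarded by its own lock, and each queued entry points at an NI tagged with the net it belongs to. All names here (struct ni, recovery_enqueue(), recovery_purge_net(), net_delete()) are hypothetical and are not the actual LNet structures or functions. The sketch only illustrates purging every queued reference to the NI's net before the NI is freed; it does not model the mutex taken by lnet_nid2peerni_locked().

```c
/*
 * Hypothetical model of the proposed fix: before an NI is torn down,
 * purge every reference to its net from the local NI recovery queue so
 * that recovery never tries to send a message to an NI being deleted.
 * These are illustrative names only, not the real LNet code.
 */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct ni {
	int id;
	int net_id;		/* net this NI belongs to */
};

struct rq_entry {
	struct ni *ni;
	struct rq_entry *next;
};

/* Recovery queue, protected by its own lock (assumption of this model). */
static struct rq_entry *recovery_queue;
static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;

/* Queue an NI for recovery (pushes at the head). */
static void recovery_enqueue(struct ni *ni)
{
	struct rq_entry *e = malloc(sizeof(*e));

	if (!e)
		abort();
	e->ni = ni;
	pthread_mutex_lock(&rq_lock);
	e->next = recovery_queue;
	recovery_queue = e;
	pthread_mutex_unlock(&rq_lock);
}

/* Remove all entries whose NI is on net_id; returns how many were removed. */
static int recovery_purge_net(int net_id)
{
	int removed = 0;
	struct rq_entry **pp;

	pthread_mutex_lock(&rq_lock);
	pp = &recovery_queue;
	while (*pp) {
		struct rq_entry *e = *pp;

		if (e->ni->net_id == net_id) {
			*pp = e->next;	/* unlink, then free the entry */
			free(e);
			removed++;
		} else {
			pp = &e->next;
		}
	}
	pthread_mutex_unlock(&rq_lock);
	return removed;
}

/* Count queued entries (used to observe the queue state). */
static int recovery_queue_len(void)
{
	int n = 0;

	pthread_mutex_lock(&rq_lock);
	for (struct rq_entry *e = recovery_queue; e; e = e->next)
		n++;
	pthread_mutex_unlock(&rq_lock);
	return n;
}

/* Delete path: clean up the recovery queue first, only then free the NI. */
static void net_delete(struct ni *ni)
{
	recovery_purge_net(ni->net_id);
	free(ni);
}
```

For example, if one NI was queued twice and a second NI on another net was queued once, calling net_delete() on the first NI leaves only the second NI's entry in the queue, so nothing can later dereference the freed NI.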