[LU-14499] o2iblnd: LU-13368 changes cause shutdown procedure to not complete
Created: 08/Mar/21 Updated: 30/Jan/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Serguei Smirnov | Assignee: | Serguei Smirnov |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
|
Changes applied by the patches from In that case, messages similar to the following keep showing up in the log:
[51025.354675] LNet: 9402:0:(o2iblnd.c:3107:kiblnd_shutdown()) 10.1.11.124@o2ib10: waiting for 3 peers to disconnect
[51029.354481] LNet: 9402:0:(o2iblnd.c:3107:kiblnd_shutdown()) 10.1.11.124@o2ib10: waiting for 3 peers to disconnect
[51037.353971] LNet: 9402:0:(o2iblnd.c:3107:kiblnd_shutdown()) 10.1.11.124@o2ib10: waiting for 3 peers to disconnect
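For anyone trying to confirm whether a node is hitting this, a rough way to spot the symptom is to scan the kernel log for repeated "waiting for N peers to disconnect" messages where the peer count never drops. The following is only a sketch: the regular expression is derived solely from the three log lines quoted above, and the function name `find_stalled_shutdowns` and the `min_repeats` threshold are hypothetical, not part of Lustre or any Lustre tool.

```python
import re
from collections import defaultdict

# Pattern derived only from the log lines quoted in this ticket (assumption:
# other Lustre versions may word or format the message differently).
WAIT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LNet:.*kiblnd_shutdown\(\)\)\s+"
    r"(?P<nid>\S+):\s+waiting for (?P<peers>\d+) peers? to disconnect"
)


def find_stalled_shutdowns(lines, min_repeats=3):
    """Return NIDs whose 'waiting for N peers' count never decreased.

    Hypothetical helper: min_repeats is an arbitrary threshold chosen for
    illustration, not a value taken from the Lustre source.
    """
    seen = defaultdict(list)
    for line in lines:
        m = WAIT_RE.search(line)
        if m:
            seen[m.group("nid")].append(
                (float(m.group("ts")), int(m.group("peers")))
            )

    stalled = {}
    for nid, samples in seen.items():
        counts = [peers for _, peers in samples]
        # Flag the NID only if we saw enough messages and the last
        # reported peer count is no lower than the first one.
        if len(samples) >= min_repeats and counts[-1] >= counts[0]:
            stalled[nid] = {
                "peers": counts[-1],
                "messages": len(samples),
                "seconds": samples[-1][0] - samples[0][0],
            }
    return stalled
```

Fed the three lines above, this would report the NID printed in the messages with a peer count stuck at 3. It only detects the log pattern; it says nothing about the root cause.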
|
| Comments |
| Comment by Gerrit Updater [ 08/Mar/21 ] |
|
Serguei Smirnov (ssmirnov@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41937 |
| Comment by Chris Horn [ 14/Feb/22 ] |
|
ssmirnov, could this issue impact ksocklnd as well? |
| Comment by Serguei Smirnov [ 14/Feb/22 ] |
|
Chris, Despite concluding that |
| Comment by Chris Horn [ 22/Aug/22 ] |
|
We traced a memory leak back to the |
| Comment by Olaf Faaland [ 12/Jan/23 ] |
|
Hi Serguei, |
| Comment by Serguei Smirnov [ 12/Jan/23 ] |
|
Hi Olaf, From comments in ys, sihara: is my understanding correct? Thanks, Serguei. |
| Comment by Yang Sheng [ 13/Jan/23 ] |
|
Hi, Serguei, Yes, you are right. |
| Comment by Olaf Faaland [ 13/Jan/23 ] |
|
What are the gerrit URLs for those changes? Thanks. |
| Comment by Serguei Smirnov [ 16/Jan/23 ] |
|
Hi Olaf, ys will correct me if I'm wrong, but I believe these are the two changes which are supposed to be fixing the original "discard the callback":
https://review.whamcloud.com/#/c/fs/lustre-release/+/40937/
https://review.whamcloud.com/#/c/fs/lustre-release/+/41970/
Thanks, Serguei.
|
| Comment by Yang Sheng [ 17/Jan/23 ] |
|
Sorry for the delay. Yes, Serguei is right. |
| Comment by Olaf Faaland [ 23/Jan/23 ] |
|
Hi Serguei and Yang Sheng, Thanks for clarifying. It looks like changes 40937 and 41970 aren't progressing. Are you waiting on something? Thanks |
| Comment by Serguei Smirnov [ 30/Jan/23 ] |
|
Hi Olaf, the problem here appears to be that even though the patches are code-complete and Maloo-tested, we're not able to verify Yang Sheng's fixes in a proper IB environment as Shuichi doesn't have the available resources. Would you be able to give these patches a try on your system?
|