[LU-12402] LNet Health: lnet_finalize() recursion Created: 07/Jun/19 Updated: 01/May/20 Resolved: 21/Aug/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 | ||||||||
| Description |
|
When many messages are being dropped, the health feature introduces a path that can recurse deeply: lnet_finalize()->lnet_health_check()->lnet_msg_decommit_tx()->lnet_return_tx_credits_locked()->lnet_post_send_locked()->lnet_finalize(). Recursion was previously dealt with in lnet_finalize() by keeping track of the finalizer threads using msc_finalizers and returning if all slots are busy. The path above does not have the same mechanism and is therefore susceptible to this problem. |
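The slot-based guard described above can be illustrated with a small user-space sketch. This is not the Lustre code and not the patch under review: `struct container`, `finalize()`, `complete_msg()`, `NFINALIZERS` and `QUEUE_CAP` are hypothetical names, the sketch is single-threaded with no locking, and the extra check for "the caller already holds a slot" is an assumption about how such a guard is typically written. It only shows the general idea: queue the message first, then return early unless the caller can claim a free finalizer slot, so that nested calls turn into iterations of one drain loop.

```c
#include <assert.h>
#include <stddef.h>

#define NFINALIZERS 2     /* hypothetical number of finalizer slots */
#define QUEUE_CAP   1024  /* hypothetical queue capacity for the sketch */

struct container {
	int    finalizers[NFINALIZERS]; /* 0 = free slot, else caller id */
	int    queue[QUEUE_CAP];        /* messages waiting to be completed */
	size_t head, tail;
	int    depth, max_depth;        /* instrumentation: nesting of completions */
	int    completed;               /* instrumentation: messages completed */
};

static void complete_msg(struct container *c, int self, int msg);

/* Queue msg, then drain the queue only if this caller can take a slot. */
void finalize(struct container *c, int self, int msg)
{
	int i, my_slot = -1;

	assert(c->tail < QUEUE_CAP);
	c->queue[c->tail++] = msg;

	for (i = 0; i < NFINALIZERS; i++) {
		if (c->finalizers[i] == self)
			break;                  /* caller is already draining */
		if (my_slot < 0 && c->finalizers[i] == 0)
			my_slot = i;            /* remember first free slot */
	}

	/* Recursion breaker: already a finalizer, or all slots are busy. */
	if (i < NFINALIZERS || my_slot < 0)
		return;

	c->finalizers[my_slot] = self;
	while (c->head != c->tail)
		complete_msg(c, self, c->queue[c->head++]);
	c->finalizers[my_slot] = 0;
}

/* Completing one message causes the next one to be dropped and finalized. */
static void complete_msg(struct container *c, int self, int msg)
{
	c->depth++;
	if (c->depth > c->max_depth)
		c->max_depth = c->depth;
	c->completed++;

	if (msg > 0)
		finalize(c, self, msg - 1); /* nested call returns immediately */

	c->depth--;
}
```

In this sketch a chain of dropped messages is completed by the outermost call's loop, so the completion depth stays constant however long the chain is. A path that reaches the completion logic without passing through such a guard would nest one stack frame per message instead.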
| Comments |
| Comment by Gerrit Updater [ 06/Jul/19 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35431 |
| Comment by Gerrit Updater [ 21/Aug/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35431/ |
| Comment by Peter Jones [ 21/Aug/19 ] |
|
Landed for 2.13 |
| Comment by Gerrit Updater [ 24/Apr/20 ] |
|
Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38355 |
| Comment by Gerrit Updater [ 24/Apr/20 ] |
|
Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38358 |
| Comment by Gerrit Updater [ 25/Apr/20 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38367 |
| Comment by Gerrit Updater [ 01/May/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38367/ |