Details
Bug
Resolution: Fixed
Major
None
None
3
Description
When many messages are being dropped, the health feature introduces a path that can enter deep recursion:
lnet_finalize() -> lnet_health_check() -> lnet_msg_decommit_tx() -> lnet_return_tx_credits_locked() -> lnet_post_send_locked() -> lnet_finalize()
lnet_finalize() already deals with this kind of re-entry by tracking the threads currently finalizing messages in msc_finalizers and returning early if all slots are busy. The path above does not go through that mechanism and is therefore susceptible to the same deep recursion.