Details
Type: Bug
Resolution: Fixed
Priority: Major
Description
When many messages are being dropped, the health feature introduces a code path that can recurse deeply:
lnet_finalize() -> lnet_health_check() -> lnet_msg_decommit_tx() -> lnet_return_tx_credits_locked() -> lnet_post_send_locked() -> lnet_finalize()
This was dealt with in lnet_finalize() by keeping track of the finalizer threads using msc_finalizers and returning if all slots are busy.
The path above does not have the same mechanism and is therefore susceptible to this problem.
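To illustrate the general idea of bounding this kind of recursion with finalizer slots, here is a minimal single-threaded sketch in C. It is not the code from the patch, and it does not use the real LNet structures: struct msg, MAX_FINALIZERS, QUEUE_CAP, finalize(), finalize_one(), post_send() and run_chain() are all hypothetical names made up for this example. It assumes that finalizing one dropped message immediately triggers the send, and the drop, of the next pending message.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FINALIZERS 1   /* hypothetical number of finalizer slots */
#define QUEUE_CAP      16  /* hypothetical capacity of the pending queue */

/* Hypothetical message: finalizing it frees a credit for next_pending. */
struct msg {
    struct msg *next_pending;
};

static struct msg *queue[QUEUE_CAP];   /* ring buffer of messages to finalize */
static size_t q_head, q_len;
static int finalizers_busy;            /* finalizer slots currently in use */
static int depth, max_depth;           /* nesting depth of finalize() calls */
static int finalized;                  /* messages finalized so far */

static void enqueue(struct msg *m)
{
    assert(q_len < QUEUE_CAP);
    queue[(q_head + q_len) % QUEUE_CAP] = m;
    q_len++;
}

static struct msg *dequeue(void)
{
    struct msg *m;

    if (q_len == 0)
        return NULL;
    m = queue[q_head];
    q_head = (q_head + 1) % QUEUE_CAP;
    q_len--;
    return m;
}

static void finalize(struct msg *m);

/* Assumption: every send is dropped, so it is finalized immediately. */
static void post_send(struct msg *m)
{
    finalize(m);
}

/* Finalize one message; the returned credit triggers the next send. */
static void finalize_one(struct msg *m)
{
    finalized++;
    if (m->next_pending != NULL)
        post_send(m->next_pending);
}

static void finalize(struct msg *m)
{
    depth++;
    if (depth > max_depth)
        max_depth = depth;

    enqueue(m);
    /* If all slots are busy, return: an active finalizer drains the queue. */
    if (finalizers_busy < MAX_FINALIZERS) {
        finalizers_busy++;
        while ((m = dequeue()) != NULL)
            finalize_one(m);
        finalizers_busy--;
    }

    depth--;
}

/* Link n >= 1 messages into a chain, finalize the first one, and
 * return the deepest finalize() nesting that was observed. */
int run_chain(struct msg *msgs, int n)
{
    int i;

    q_head = q_len = 0;
    finalizers_busy = 0;
    depth = max_depth = finalized = 0;
    for (i = 0; i < n; i++)
        msgs[i].next_pending = (i + 1 < n) ? &msgs[i + 1] : NULL;
    finalize(&msgs[0]);
    return max_depth;
}
```

In this sketch, calling run_chain() on a chain of any length keeps the nesting of finalize() at two levels, because a nested call only queues its message and returns while the outer call drains the queue in a loop. Without the busy-slot check, the nesting depth would grow with the number of dropped messages.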
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38367/
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 41ed1c18082435624dc5a391511a5ff40ec79979