[LU-12402] LNet Health: lnet_finalize() recursion Created: 07/Jun/19  Updated: 01/May/20  Resolved: 21/Aug/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.5

Type: Bug Priority: Major
Reporter: Amir Shehata (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-13483 Apparently infinite recursion in lnet... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When many messages are being dropped, the health feature introduces a path that can recurse deeply:

lnet_finalize()->lnet_health_check()->lnet_msg_decommit_tx()->
lnet_return_tx_credits_locked()->lnet_post_send_locked()->lnet_finalize()

This was dealt with in lnet_finalize() by tracking the finalizer threads in msc_finalizers and returning if all slots are busy.

The path above has no such mechanism and is therefore susceptible to this problem.
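
For illustration only, the slot-tracking idea described above can be sketched as a small single-threaded simulation. This is not the Lustre code and not the landed patch: the names finalize(), complete(), simulate_drops(), struct container, NFINALIZERS and MAX_MSGS are hypothetical, and the simulation assumes every completed message causes the next send to be dropped, which re-enters finalize(). The point is only that a nested call queues its message and returns, so the outer caller drains the queue in a loop and the call depth stays bounded.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFINALIZERS 2     /* hypothetical number of finalizer slots */
#define MAX_MSGS    4096  /* hypothetical upper bound for the simulation */

struct msg {
	struct msg *next;
};

struct container {
	struct msg *head, *tail;      /* messages waiting to be finalized */
	int finalizers[NFINALIZERS];  /* ids of threads draining the queue; 0 = free */
};

static struct container cont;
static struct msg msgs[MAX_MSGS];
static int total, next_to_drop;
static int depth, max_depth, completed;

static void finalize(struct msg *m, int self);

/* Assumption: completing a message returns a credit, the next queued
 * send is posted and dropped at once, so finalize() is entered again. */
static void complete(struct msg *m, int self)
{
	(void)m;
	completed++;
	if (next_to_drop < total)
		finalize(&msgs[next_to_drop++], self);
}

static void finalize(struct msg *m, int self)
{
	int i, slot = -1;

	if (++depth > max_depth)
		max_depth = depth;

	/* Always queue the message first. */
	m->next = NULL;
	if (cont.tail)
		cont.tail->next = m;
	else
		cont.head = m;
	cont.tail = m;

	/* Recursion breaker: look for our own id or a free slot. */
	for (i = 0; i < NFINALIZERS; i++) {
		if (cont.finalizers[i] == self)
			break;
		if (slot < 0 && cont.finalizers[i] == 0)
			slot = i;
	}
	if (i < NFINALIZERS || slot < 0) {
		/* Already finalizing higher up the stack, or all slots busy:
		 * leave the message for whoever is draining the queue. */
		depth--;
		return;
	}

	cont.finalizers[slot] = self;
	while (cont.head) {
		struct msg *cur = cont.head;

		cont.head = cur->next;
		if (!cont.head)
			cont.tail = NULL;
		complete(cur, self);
	}
	cont.finalizers[slot] = 0;
	depth--;
}

/* Drop nmsgs messages back to back; returns the deepest nesting seen. */
int simulate_drops(int nmsgs)
{
	assert(nmsgs >= 1 && nmsgs <= MAX_MSGS);
	memset(&cont, 0, sizeof(cont));
	total = nmsgs;
	next_to_drop = 1;
	depth = max_depth = completed = 0;
	finalize(&msgs[0], 1);  /* thread id 1 is arbitrary */
	return max_depth;
}
```

In this sketch, simulate_drops(1000) completes all 1000 messages while finalize() is never nested more than two levels deep. Without the slot check, each dropped message would add another set of frames to the stack.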



 Comments   
Comment by Gerrit Updater [ 06/Jul/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35431
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: a404c988fb57780886222030e98c847fd1f5408a

Comment by Gerrit Updater [ 21/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35431/
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: ad9243693c9a5a5b2c34165ad853ddf5ceec4617

Comment by Peter Jones [ 21/Aug/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 24/Apr/20 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38355
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 7bb9db8377743364112038316a24a3c272aa52ee

Comment by Gerrit Updater [ 24/Apr/20 ]

Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch: https://review.whamcloud.com/38358
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 0d0a06036f2f4d033810de6bd0b43fe45fe4ba6e

Comment by Gerrit Updater [ 25/Apr/20 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38367
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: f073a913df66c85a2e8ca44818d803dd91ab6dfc

Comment by Gerrit Updater [ 01/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38367/
Subject: LU-12402 lnet: handle recursion in resend
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 41ed1c18082435624dc5a391511a5ff40ec79979
