[LU-16995] LNetError: 1094:0:(kfilnd_tn.c:1340:kfilnd_tn_state_fail()) LBUG
Created: 27/Jul/23
Updated: 22/Aug/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Unresolved
Votes: 0
Labels: None

Severity: 3

Description

The fabric can delay packets long enough that the retry handler cancels a message which is nevertheless delivered to the target. With the right timing, the initiator then receives a TAG_RX_OK event after the transaction has already transitioned to TN_STATE_FAIL. This currently trips an LBUG, but we can instead modify kfilnd to allow the transaction to complete normally.
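
The race can be illustrated with a small, self-contained state machine. This is only a sketch under stated assumptions, not the kfilnd code or the contents of the patch: every `sketch_` / `SKETCH_` name, the three-state model, the `tolerate_late_rx` switch, and the `-1` "unexpected event" return (standing in for the point where an LBUG would fire) are hypothetical. Only the idea of a TAG_RX_OK event arriving in the failed state comes from the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified transaction states (not the kfilnd enum). */
enum sketch_state {
	SKETCH_STATE_WAIT_TAG,	/* waiting for the tagged receive to complete */
	SKETCH_STATE_FAIL,	/* message was cancelled, e.g. by a retry handler */
	SKETCH_STATE_DONE	/* transaction finished */
};

/* Hypothetical events delivered to the transaction. */
enum sketch_event {
	SKETCH_EVENT_TAG_RX_OK,	/* tagged receive completed successfully */
	SKETCH_EVENT_CANCEL	/* local side gave up on the message */
};

struct sketch_tn {
	enum sketch_state state;
	int status;		/* 0 on success, negative on failure */
};

/*
 * Deliver one event to the transaction.
 * Returns 0 if the event is valid for the current state, or -1 if it is
 * unexpected (the place where an assertion such as an LBUG would trip).
 * tolerate_late_rx selects between the two behaviours being compared.
 */
int sketch_tn_event(struct sketch_tn *tn, enum sketch_event ev,
		    bool tolerate_late_rx)
{
	assert(tn != NULL);

	switch (tn->state) {
	case SKETCH_STATE_WAIT_TAG:
		if (ev == SKETCH_EVENT_TAG_RX_OK) {
			tn->status = 0;
			tn->state = SKETCH_STATE_DONE;
			return 0;
		}
		if (ev == SKETCH_EVENT_CANCEL) {
			tn->status = -1;
			tn->state = SKETCH_STATE_FAIL;
			return 0;
		}
		break;
	case SKETCH_STATE_FAIL:
		/* Delayed delivery: the data arrived after the cancel. */
		if (ev == SKETCH_EVENT_TAG_RX_OK && tolerate_late_rx) {
			tn->status = 0;
			tn->state = SKETCH_STATE_DONE;
			return 0;
		}
		break;
	case SKETCH_STATE_DONE:
		break;
	}

	return -1;	/* unexpected event for this state */
}
```

With `tolerate_late_rx` set to false, a cancel followed by a late TAG_RX_OK leaves the transaction in the failed state and reports the event as unexpected. With it set to true, the same sequence finishes the transaction successfully.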



Comments
Comment by Gerrit Updater [ 27/Jul/23 ]

"Chris Horn <chris.horn@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51787
Subject: LU-16995 kfilnd: Handle TAG_RX_OK in TN_STATE_FAIL
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 85d20a5714e8b05193edb6fcbea0fe5e9b1ba6b0

Comment by Gerrit Updater [ 22/Aug/23 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51787/
Subject: LU-16995 kfilnd: Handle TAG_RX_OK in TN_STATE_FAIL
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 338801448049e002821f5935b40019e6a6addd3f
