[LU-12199] md's are not detached from uncommitted messages that have health check performed on them
Created: 18/Apr/19  Updated: 04/Oct/19  Resolved: 26/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0, Lustre 2.13.0
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug
Priority: Major
Reporter: Chris Horn
Assignee: Chris Horn
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3

Description

lnet_is_health_check() can return "true" for a message that has not been
committed, that is, one that has not yet hit the network. In this situation
the message is freed without detaching its MD (memory descriptor). As a
result, the affected requests never receive their unlink events and remain
stuck forever.
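
The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the toy_md and toy_msg structures, their fields, and the toy_* functions are hypothetical stand-ins invented for illustration, and the control flow is a simplification, not the actual LNet finalize code or the change made by the patches on this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real LNet structures. */
struct toy_md {
    int  pending_msgs;      /* messages still attached to this MD */
    bool unlink_delivered;  /* has the owner seen its unlink event? */
};

struct toy_msg {
    struct toy_md *md;      /* attached MD, or NULL once detached */
    bool committed;         /* true once the message has hit the network */
    bool health_check;      /* what a health-check predicate would return */
};

/* Detach the MD; the unlink event fires when the last message lets go. */
static void toy_detach_md(struct toy_msg *msg)
{
    struct toy_md *md = msg->md;

    if (md == NULL)
        return;
    assert(md->pending_msgs > 0);
    msg->md = NULL;
    if (--md->pending_msgs == 0)
        md->unlink_delivered = true;
}

/* Assumed buggy shape: an uncommitted message that is flagged for a
 * health check is dropped while its MD is still attached. */
static void toy_finalize_buggy(struct toy_msg *msg)
{
    if (!msg->committed) {
        if (!msg->health_check)
            toy_detach_md(msg);
        return;             /* message dropped; MD may still be attached */
    }
    toy_detach_md(msg);
}

/* Assumed fixed shape: the MD is detached on every path before the
 * message is dropped, so the unlink event is always delivered. */
static void toy_finalize_fixed(struct toy_msg *msg)
{
    toy_detach_md(msg);
    /* Health accounting and freeing the message would follow here. */
}
```

In this model, finalizing an uncommitted message with health_check set leaves unlink_delivered false under toy_finalize_buggy, which corresponds to the stuck request, while toy_finalize_fixed always delivers the unlink event.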

This issue was discovered while testing the MR routing feature under LNet router failure conditions.

The bug was introduced by the LNet health feature, commit 70616605dd44be37068f4e1a4745a2f8b90eb1f5 (https://review.whamcloud.com/32764).



Comments
Comment by Gerrit Updater [ 18/Apr/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/34709
Subject: LU-12199 lnet: Ensure md is detached when msg is not committed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: cffdb4bb4afc698bb4df6cef9d74d85cd0e2b876

Comment by Gerrit Updater [ 02/May/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34797
Subject: LU-12199 lnet: verify msg is commited for send/recv
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: a4cc4392e989fe33299324a9ebb3d7fdfa45baad

Comment by Gerrit Updater [ 16/May/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34885
Subject: LU-12199 lnet: Ensure md is detached when msg is not committed
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: cac7eba2fe4f0852dbf416388ce6831027c1f555

Comment by Gerrit Updater [ 28/May/19 ]

Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34971
Subject: LU-12199 lnet: verify msg is commited for send/recv
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6b9f43a82e6fe8ce90cb925c9e46023cc76a196c

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/34885/
Subject: LU-12199 lnet: Ensure md is detached when msg is not committed
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: b65f3a1767ae82c7f629320187b33eb8670da537

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/34797/
Subject: LU-12199 lnet: verify msg is commited for send/recv
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: fc6b321036f34c00d5b32b49c817dc0034fbad9e

Comment by Gerrit Updater [ 03/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36038
Subject: LU-12199 lnet: Ensure md is detached when msg is not committed
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 7699683cd5779316ff7d9429df1a7428c978b2b0

Comment by Gerrit Updater [ 03/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36039
Subject: LU-12199 lnet: verify msg is commited for send/recv
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: ae2a031220d21bf3a511457cbe091134278c0cec

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36038/
Subject: LU-12199 lnet: Ensure md is detached when msg is not committed
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: d5a05a56fa29259b28dcc766af391ee0f3a357fd

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36039/
Subject: LU-12199 lnet: verify msg is commited for send/recv
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 73c8ae59cb2bd8352301d8f09ef1309adb5c8202
