[LU-12264] The lnet peer discovery queue (lnet_peer.lp_dc_pendq) is susceptible to concurrent manipulation Created: 02/May/19  Updated: 08/Oct/19  Resolved: 26/Jul/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
Fix Version/s: Lustre 2.13.0, Lustre 2.12.3

Type: Bug Priority: Major
Reporter: Chris Horn Assignee: Chris Horn
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

 Description   

This issue was discovered while testing the MR routing feature, but the bug also exists in at least master and b2_12.

On the initial call to lnet_select_pathway(), a message is added to the peer's discovery queue (lp_dc_pendq) under the protection of lnet_net_lock_current() alone. That lock covers only the caller's current CPT, so it does not exclude a thread running on a different CPT. If two threads on different CPTs send to the same peer while that peer is pending discovery, they can perform the list add at the same time, which can corrupt the list.
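
The idea behind the fix (serializing queue manipulation with a lock that belongs to the peer, so every sender takes the same lock regardless of CPT) can be sketched in user space as follows. This is a minimal illustration with pthreads, not the LNet code or the actual patch: struct msg, struct peer, peer_queue_msg(), sender() and run_two_senders() are hypothetical names, a pthread mutex stands in for lp_lock, and a hand-rolled singly linked list stands in for lp_dc_pendq.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued message; not the real LNet message. */
struct msg {
	struct msg *next;
};

/* Hypothetical stand-in for a peer; not the real struct lnet_peer. */
struct peer {
	pthread_mutex_t lp_lock;  /* one lock per peer, shared by every sender */
	struct msg *pendq_head;   /* simplified stand-in for lp_dc_pendq */
	struct msg *pendq_tail;
};

struct sender_arg {
	struct peer *lp;
	struct msg *msgs;
	size_t count;
};

/* Append one message to the peer's pending queue under the per-peer lock. */
static void peer_queue_msg(struct peer *lp, struct msg *m)
{
	m->next = NULL;
	pthread_mutex_lock(&lp->lp_lock);
	if (lp->pendq_tail != NULL)
		lp->pendq_tail->next = m;
	else
		lp->pendq_head = m;
	lp->pendq_tail = m;
	pthread_mutex_unlock(&lp->lp_lock);
}

/* Thread body: queue every message in this sender's slice on the peer. */
static void *sender(void *p)
{
	struct sender_arg *arg = p;
	size_t i;

	for (i = 0; i < arg->count; i++)
		peer_queue_msg(arg->lp, &arg->msgs[i]);
	return NULL;
}

/*
 * Start two senders that each queue per_thread messages on the same peer,
 * then return the number of messages found by walking the queue.
 */
static size_t run_two_senders(size_t per_thread)
{
	struct peer lp = { .pendq_head = NULL, .pendq_tail = NULL };
	struct sender_arg args[2];
	pthread_t tids[2];
	struct msg *msgs, *m;
	size_t walked = 0;
	int i, rc;

	if (per_thread == 0)
		return 0;

	msgs = calloc(2 * per_thread, sizeof(*msgs));
	assert(msgs != NULL);
	rc = pthread_mutex_init(&lp.lp_lock, NULL);
	assert(rc == 0);

	for (i = 0; i < 2; i++) {
		args[i].lp = &lp;
		args[i].msgs = msgs + (size_t)i * per_thread;
		args[i].count = per_thread;
		rc = pthread_create(&tids[i], NULL, sender, &args[i]);
		assert(rc == 0);
	}
	for (i = 0; i < 2; i++)
		pthread_join(tids[i], NULL);

	/* Both senders have finished, so the walk needs no lock. */
	for (m = lp.pendq_head; m != NULL; m = m->next)
		walked++;

	pthread_mutex_destroy(&lp.lp_lock);
	free(msgs);
	return walked;
}
```

With the lock held around every append, run_two_senders(10000) returns 20000. If the lock and unlock calls in peer_queue_msg() were removed, the two senders could interleave their pointer updates and lose or mislink entries, which is the kind of corruption described above. Build with -pthread.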



 Comments   
Comment by Gerrit Updater [ 02/May/19 ]

Chris Horn (hornc@cray.com) uploaded a new patch: https://review.whamcloud.com/34798
Subject: LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set: 1
Commit: e1dc62e05db9b17fc0ddbe463fd68aef3eed1ff0

Comment by Gerrit Updater [ 07/Jun/19 ]

Amir Shehata (ashehata@whamcloud.com) merged in patch https://review.whamcloud.com/34798/
Subject: LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock
Project: fs/lustre-release
Branch: multi-rail
Current Patch Set:
Commit: dd16a31bf4ae874a69cc7dc5fe1f3197993630ae

Comment by Gerrit Updater [ 03/Sep/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36037
Subject: LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 84a6d4de7d664ad0248f496ee0187f88921b20f7

Comment by Gerrit Updater [ 04/Oct/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36037/
Subject: LU-12264 lnet: Protect lp_dc_pendq manipulation with lp_lock
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 0963d01e04e10ec09b6db045fe4110bf954d2b57
