Details
Bug
Resolution: Fixed
Major
Lustre 2.13.0, Lustre 2.12.2
None
3
Description
This issue was discovered while testing the Multi-Rail (MR) routing feature, but the bug also exists in at least master and b2_12.
On an initial call to lnet_select_pathway(), a message is added to the peer's discovery queue under the protection of lnet_net_lock_current() alone. That call locks only the net lock partition for the calling thread's CPT, so it does not serialize threads running on different CPTs. If two threads on different CPTs send to the same peer while that peer is pending discovery, both can perform the list add at the same time, which can corrupt the list.
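The locking problem can be illustrated with a minimal user-space sketch, assuming POSIX threads as a stand-in for kernel locking. Everything in it is hypothetical and is not the Lustre code or the actual fix: `cpt_lock[]` plays the role of the per-CPT net lock partitions, `struct peer` and its `lock` field stand in for a peer and a per-peer lock guarding its discovery queue, and `run_senders()` is an illustrative driver. Each sender takes only its own CPT's lock (as lnet_net_lock_current() does), so it is the per-peer lock that keeps two senders on different CPTs from updating the queue concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NCPT 2                 /* hypothetical: two CPTs, one sender on each */
#define MSGS_PER_THREAD 100000

struct msg {
    struct msg *next;
};

/* Hypothetical stand-in for a peer that is pending discovery. */
struct peer {
    pthread_mutex_t lock;      /* per-peer lock guarding the discovery queue */
    struct msg *dq_head;       /* discovery queue (LIFO list for brevity) */
    long dq_len;
};

/* Stand-in for the per-CPT partitions of the net lock. */
static pthread_mutex_t cpt_lock[NCPT];
static struct peer the_peer;

struct sender_arg {
    int cpt;
    struct msg *pool;          /* preallocated messages for this sender */
};

static void *sender(void *p)
{
    struct sender_arg *a = p;

    for (int i = 0; i < MSGS_PER_THREAD; i++) {
        /* Like lnet_net_lock_current(): excludes only senders on the same CPT. */
        pthread_mutex_lock(&cpt_lock[a->cpt]);
        /* The per-peer lock is what serializes senders on different CPTs. */
        pthread_mutex_lock(&the_peer.lock);
        a->pool[i].next = the_peer.dq_head;
        the_peer.dq_head = &a->pool[i];
        the_peer.dq_len++;
        pthread_mutex_unlock(&the_peer.lock);
        pthread_mutex_unlock(&cpt_lock[a->cpt]);
    }
    return NULL;
}

/* Runs one sender per CPT and returns the number of nodes found by walking
 * the queue afterwards, or -1 if the message pools could not be allocated. */
long run_senders(void)
{
    pthread_t tid[NCPT];
    struct sender_arg args[NCPT];
    long walked = -1;
    int ok = 1;

    pthread_mutex_init(&the_peer.lock, NULL);
    the_peer.dq_head = NULL;
    the_peer.dq_len = 0;

    for (int c = 0; c < NCPT; c++) {
        pthread_mutex_init(&cpt_lock[c], NULL);
        args[c].cpt = c;
        args[c].pool = calloc(MSGS_PER_THREAD, sizeof(struct msg));
        if (args[c].pool == NULL)
            ok = 0;
    }

    if (ok) {
        for (int c = 0; c < NCPT; c++)
            if (pthread_create(&tid[c], NULL, sender, &args[c]) != 0)
                abort();
        for (int c = 0; c < NCPT; c++)
            pthread_join(tid[c], NULL);

        /* Walk the queue: an intact list yields exactly dq_len nodes. */
        walked = 0;
        for (struct msg *m = the_peer.dq_head; m != NULL; m = m->next)
            walked++;
        assert(walked == the_peer.dq_len);
    }

    for (int c = 0; c < NCPT; c++) {
        free(args[c].pool);
        pthread_mutex_destroy(&cpt_lock[c]);
    }
    pthread_mutex_destroy(&the_peer.lock);
    return walked;
}
```

Calling `run_senders()` from a `main()` (built with `-pthread`) should return `NCPT * MSGS_PER_THREAD`. If the two `the_peer.lock` calls are removed, the only exclusion left is per CPT. That is the situation described above: the two senders can then interleave their list updates, and the walked count and `dq_len` are no longer guaranteed to be correct.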