Lustre / LU-12264

The lnet peer discovery queue (lnet_peer.lp_dc_pendq) is susceptible to concurrent manipulation

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Labels: None
    • Severity: 3

      Description

      This issue was discovered while testing the MR routing feature, but the bug also exists in at least master and b2_12.

      On the initial call to lnet_select_pathway(), messages are added to a peer's discovery queue (lp_dc_pendq) while holding only the lock taken by lnet_net_lock_current(). That lock covers the calling thread's CPT, so it does not serialize threads running on different CPTs. If two threads on different CPTs send to the same peer while that peer is pending discovery, both can perform the list add at the same time, which can corrupt the list.
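The race described above can be modelled in user space. The following is a minimal sketch, not LNet code: `demo_peer`, `demo_msg`, `sender`, and `run_demo` are hypothetical names, and the per-peer mutex is an assumed way to serialize the list add. It is not claimed to be the fix that landed for this ticket. Every sender takes the same lock that belongs to the peer, so the add is serialized regardless of which CPU partition the sender runs on.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal doubly linked list, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Not atomic: four pointer updates that must not interleave with another add. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

/* Hypothetical stand-in for a peer with a pending-discovery queue. */
struct demo_peer {
    pthread_mutex_t lock;    /* per-peer lock shared by all senders (assumption) */
    struct list_head pendq;  /* plays the role of lp_dc_pendq */
};

struct demo_msg {
    struct list_head link;
};

struct sender_arg {
    struct demo_peer *peer;
    struct demo_msg *msgs;
    size_t nmsgs;
};

/* Each sender queues its own messages on the shared peer queue. */
static void *sender(void *p)
{
    struct sender_arg *a = p;

    for (size_t i = 0; i < a->nmsgs; i++) {
        /* The lock belongs to the peer, not to the sender's partition. */
        pthread_mutex_lock(&a->peer->lock);
        list_add_tail(&a->msgs[i].link, &a->peer->pendq);
        pthread_mutex_unlock(&a->peer->lock);
    }
    return NULL;
}

/* Runs nthreads senders (nthreads > 0) and returns the final queue length. */
size_t run_demo(size_t nthreads, size_t per_thread)
{
    struct demo_peer peer;
    pthread_t *tids = calloc(nthreads, sizeof(*tids));
    struct sender_arg *args = calloc(nthreads, sizeof(*args));
    struct demo_msg *msgs = calloc(nthreads * per_thread, sizeof(*msgs));
    size_t count = 0;

    assert(tids && args && msgs);
    pthread_mutex_init(&peer.lock, NULL);
    list_init(&peer.pendq);

    for (size_t t = 0; t < nthreads; t++) {
        args[t].peer = &peer;
        args[t].msgs = msgs + t * per_thread;
        args[t].nmsgs = per_thread;
        int rc = pthread_create(&tids[t], NULL, sender, &args[t]);
        assert(rc == 0);
    }
    for (size_t t = 0; t < nthreads; t++)
        pthread_join(tids[t], NULL);

    /* Walk the queue and check that forward and backward links agree. */
    for (struct list_head *p = peer.pendq.next; p != &peer.pendq; p = p->next) {
        assert(p->next->prev == p);
        count++;
    }

    pthread_mutex_destroy(&peer.lock);
    free(msgs);
    free(args);
    free(tids);
    return count;
}
```

Calling `run_demo(2, 50000)` should return 100000 with the links intact. If each sender instead took a different lock, as happens when the only protection is a lock scoped to the sender's own partition, the four pointer updates in `list_add_tail()` could interleave and leave the queue with lost entries or inconsistent links.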

            People

            • Assignee: Chris Horn (hornc)
            • Reporter: Chris Horn (hornc)
