  Lustre / LU-12264

The lnet peer discovery queue (lnet_peer.lp_dc_pendq) is susceptible to concurrent manipulation

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.13.0, Lustre 2.12.3
    • Affects Version/s: Lustre 2.13.0, Lustre 2.12.2
    • Labels: None
    • Severity: 3

    Description

      This issue was discovered while testing the MR routing feature, but the bug also exists in at least master and b2_12.

      On the initial call to lnet_select_pathway(), a message is added to the peer's discovery queue while holding only the lock taken by lnet_net_lock_current(). That lock covers only the calling thread's CPT, so it does not serialize threads running on different CPTs. If two threads on different CPTs send to the same peer while that peer is pending discovery, both can perform the list add at the same time, which can corrupt the list.
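
      The locking problem described above can be illustrated with a small userspace sketch. This is not Lustre code and not necessarily the fix that landed: the names cpt_lock, struct peer, queue_lock, sender() and queue_from_all_cpts() are all hypothetical, and pthread mutexes stand in for the kernel locks. Each sender holds only its own per-CPT lock, as in the bug, so the sketch adds a per-peer queue lock around the list add to make the append safe across CPTs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CPTS        2      /* hypothetical number of CPU partitions */
#define MSGS_PER_SENDER 10000  /* hypothetical number of sends per thread */

struct msg {
	struct msg *next;
};

/* Hypothetical stand-in for a peer with a pending-discovery queue. */
struct peer {
	pthread_mutex_t queue_lock;  /* assumption: one lock per peer queue */
	struct msg *head;
	struct msg **tail;
};

static pthread_mutex_t cpt_lock[NUM_CPTS];  /* one lock per CPT */
static struct peer peer;
static struct msg msgs[NUM_CPTS][MSGS_PER_SENDER];

static void *sender(void *arg)
{
	int cpt = *(int *)arg;

	for (int i = 0; i < MSGS_PER_SENDER; i++) {
		struct msg *m = &msgs[cpt][i];

		/* Per-CPT lock: excludes only threads on the same CPT. */
		pthread_mutex_lock(&cpt_lock[cpt]);

		/* Per-peer lock: excludes every thread touching this queue. */
		pthread_mutex_lock(&peer.queue_lock);
		m->next = NULL;
		*peer.tail = m;
		peer.tail = &m->next;
		pthread_mutex_unlock(&peer.queue_lock);

		pthread_mutex_unlock(&cpt_lock[cpt]);
	}
	return NULL;
}

/* Runs one sender per CPT and returns the number of messages on the queue. */
size_t queue_from_all_cpts(void)
{
	pthread_t tid[NUM_CPTS];
	int cpt_id[NUM_CPTS];
	size_t queued = 0;

	pthread_mutex_init(&peer.queue_lock, NULL);
	peer.head = NULL;
	peer.tail = &peer.head;

	for (int c = 0; c < NUM_CPTS; c++) {
		cpt_id[c] = c;
		pthread_mutex_init(&cpt_lock[c], NULL);
	}
	for (int c = 0; c < NUM_CPTS; c++) {
		if (pthread_create(&tid[c], NULL, sender, &cpt_id[c]) != 0)
			abort();
	}
	for (int c = 0; c < NUM_CPTS; c++)
		pthread_join(tid[c], NULL);

	/* Walk the list: every message must be reachable exactly once. */
	for (struct msg *m = peer.head; m != NULL; m = m->next)
		queued++;

	for (int c = 0; c < NUM_CPTS; c++)
		pthread_mutex_destroy(&cpt_lock[c]);
	pthread_mutex_destroy(&peer.queue_lock);

	return queued;
}
```

      With the per-peer lock in place, queue_from_all_cpts() should return NUM_CPTS * MSGS_PER_SENDER. Removing the queue_lock calls reproduces the shape of the bug: the two senders then hold different locks while updating the same tail pointer, so messages can be lost or the list left inconsistent.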

          People

            Assignee: Chris Horn (hornc)
            Reporter: Chris Horn (hornc)