[LU-2028] Potential data corruption in 'o2iblnd' (the IB LND driver) when using pre-mapped DMA buffers

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • None
    • 3
    • 4155

    Description

      Code inspection of the o2iblnd DMA handling code (for pre-mapped DMA buffers) found incorrect use of the DMA API that could cause very-hard-to-debug data corruption.

      The DMA API Howto document (http://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt) clearly states:

      If you need to use the same streaming DMA region multiple times and touch the data in between the DMA transfers, the buffer needs to be synced properly in order for the cpu and device to see the most uptodate and correct copy of the DMA buffer.
      So, firstly, just map it with dma_map_{single,sg}, and after each DMA transfer call either:
      dma_sync_single_for_cpu(dev, dma_handle, size, direction);
      or:
      dma_sync_sg_for_cpu(dev, sglist, nents, direction);
      as appropriate.

      'o2iblnd' does not make these calls between DMA transfers. Without 'dma_sync_single_for_cpu', newly written data might still be in the CPU cache, so when the HCA DMAs the buffer to send it out, it might read and send stale data, resulting in data corruption.
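
      The failure mode described above can be sketched with a toy userspace model. This is not kernel or o2iblnd code: the struct, the toy_* helpers, and send_two() are hypothetical names invented for illustration, and the "cache" and "memory" arrays only imitate a platform where the device does not snoop the CPU cache. Note that the same HOWTO also describes dma_sync_single_for_device(), which it says to call before handing the buffer back to the hardware; the model uses stand-ins for both calls.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy model of a non-coherent platform (an assumption for illustration):
 * CPU stores land in a cached copy, while the device (HCA) only ever
 * reads main memory. */
struct toy_dma_buf {
    char cache[BUF_SIZE];   /* what the CPU sees */
    char memory[BUF_SIZE];  /* what the device sees */
};

/* CPU-side write: updates only the cached copy. */
static void cpu_write(struct toy_dma_buf *b, const char *msg)
{
    strncpy(b->cache, msg, BUF_SIZE - 1);
    b->cache[BUF_SIZE - 1] = '\0';
}

/* Hypothetical stand-in for dma_sync_single_for_cpu():
 * make the CPU's view match what is in memory. */
static void toy_sync_for_cpu(struct toy_dma_buf *b)
{
    memcpy(b->cache, b->memory, BUF_SIZE);
}

/* Hypothetical stand-in for dma_sync_single_for_device():
 * write the cached data back so the device can see it. */
static void toy_sync_for_device(struct toy_dma_buf *b)
{
    memcpy(b->memory, b->cache, BUF_SIZE);
}

/* Device-side transmit: the device copies from memory, never from cache. */
static void device_send(const struct toy_dma_buf *b, char *wire)
{
    memcpy(wire, b->memory, BUF_SIZE);
}

/* Reuse one pre-mapped buffer for two messages. With do_sync == 0 the
 * second message never leaves the cache before the device reads memory. */
void send_two(int do_sync, char *wire1, char *wire2)
{
    struct toy_dma_buf buf;

    memset(&buf, 0, sizeof(buf));

    cpu_write(&buf, "msg-1");
    toy_sync_for_device(&buf);      /* models the initial mapping step */
    device_send(&buf, wire1);

    if (do_sync)
        toy_sync_for_cpu(&buf);     /* CPU takes the buffer back */
    cpu_write(&buf, "msg-2");
    if (do_sync)
        toy_sync_for_device(&buf);  /* hand the buffer to the device again */
    device_send(&buf, wire2);
}

/* Small self-check exercising both paths of the model. */
void toy_selfcheck(void)
{
    char first[BUF_SIZE], second[BUF_SIZE];

    send_two(0, first, second);
    assert(strcmp(second, "msg-1") == 0);   /* stale data was sent */

    send_two(1, first, second);
    assert(strcmp(second, "msg-2") == 0);   /* current data was sent */
}
```

      In this model, skipping the sync calls makes the second transmit repeat the first message, which is the kind of silent corruption the description warns about. On a cache-coherent platform the two copies would never diverge, which is consistent with the issue not having been observed on x86/x86_64.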

      It appears that so far we have been lucky and this issue has not affected us, but it may simply be difficult to hit on x86/x86_64 systems.

      The fix is trivial, and the benefit is preventing very-hard-to-debug data corruption on hardware architectures that would expose the incorrect use of the DMA API.

        Activity

          nrutman Nathan Rutman added a comment:

          xyratex-bug-id: MRP-559

          mlizon Martin Lizon (Inactive) added a comment:

          The Gerrit reference for this bug can be found here:
          http://review.whamcloud.com/#change,4103

          mlizon Martin Lizon (Inactive) added a comment:

          Hi Doug,

          Thanks for the extensive explanation. My search on this topic found that there is an x86 feature called CPU self snoop (http://stackoverflow.com/questions/7132284/dma-cache-coherence-management) and it is configured by default in the x86 Linux kernel.

          Hence, the fix is a preventive measure. I guess one way to try to hit this issue might be to turn off this bit and recompile the kernel. However, because this bit is set by default, other modules might not be compliant either, and other kernel instability might surface before you can even try to run Lustre/LNet with IB.

          doug Doug Oucharek (Inactive) added a comment:

          I think this could be a problem with ARM and MIPS but not x86 (which is why it has not caused a problem yet).

          I believe this syncing is only needed if your interface is like PCIe and does not update the cache when data is written into the buffer. However, on x86 the data enters the coherent domain and by default PCIe packets update the cache. There is a bit in the header to configure this. I would expect it to be on by default.

          The DMA syncing routines seem to be implemented in the kernel for MIPS and ARM, so those are probably subject to this potential problem. Not sure about PPC.

          isaac Isaac Huang (Inactive) added a comment:

          Hi Martin,

          That code has been there for as long as I can remember and it gets executed for almost every outgoing message, and yet, as you mentioned, it hasn't affected us so far. Do you have any insight into why it's so hard to hit?

          People

            ashehata Amir Shehata (Inactive)
            mlizon Martin Lizon (Inactive)