[LU-2028] Potential data corruption in 'o2iblnd' (the IB LND driver) when using pre-mapped DMA buffers Created: 25/Sep/12  Updated: 26/Jun/17

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Martin Lizon (Inactive) Assignee: Amir Shehata (Inactive)
Resolution: Unresolved Votes: 0
Labels: o2iblnd, patch

Severity: 3
Rank (Obsolete): 4155

 Description   

Code inspection of the o2iblnd DMA handling code (for pre-mapped DMA buffers) found incorrect use of the DMA API that could potentially cause very-hard-to-debug data corruptions.

The DMA API Howto document (http://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt) clearly states:

If you need to use the same streaming DMA region multiple times and touch the data in between the DMA transfers, the buffer needs to be synced properly in order for the cpu and device to see the most uptodate and correct copy of the DMA buffer.
So, firstly, just map it with dma_map_{single,sg}, and after each DMA transfer call either:
dma_sync_single_for_cpu(dev, dma_handle, size, direction);
or:
dma_sync_sg_for_cpu(dev, sglist, nents, direction);
as appropriate.

'o2iblnd' does not make these calls between DMA transfers. Without 'dma_sync_single_for_cpu', the new data might still be in the CPU cache, so when the HCA DMAs the buffer to send it out, it might read and send stale data, resulting in data corruption.
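The stale-data scenario can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions: it is not kernel code and not the o2iblnd patch, and every `toy_*` name is hypothetical. It models a non-coherent machine as two copies of one buffer, the CPU cache and the RAM that the device reads. The two sync helpers stand in for the ownership hand-offs that the kernel's `dma_sync_single_for_device()` and `dma_sync_single_for_cpu()` perform; which real call is needed at which point in o2iblnd is not settled here.

```c
#include <assert.h>
#include <string.h>

#define TOY_BUF_SIZE 16

/* Toy model (hypothetical): one streaming DMA buffer on a non-coherent machine. */
struct toy_dma_buf {
    char cpu_cache[TOY_BUF_SIZE]; /* what the CPU reads and writes */
    char memory[TOY_BUF_SIZE];    /* what the device (HCA) reads and writes */
};

/* CPU store: in this model it lands in the cache only, not in RAM. */
static void toy_cpu_write(struct toy_dma_buf *b, const char *msg)
{
    assert(b != NULL && msg != NULL);
    strncpy(b->cpu_cache, msg, TOY_BUF_SIZE - 1);
    b->cpu_cache[TOY_BUF_SIZE - 1] = '\0';
}

/* Stand-in for giving the buffer to the device: write the cache back to RAM. */
static void toy_sync_for_device(struct toy_dma_buf *b)
{
    memcpy(b->memory, b->cpu_cache, TOY_BUF_SIZE);
}

/* Stand-in for giving the buffer back to the CPU: refill the cache from RAM. */
static void toy_sync_for_cpu(struct toy_dma_buf *b)
{
    memcpy(b->cpu_cache, b->memory, TOY_BUF_SIZE);
}

/* Device-side DMA read: the device sees RAM only, never the CPU cache. */
static void toy_device_read(const struct toy_dma_buf *b, char *out)
{
    memcpy(out, b->memory, TOY_BUF_SIZE);
}
```

In this model, writing a second message with `toy_cpu_write()` and then calling `toy_device_read()` without a sync in between returns the first message again, which is the corruption described above. On a cache-coherent platform the two copies never diverge, which would explain why the missing calls go unnoticed there.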

It appears that so far we have been lucky that this issue has not affected us, but it may simply be difficult to hit on x86/x86_64 systems.

The fix is trivial, and the benefit is prevention of very-hard-to-debug data corruption issues on HW architectures which would expose the incorrect use of the DMA API.



 Comments   
Comment by Isaac Huang (Inactive) [ 26/Sep/12 ]

Hi Martin,

That code has been there for as long as I can remember and it gets executed for almost every outgoing message, and yet, as you mentioned, it hasn't affected us so far. Do you have some insight into why it's so hard to hit?

Comment by Doug Oucharek (Inactive) [ 26/Sep/12 ]

I think this could be a problem with ARM and MIPS but not X86 (which is why it has not caused a problem yet).

I believe this syncing is only needed if your interface is like PCIe and does not update the cache when data is written into the buffer. However, on X86 data enters the coherent domain, and by default PCIe packets update the cache. There is a bit in the header to configure this, and I would expect it to be on by default.

The DMA syncing routines seem to be implemented in the kernel for MIPS and ARM, so those are probably subject to this potential problem. Not sure about PPC.

Comment by Martin Lizon (Inactive) [ 26/Sep/12 ]

Hi Doug,

Thanks for the extensive explanation. My search on this topic found that there is an X86 feature called CPU self snoop (http://stackoverflow.com/questions/7132284/dma-cache-coherence-management) and it is configured by default in the x86 Linux kernel.

Hence, it is a preventive measure. I guess one way to try and hit this issue might be by turning off this bit and recompiling the kernel. However, it's possible that with this bit being set by default, other modules might not be compliant and other kernel instability might surface before you can even try to run Lustre/Lnet with IB.

Comment by Martin Lizon (Inactive) [ 03/Oct/12 ]

The Gerrit reference for this bug can be found here:
http://review.whamcloud.com/#change,4103

Comment by Nathan Rutman [ 21/Nov/12 ]

xyratex-bug-id: MRP-559
