<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:21:45 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2028] Potential data corruption in &apos;o2iblnd&apos; (the IB LND driver) when using pre-mapped DMA buffers</title>
                <link>https://jira.whamcloud.com/browse/LU-2028</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Code inspection of the o2iblnd DMA handling code (for pre-mapped DMA buffers) found incorrect use of the DMA API that could potentially cause very-hard-to-debug data corruptions.&lt;/p&gt;

&lt;p&gt;The DMA API Howto document (&lt;a href=&quot;http://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt&lt;/a&gt;) clearly states:&lt;/p&gt;

&lt;p&gt;If you need to use the same streaming DMA region multiple times and touch the data in between the DMA transfers, the buffer needs to be synced properly in order for the cpu and device to see the most uptodate and correct copy of the DMA buffer.&lt;br/&gt;
So, firstly, just map it with dma_map_{single,sg}, and after each DMA transfer call either:&lt;br/&gt;
dma_sync_single_for_cpu(dev, dma_handle, size, direction);&lt;br/&gt;
or:&lt;br/&gt;
dma_sync_sg_for_cpu(dev, sglist, nents, direction);&lt;br/&gt;
as appropriate.&lt;/p&gt;

&lt;p&gt;&apos;o2iblnd&apos; does not make these calls between DMA transfers. Without the sync calls (&apos;dma_sync_single_for_cpu&apos; before the CPU touches the buffer, and &apos;dma_sync_single_for_device&apos; before the buffer is handed back to the hardware), the new data might still be in the CPU cache, so when the HCA DMAs the buffer to send it out, it might read and send obsolete data, resulting in data corruption.&lt;/p&gt;
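
&lt;p&gt;The stale-data scenario described above can be pictured with a toy model. This is only a sketch in Python, not kernel code and not the o2iblnd implementation: the class ToyStreamingBuffer and its method names are hypothetical, the write-back cache is reduced to a single dirty copy, and the machine is assumed to be non-coherent, so the device reads main memory directly.&lt;/p&gt;

```python
class ToyStreamingBuffer:
    """Toy model of a reused streaming DMA buffer on a non-coherent machine.

    Hypothetical illustration only; none of these names exist in the kernel.
    """

    def __init__(self, size):
        self.memory = bytearray(size)  # what the device (HCA) reads via DMA
        self.cpu_cache = None          # dirty CPU copy not yet written back

    def cpu_write(self, data):
        # CPU stores land in the write-back cache first, not in main memory.
        self.cpu_cache = bytearray(data)

    def sync_for_device(self):
        # Stand-in for the cache write-back a DMA sync call would trigger
        # before the buffer is handed to the device.
        if self.cpu_cache is not None:
            self.memory[:len(self.cpu_cache)] = self.cpu_cache
            self.cpu_cache = None

    def device_dma_read(self):
        # The device bypasses the CPU cache and sees only main memory.
        return bytes(self.memory)


buf = ToyStreamingBuffer(4)

# First message: written, synced, then sent.
buf.cpu_write(b"MSG1")
buf.sync_for_device()
first = buf.device_dma_read()

# Buffer reused for a second message, but the sync is skipped.
buf.cpu_write(b"MSG2")
stale = buf.device_dma_read()

# With the sync in place the device sees the new message.
buf.sync_for_device()
fresh = buf.device_dma_read()

print(first, stale, fresh)
```

&lt;p&gt;In this model the second read still returns the first message, because nothing wrote the cached data back before the device read the buffer. On cache-coherent platforms the hardware may hide this step, which would be consistent with the problem not having been observed so far.&lt;/p&gt;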

&lt;p&gt;It appears that so far we have been lucky that this issue has not affected us, but it may simply be difficult to hit on x86/x86_64 systems.&lt;/p&gt;

&lt;p&gt;The fix is trivial, and the benefit is prevention of very-hard-to-debug data corruption issues on HW architectures which would expose the incorrect use of the DMA API.&lt;/p&gt;</description>
                <environment></environment>
        <key id="16119">LU-2028</key>
            <summary>Potential data corruption in &apos;o2iblnd&apos; (the IB LND driver) when using pre-mapped DMA buffers</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="ashehata">Amir Shehata</assignee>
                                    <reporter username="mlizon">Martin Lizon</reporter>
                        <labels>
                            <label>o2iblnd</label>
                            <label>patch</label>
                    </labels>
                <created>Tue, 25 Sep 2012 14:28:48 +0000</created>
                <updated>Mon, 26 Jun 2017 17:39:45 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="45603" author="isaac" created="Wed, 26 Sep 2012 16:36:57 +0000"  >&lt;p&gt;Hi Martin,&lt;/p&gt;

&lt;p&gt;That code has been there for as long as I can remember and it gets executed for almost every outgoing message, and yet, as you mentioned, it hasn&apos;t affected us so far. Do you have some insight into why it&apos;s so hard to hit?&lt;/p&gt;</comment>
                            <comment id="45605" author="doug" created="Wed, 26 Sep 2012 17:17:15 +0000"  >&lt;p&gt;I think this could be a problem with ARM and MIPS but not X86 (which is why it has not caused a problem yet).&lt;/p&gt;

&lt;p&gt;I believe this syncing is only needed if the interface (PCIe, for example) does not update the cache when data is written into the buffer. However, on X86 the data enters the coherent domain, and by default PCIe packets update the cache. There is a bit in the header to configure this, and I would expect it to be on by default.&lt;/p&gt;

&lt;p&gt;The DMA syncing routines seem to be implemented in the kernel for MIPS and ARM, so those architectures are probably subject to this potential problem. Not sure about PPC.&lt;/p&gt;</comment>
                            <comment id="45617" author="mlizon" created="Wed, 26 Sep 2012 22:24:35 +0000"  >&lt;p&gt;Hi Doug,&lt;/p&gt;

&lt;p&gt;Thanks for the extensive explanation. My search on this topic found that there is an X86 feature called CPU self snoop (&lt;a href=&quot;http://stackoverflow.com/questions/7132284/dma-cache-coherence-management&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://stackoverflow.com/questions/7132284/dma-cache-coherence-management&lt;/a&gt;) and it is configured by default in the x86 Linux kernel.&lt;/p&gt;

&lt;p&gt;Hence, it is a preventive measure. I guess one way to try and hit this issue might be by turning off this bit and recompiling the kernel. However, it&apos;s possible that with this bit being set by default, other modules might not be compliant and other kernel instability might surface before you can even try to run Lustre/Lnet with IB.&lt;/p&gt;</comment>
                            <comment id="45923" author="mlizon" created="Wed, 3 Oct 2012 10:38:32 +0000"  >&lt;p&gt;The Gerrit reference for this bug can be found here:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,4103&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4103&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="48207" author="nrutman" created="Wed, 21 Nov 2012 16:49:26 +0000"  >&lt;p&gt;xyratex-bug-id: &lt;a href=&quot;http://jira-nss.xy01.xyratex.com:8080/browse/MRP-559&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;MRP-559&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv3vb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4155</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>