<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:47:26 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11844] IO pattern causing writes to hang to OST</title>
                <link>https://jira.whamcloud.com/browse/LU-11844</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have a Fortran code that does small writes to a status file. This code worked well on Lustre 2.5.x. On Lustre 2.10.4 we&#8217;re experiencing a bug with the same code where one of the writes hangs for about 20 seconds. During that period all writes to the affected OST hang. IO to other OSTs still works, and IO from other nodes is unaffected.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The strace of the application shows that it opens a file, truncates it to 0, writes a char, truncates it to 1, continues writing, closes the file, and then repeats. After a few cycles, the write that passes the 4 KB offset into the file hangs for about 30 seconds.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The strace looks like this:&lt;/p&gt;

&lt;p&gt;&#8230;&lt;/p&gt;

&lt;p&gt;open(&quot;short_file&quot;, O_RDWR|O_CREAT, 0700) = 3&lt;/p&gt;

&lt;p&gt;ftruncate(3, 0) = 0&lt;/p&gt;

&lt;p&gt;write(3, &quot;/&quot;, 1) = 1&lt;/p&gt;

&lt;p&gt;ftruncate(3, 1) = 0&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130) = 130&lt;/p&gt;

&lt;p&gt;write(3, &quot;atHUpw+orbPHuU55Em+XvliyYOwQg2le&quot;..., 130 &#8592; Hangs Here&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;The Lustre debug traces from the client show the hang and indicate multiple calls to genops.c:1990:obd_stale_export_get. Attached are a reproducer and the lctl dk output after issuing: echo &quot;trace nettrace dlmtrace rpctrace vfstrace&quot; &amp;gt; /proc/sys/lnet/debug&lt;/p&gt;
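
The open/truncate/write cycle described above can be sketched as follows. This is a minimal Python sketch of the reported syscall pattern, not the attached repro.c; the path, record contents, and iteration counts are assumptions for illustration:

```python
import os

def repro(path="short_file", cycles=5):
    """Replay the IO pattern from the strace: open, truncate to 0,
    write one char, truncate to 1, then a run of small writes."""
    for _ in range(cycles):
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o700)
        os.ftruncate(fd, 0)        # ftruncate(3, 0) = 0
        os.write(fd, b"/")         # write(3, "/", 1) = 1
        os.ftruncate(fd, 1)        # ftruncate(3, 1) = 0
        record = b"x" * 130        # 130-byte records, as in the strace
        for _ in range(40):        # the 4 KB boundary is crossed near write 31
            os.write(fd, record)   # the reported hang occurs on one of these
        os.close(fd)
```

On an affected client, one of the 130-byte writes that crosses the 4 KB offset would stall; on a healthy filesystem the loop completes immediately.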

</description>
                <environment></environment>
        <key id="54462">LU-11844</key>
            <summary>IO pattern causing writes to hang to OST</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="apargal">Alex Parga</reporter>
                        <labels>
                    </labels>
                <created>Wed, 9 Jan 2019 14:31:49 +0000</created>
                <updated>Wed, 9 Jan 2019 20:36:14 +0000</updated>
                                            <version>Lustre 2.10.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="239596" author="bzzz" created="Wed, 9 Jan 2019 14:57:06 +0000"  >&lt;p&gt;Do you use ldiskfs or ZFS on the OST? How much free space do the OSTs report?&lt;/p&gt;</comment>
                            <comment id="239599" author="apargal" created="Wed, 9 Jan 2019 15:16:44 +0000"  >&lt;p&gt;We use ZFS 0.7.9-1. We can replicate the problem on all of our Lustre 2.10.4 filesystems that are freshly formatted and up to 35% utilized.&lt;/p&gt;</comment>
                            <comment id="239600" author="bzzz" created="Wed, 9 Jan 2019 15:18:28 +0000"  >&lt;p&gt;the symptoms look like&#160;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11798&quot; title=&quot;cur_grant goes to 0 and never increases with 2.8 client and 2.10 server&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11798&quot;&gt;&lt;del&gt;LU-11798&lt;/del&gt;&lt;/a&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="239621" author="adilger" created="Wed, 9 Jan 2019 18:42:56 +0000"  >&lt;p&gt;Alex Z, why did you think of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11798&quot; title=&quot;cur_grant goes to 0 and never increases with 2.8 client and 2.10 server&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11798&quot;&gt;&lt;del&gt;LU-11798&lt;/del&gt;&lt;/a&gt;?  I thought that was due to some incompatibility between 2.8 clients and 2.10 servers?&lt;/p&gt;

&lt;p&gt;Alex P, does this issue happen immediately on a newly mounted client, or does it take some time before it hits?  On a client that is having the problem, can you check &quot;&lt;tt&gt;lctl get_param osc.&amp;#42;.cur_grant_bytes&lt;/tt&gt;&quot; to see if the client is running out of grant?&lt;/p&gt;</comment>
                            <comment id="239634" author="apargal" created="Wed, 9 Jan 2019 20:36:14 +0000"  >&lt;p&gt;It does happen on a freshly mounted client. Just to confirm, we are using Lustre 2.10.4 on both the clients and servers. Based on the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11798&quot; title=&quot;cur_grant goes to 0 and never increases with 2.8 client and 2.10 server&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11798&quot;&gt;&lt;del&gt;LU-11798&lt;/del&gt;&lt;/a&gt; comments, we did a test that set the recordsize back to 128K and then ran the reproducer while monitoring cur_grant_bytes. The problem still occurred; however, with each hang the cur_grant_bytes value would cycle in order between 262144, 524288, and 0, and then repeat.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="54327">LU-11798</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="31742" name="dk_client.txt" size="5797545" author="apargal" created="Wed, 9 Jan 2019 14:31:25 +0000"/>
                            <attachment id="31741" name="dk_mds.txt" size="4281251" author="apargal" created="Wed, 9 Jan 2019 14:31:25 +0000"/>
                            <attachment id="31740" name="dk_oss.txt" size="2385893" author="apargal" created="Wed, 9 Jan 2019 14:31:25 +0000"/>
                            <attachment id="31739" name="repro.c" size="574" author="apargal" created="Wed, 9 Jan 2019 14:31:39 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0091z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>