<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:25:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary, append 'field=key&field=summary' to the URL of your request.
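A full request might then look like https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-16316/LU-16316.xml?field=key&field=summary (illustrative; assumes the standard JIRA XML issue-view URL for this issue).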
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16316] ZFS OSS lockups</title>
                <link>https://jira.whamcloud.com/browse/LU-16316</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have experienced lockups over the past few weeks on OSSes based on ZFS 2.0.7. A lockup makes the node unresponsive in terms of Lustre (the OSS node goes &lt;tt&gt;unhealthy&lt;/tt&gt;) and causes a huge load (&amp;gt;400) on the OSS. In some situations, directly after that, the load on the MDS also increases, but it seems to be a consequence of lost communication between the MDS and the affected OSS. We cannot associate the problem with an exact IO pattern or type of operation. We are raising the problem here first, but we cannot rule out that it belongs with the ZFS developers; if you think so, please let us know. We attach two sets of logs: the first from the 16th of October, when both the MDS and an OSS were affected, and the second from the 13th of November, when only an OSS was stuck. If you need more information, please don&apos;t hesitate to let us know.&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;

&lt;p&gt;Dominika Wanat&lt;/p&gt;</description>
                <environment>Lustre: 2.15.0_RC3, ZFS 2.0.7 (both self-compiled)&lt;br/&gt;
OS: CentOS 8.5, kernel 4.18.0-348.7.1.el8_5.x86_64</environment>
        <key id="73284">LU-16316</key>
            <summary>ZFS OSS lockups</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="wanat">Dominika Wanat</reporter>
                        <labels>
                    </labels>
                <created>Wed, 16 Nov 2022 13:17:31 +0000</created>
                <updated>Fri, 23 Dec 2022 11:41:56 +0000</updated>
                <due></due>
                <votes>0</votes>
                <watches>3</watches>
                    <comments>
                            <comment id="354159" author="bzzz" created="Fri, 25 Nov 2022 08:07:30 +0000"  >&lt;blockquote&gt;&lt;p&gt;In some situations, directly after that, the load on MDS also increases, but it seems like a consequence of lost communication between MDS and affected OSS.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Correct, this is because the MDS gets stuck waiting for new objects from the problem OST.&lt;/p&gt;

&lt;p&gt;I&apos;m not 100% positive, but I found a number of OST threads trying to prefetch data. You can try disabling prefetching to see whether it&apos;s related:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;echo 1 &amp;gt; /sys/module/zfs/parameters/zfs_prefetch_disable&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt; &amp;#8211; on the OSTs&lt;/p&gt;
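&lt;p&gt;If that helps and you want the setting to survive a reboot or module reload, a minimal sketch (assuming the stock OpenZFS packaging, where module parameters can be set from /etc/modprobe.d; the file name below is illustrative):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# sketch: persist the tunable across reboots
echo &quot;options zfs zfs_prefetch_disable=1&quot; &amp;gt; /etc/modprobe.d/zfs-prefetch.conf&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>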
                            <comment id="354288" author="JIRAUSER18015" created="Mon, 28 Nov 2022 09:53:11 +0000"  >&lt;p&gt;Thanks for the hint. We are investigating the nodes with prefetch disabled.&#160;&lt;/p&gt;</comment>
                            <comment id="356963" author="JIRAUSER18015" created="Tue, 20 Dec 2022 13:19:25 +0000"  >&lt;p&gt;It looks like it helps - nodes have not hung since then. Do you consider fixing this behaviour of Lustre with ZFS prefetch enabled?&lt;/p&gt;</comment>
                            <comment id="357316" author="bzzz" created="Fri, 23 Dec 2022 11:41:56 +0000"  >&lt;blockquote&gt;&lt;p&gt;It looks like it helps - nodes have not hung since then. Do you consider fixing this behaviour of Lustre with ZFS prefetch enabled?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;This is a very workload-specific thing; we&apos;ve seen a number of reports where prefetching &lt;em&gt;does&lt;/em&gt; improve performance.&lt;/p&gt;
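&lt;p&gt;One way to judge it for your workload (a sketch; the kstat below is assumed from OpenZFS 2.x and its fields may differ between releases) is to compare prefetch hits against misses before leaving it off permanently:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# sketch: a high miss-to-hit ratio suggests prefetch is wasted on this workload
cat /proc/spl/kstat/zfs/zfetchstats&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>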
                    </comments>
                    <attachments>
                            <attachment id="46693" name="mds01_20221016.log" size="40736" author="wanat" created="Wed, 16 Nov 2022 13:05:09 +0000"/>
                            <attachment id="46692" name="oss03_20221113.log" size="158102" author="wanat" created="Wed, 16 Nov 2022 13:05:10 +0000"/>
                            <attachment id="46691" name="oss06_20221016.log" size="132120" author="wanat" created="Wed, 16 Nov 2022 13:05:10 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i035vj:</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>