<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:35:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17412] lustre snapshot: write barrier stuck at &quot;failed&quot; state</title>
                <link>https://jira.whamcloud.com/browse/LU-17412</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;When a Lustre snapshot is created (lctl snapshot_create), a global write barrier is used internally to avoid taking an inconsistent snapshot of the filesystem.&lt;/p&gt;

&lt;p&gt;Creating a snapshot after mounting another snapshot puts the barrier into a &quot;failed&quot; state. This state cannot be cleared until the MGS of the actual filesystem is remounted. While the barrier is in this state, any operation involving it fails (lctl snapshot_{create, destroy}, etc.).&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
# lctl barrier_stat testfs
state: failed
timeout: 0 seconds

# lctl barrier_rescan testfs
Fail to rescan barrier bitmap &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; testfs: Invalid argument
# lctl barrier_thaw testfs
Fail to thaw barrier &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; testfs: Invalid argument&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Bisecting recent commits on the master branch points to the commit below. I am not able to reproduce the issue without this commit.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
  &#160; LU-17142 mgc: reconnection without pinger&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Overall, it looks like the issue is caused by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17142&quot; title=&quot;MGC long time connection&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17142&quot;&gt;&lt;del&gt;LU-17142&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I can reproduce this consistently with the attached script (&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/attachment/51617/51617_reproducer.sh&quot; title=&quot;reproducer.sh attached to LU-17412&quot;&gt;reproducer.sh&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.whamcloud.com/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;).&lt;/p&gt;</description>
                <environment></environment>
        <key id="79927">LU-17412</key>
            <summary>lustre snapshot: write barrier stuck at &quot;failed&quot; state</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="akash-b">Akash B</reporter>
                        <labels>
                            <label>ZFS</label>
                            <label>snapshots</label>
                    </labels>
                <created>Wed, 10 Jan 2024 16:05:02 +0000</created>
                <updated>Thu, 11 Jan 2024 09:06:09 +0000</updated>
                                            <version>Upstream</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="399176" author="adilger" created="Wed, 10 Jan 2024 16:48:36 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=aboyko&quot; class=&quot;user-hover&quot; rel=&quot;aboyko&quot;&gt;aboyko&lt;/a&gt;, any ideas here?&lt;/p&gt;</comment>
                            <comment id="399255" author="aboyko" created="Thu, 11 Jan 2024 09:06:09 +0000"  >&lt;p&gt;The situation is as follows:&lt;br/&gt;
lctl snapshot_mount xxx starts an MGC reconnection, which causes the MGS locks to be dropped.&lt;br/&gt;
lctl snapshot_create xxx -&amp;gt; lctl barrier_freeze -&amp;gt; the MGS sends a glimpse AST to the MGCs, finds no locks, and treats this as an error.&lt;br/&gt;
Clients need some time to enqueue the MGS locks.&lt;br/&gt;
A simple delay of 3-5 seconds between snapshot_mount and snapshot_create helps.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="78083">LU-17142</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="51617" name="reproducer.sh" size="459" author="akash-b" created="Wed, 10 Jan 2024 16:03:59 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0472f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>