<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:18:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8562] osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss</title>
                <link>https://jira.whamcloud.com/browse/LU-8562</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;osp_statfs_interpret() can clear the error in opd_pre_status even though osp_precreate_cleanup_orphans() has failed and therefore does not know the exact last_id of the OST objects. Example:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;The MDT sends the request &quot;create objects x..y&quot;.&lt;/li&gt;
	&lt;li&gt;The objects are created and the MDT gets OK.&lt;/li&gt;
	&lt;li&gt;MDT-&amp;gt;OST reconnection.&lt;/li&gt;
	&lt;li&gt;The MDT sends cleanup_orphans with last_used_fid=x.&lt;/li&gt;
	&lt;li&gt;The OST removes x..y and replies OK with last_id=x.&lt;/li&gt;
	&lt;li&gt;The MDT-&amp;gt;OST connection is aborted and cleanup_orphans exits with EIO.&lt;/li&gt;
	&lt;li&gt;osp_statfs_interpret changes opd_pre_status from EIO to 0.&lt;/li&gt;
	&lt;li&gt;osp_precreate_reserve reserves an object and advances last_used_id from x to x+1.&lt;/li&gt;
	&lt;li&gt;The connection is restored and the MDT sends cleanup_orphans with last_id=x+1.&lt;br/&gt;
In the end the OST has a gap: object x was removed by cleanup_orphans.&lt;/li&gt;
&lt;/ol&gt;
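&lt;p&gt;The sequence above can be replayed in a small C model. This is a sketch under stated assumptions, not Lustre code: the toy_* structures and functions are hypothetical, object ids are plain integers with x = 10 and y = 20, and the OST side of cleanup_orphans is assumed to destroy every object above the last used id, so in this model the id reserved in step 8 (x+1) is the one that no longer exists. The &quot;fixed&quot; variant only illustrates the idea of keeping the error while orphan recovery is pending; it is not the code of patch 22211.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * Toy model of the LU-8562 race; not Lustre code.  All toy_* names are
 * hypothetical.  Object ids 1..TOY_NOBJ may exist on the OST.
 */
#define TOY_NOBJ 32
#define TOY_EIO  5

struct toy_ost {
	int exists[TOY_NOBJ + 1];	/* exists[id] == 1: object is on the OST */
};

struct toy_osp {
	int last_used_id;	/* last id handed out to a file by the MDT */
	int pre_status;		/* 0: reservations allowed, else -errno */
	int pre_recovering;	/* 1: orphan cleanup has not completed */
};

/* OST side of cleanup_orphans (assumed): destroy every id above last_used. */
static void toy_ost_destroy_orphans(struct toy_ost *ost, int last_used)
{
	int id;

	for (id = last_used + 1; TOY_NOBJ >= id; id++)
		ost->exists[id] = 0;
}

/* Step 7: the buggy variant clears the error unconditionally. */
static void toy_statfs_interpret(struct toy_osp *d, int fixed)
{
	if (fixed == 0 || d->pre_recovering == 0)
		d->pre_status = 0;
}

/* Returns the reserved id, or the (negative) pre_status if refused. */
static int toy_reserve(struct toy_osp *d)
{
	if (d->pre_status != 0)
		return d->pre_status;
	return ++d->last_used_id;
}

/*
 * Replays steps 1-8 with x = 10 and y = 20.  Returns 1 if the id handed
 * out in step 8 has no object on the OST (the gap), otherwise 0.
 */
static int toy_race(int fixed)
{
	struct toy_ost ost[1] = { { { 0 } } };
	struct toy_osp d[1] = { { 10, 0, 0 } };	/* last_used_id = x */
	int id;

	for (id = 1; 20 >= id; id++)	/* 1-2: objects 1..y exist on the OST */
		ost->exists[id] = 1;
	d->pre_recovering = 1;		/* 3: reconnection */
	toy_ost_destroy_orphans(ost, d->last_used_id);	/* 4-5 */
	d->pre_status = -TOY_EIO;	/* 6: reply lost, EIO */
	toy_statfs_interpret(d, fixed);	/* 7 */
	id = toy_reserve(d);		/* 8 */
	if (id == -TOY_EIO)
		return 0;		/* reservation refused: no gap */
	return ost->exists[id] == 0;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With these assumptions toy_race(0) returns 1 (step 8 hands out an id whose object was destroyed in step 5), while toy_race(1) returns 0 because the reservation is refused with -EIO instead.&lt;/p&gt;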


&lt;p&gt;Below is a reproducer that works only on a single-node setup:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;diff --git a/lustre/tests/conf-sanity.sh b/lustre/tests/conf-sanity.sh
index c64ebab..f5026dc 100755
--- a/lustre/tests/conf-sanity.sh
+++ b/lustre/tests/conf-sanity.sh
@@ -6796,6 +6796,32 @@ test_97() {
 }
 run_test 97 &quot;ldev returns correct ouput when querying based on role&quot;
 
+test_98() {
+       local_mode || { skip &quot;Need single node setup&quot;; return; }
+       local cmp=0
+       local dev=$FSNAME-OST0000-osc-MDT0000
+       setupall
+
+       createmany -o $DIR1/$tfile-%d 50000&amp;amp;
+       cmp=$!
+       # MDT-&amp;gt;OST reconnection causes MDT&amp;lt;-&amp;gt;OST last_id synchronisation
+       # via osp_precreate_cleanup_orphans.
+       for i in $(seq 0 100); do
+               for k in $(seq 0 10); do
+                       $LCTL --device $dev deactivate
+                       $LCTL --device $dev activate
+               done
+               ls -asl $MOUNT | grep &apos;???&apos; &amp;amp;&amp;amp; \
+                       (kill -9 $cmp &amp;amp;&amp;gt;/dev/null; \
+                       error &quot;File hasn&apos;t object on OST&quot;)
+               ps -A -o pid | grep $cmp 1&amp;gt;/dev/null || break
+       done
+       wait $cmp
+       stopall
+}
+run_test 98 &quot;Race MDT-&amp;gt;OST reconnection with create&quot;
+
+
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="39178">LU-8562</key>
            <summary>osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="scherementsev">Sergey Cheremencev</reporter>
                        <labels>
                    </labels>
                <created>Mon, 29 Aug 2016 13:01:30 +0000</created>
                <updated>Tue, 16 Jul 2019 17:18:11 +0000</updated>
                            <resolved>Thu, 16 Feb 2017 21:44:05 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                    <version>Lustre 2.9.0</version>
                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                    <comments>
                            <comment id="163743" author="sergey" created="Wed, 31 Aug 2016 18:16:24 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/22211/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/22211/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="169085" author="sergey" created="Tue, 11 Oct 2016 11:28:16 +0000"  >&lt;p&gt;We found that the patch needs to be changed.&lt;br/&gt;
A new version is now under review at Seagate.&lt;br/&gt;
When the review is complete I&apos;ll update the patch and add a test based on the reproducer.&lt;/p&gt;</comment>
                            <comment id="178911" author="gerrit" created="Fri, 23 Dec 2016 05:04:50 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/22211/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/22211/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8562&quot; title=&quot;osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8562&quot;&gt;&lt;del&gt;LU-8562&lt;/del&gt;&lt;/a&gt; osp: fix precreate_cleanup_orphans/precreate_reserve race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d295847d946276ab7ebae7811498fbdb1289e6e7&lt;/p&gt;</comment>
                            <comment id="178941" author="pjones" created="Fri, 23 Dec 2016 13:38:19 +0000"  >&lt;p&gt;Landed for 2.10&lt;/p&gt;</comment>
                            <comment id="179012" author="tappro" created="Sat, 24 Dec 2016 21:32:24 +0000"  >&lt;p&gt;Reopened due to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8972&quot; title=&quot;conf-sanity test_101: File hasn&amp;#39;t object on OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8972&quot;&gt;&lt;del&gt;LU-8972&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="179087" author="nedbass" created="Wed, 28 Dec 2016 00:40:27 +0000"  >&lt;p&gt;Why was this not a blocker for 2.9?&lt;/p&gt;</comment>
                            <comment id="179214" author="nedbass" created="Fri, 30 Dec 2016 03:22:36 +0000"  >&lt;p&gt;I was testing out patch 22211 and (if my understanding is correct) may have found a defect.&lt;/p&gt;

&lt;p&gt;It seems &lt;tt&gt;osp_precreate_thread()&lt;/tt&gt; can get stuck because &lt;tt&gt;d-&amp;gt;opd_got_disconnected&lt;/tt&gt; never gets reset. When &lt;tt&gt;opd_got_disconnected&lt;/tt&gt; is set, &lt;tt&gt;osp_precreate_cleanup_orphans()&lt;/tt&gt; returns early with EAGAIN and can&apos;t clear &lt;tt&gt;d-&amp;gt;opd_pre_recovering&lt;/tt&gt;. And because &lt;tt&gt;d-&amp;gt;opd_pre_recovering&lt;/tt&gt; can&apos;t be cleared, we always hit the break statement below and never clear &lt;tt&gt;d-&amp;gt;opd_got_disconnected&lt;/tt&gt;. So &lt;tt&gt;osp_precreate_cleanup_orphans()&lt;/tt&gt; is stuck, failing on every pass.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        while (osp_precreate_running(d)) {
                /*
                 * need to be connected to OST
                 */
                while (osp_precreate_running(d)) {
+                       if (d-&amp;gt;opd_pre_recovering &amp;amp;&amp;amp;
+                           d-&amp;gt;opd_imp_connected)
+                               break;
                        l_wait_event(d-&amp;gt;opd_pre_waitq,
                                     !osp_precreate_running(d) ||
                                     d-&amp;gt;opd_new_connection,
                                     &amp;amp;lwi);
 
                        if (!d-&amp;gt;opd_new_connection)
                                continue;
 
                        d-&amp;gt;opd_new_connection = 0;
                        d-&amp;gt;opd_got_disconnected = 0;
                        break;
                }
 
                if (!osp_precreate_running(d))
                        break;
 
                LASSERT(d-&amp;gt;opd_obd-&amp;gt;u.cli.cl_seq != NULL);
                /* Sigh, fid client is not ready yet */
                if (d-&amp;gt;opd_obd-&amp;gt;u.cli.cl_seq-&amp;gt;lcs_exp == NULL)
                        continue;
 
                /* Init fid for osp_precreate if necessary */
                rc = osp_init_pre_fid(d);
                if (rc != 0) {
                        class_export_put(d-&amp;gt;opd_exp);
                        d-&amp;gt;opd_obd-&amp;gt;u.cli.cl_seq-&amp;gt;lcs_exp = NULL;
                        CERROR(&quot;%s: init pre fid error: rc = %d\n&quot;,
                               d-&amp;gt;opd_obd-&amp;gt;obd_name, rc);
                        continue;
                }
 
                osp_statfs_update(d);
 
                /*
                 * Clean up orphans or recreate missing objects.
                 */
                rc = osp_precreate_cleanup_orphans(&amp;amp;env, d);
-               if (rc != 0)
+               if (rc != 0) {
+                       schedule_timeout_interruptible(cfs_time_seconds(1));
                        continue;
+               }
                /*
                 * connected, can handle precreates now
                 */

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
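&lt;p&gt;A minimal C model of the livelock described above, assuming only the four flags named in this comment matter. The toy_* names are hypothetical, and l_wait_event() is reduced to consuming a pending new_connection event. With the early break enabled and got_disconnected already set, cleanup fails with -EAGAIN on every pass even though a reconnect is pending; without the break, the pending reconnect resets the flag and cleanup succeeds on the first pass. The model covers the defect only, not the fix merged as change 24758.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```c
/*
 * Toy model of the osp_precreate_thread() loop quoted above; not Lustre
 * code.  The toy_* names are hypothetical and only the flags named in the
 * comment are modelled.
 */
#define TOY_EAGAIN 11

struct toy_osp_flags {
	int pre_recovering;	/* orphan cleanup still pending */
	int imp_connected;	/* import is connected to the OST */
	int got_disconnected;	/* a disconnect was seen, not yet reset */
	int new_connection;	/* a reconnect event is pending */
};

/* Fails with -EAGAIN while got_disconnected is set. */
static int toy_cleanup_orphans(struct toy_osp_flags *d)
{
	if (d->got_disconnected)
		return -TOY_EAGAIN;
	d->pre_recovering = 0;
	return 0;
}

/*
 * Runs at most max_iter passes of the outer loop.  Returns the pass on
 * which cleanup succeeded, or -1 if it never did.
 */
static int toy_precreate_thread(struct toy_osp_flags *d, int use_shortcut,
				int max_iter)
{
	int pass = 0;

	while (max_iter > pass) {
		int skip_wait = 0;

		pass++;
		/* The early break shown in the excerpt above (patch 22211). */
		if (use_shortcut) {
			if (d->pre_recovering)
				skip_wait = d->imp_connected;
		}
		if (skip_wait == 0) {
			/* l_wait_event() reduced to consuming a pending event */
			if (d->new_connection == 0)
				continue;
			d->new_connection = 0;
			d->got_disconnected = 0;
		}
		if (toy_cleanup_orphans(d) == 0)
			return pass;
	}
	return -1;
}
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For example, with all four flags set to 1, toy_precreate_thread(d, 1, 100) returns -1 and leaves got_disconnected set, while the same call with use_shortcut = 0 on a fresh copy returns 1.&lt;/p&gt;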
</comment>
                            <comment id="179960" author="gerrit" created="Sat, 7 Jan 2017 01:46:27 +0000"  >&lt;p&gt;Ned Bass (bass6@llnl.gov) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/24758&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/24758&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8562&quot; title=&quot;osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8562&quot;&gt;&lt;del&gt;LU-8562&lt;/del&gt;&lt;/a&gt; osp: osp_precreate_thread gets stuck after disconnect&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: b0eae5b52842c32c63ed6ba3e8981a84cede7c94&lt;/p&gt;</comment>
                            <comment id="181846" author="gerrit" created="Tue, 24 Jan 2017 05:21:48 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/24758/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/24758/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8562&quot; title=&quot;osp_precreate_cleanup_orphans/osp_precreate_reserve race may cause data loss&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8562&quot;&gt;&lt;del&gt;LU-8562&lt;/del&gt;&lt;/a&gt; osp: osp_precreate_thread gets stuck after disconnect&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: fb64c701791e591f4fd1a849e4be774ff85145fc&lt;/p&gt;</comment>
                            <comment id="185193" author="mdiep" created="Thu, 16 Feb 2017 21:44:05 +0000"  >&lt;p&gt;Landed in Lustre 2.10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="42639">LU-8967</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is duplicated by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="42650">LU-8972</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="52861">LU-11196</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzymdz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>