<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:49:59 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5265] Lustre clients hang while OI_Scrub is running</title>
                <link>https://jira.whamcloud.com/browse/LU-5265</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;Context:&lt;br/&gt;
OI_Scrub was triggered after a failover of the MDT to the failover MDS (related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4554&quot; title=&quot;OI scrub always runs on ldiskfs MDS start up&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4554&quot;&gt;&lt;del&gt;LU-4554&lt;/del&gt;&lt;/a&gt;).&lt;/p&gt;

&lt;pre&gt;LustreError: 0-0: ptmp2-MDT0000: trigger OI scrub by RPC for [0x22cb1aa25:0xfabf:0x0], rc = 0 [1]
LustreError: 0-0: spool2-MDT0000: trigger OI scrub by RPC for [0x20cf1887f:0x92c:0x0], rc = 0 [1]&lt;/pre&gt;
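
&lt;p&gt;These messages appear on the MDS console. A minimal way to check for them (a sketch, assuming the messages reach the kernel ring buffer):&lt;/p&gt;

&lt;pre&gt;# On the MDS: look for OI scrub runs triggered by client RPCs.
dmesg | grep &quot;trigger OI scrub by RPC&quot;&lt;/pre&gt;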

&lt;p&gt;Issue:&lt;br/&gt;
Lustre clients hung while trying to read from or write to the filesystem, receiving an EINPROGRESS error from the server for each request until the OI_Scrub process completed.&lt;/p&gt;

&lt;p&gt;However, the following commands were still working: ls, cd, df.&lt;/p&gt;

&lt;p&gt;Due to the number of inodes, the OI_Scrub took 3 hours to complete, stalling production for the whole run.&lt;/p&gt;

&lt;p&gt;OI_Scrub status once completed:&lt;/p&gt;

&lt;pre&gt;# cat /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 1
status: completed
flags:
param:
time_since_last_completed: 382 seconds
time_since_latest_start: 11068 seconds
time_since_last_checkpoint: 382 seconds
latest_start_position: 12
last_checkpoint_position: 499122177
first_failure_position: N/A
checked: 190095126
updated: 2
failed: 0
prior_updated: 0
noscrub: 1965
igif: 239
success_count: 3
run_time: 10685 seconds
average_speed: 17790 objects/sec
real-time_speed: N/A
current_position: N/A&lt;/pre&gt;


&lt;p&gt;run_time/3600 = 10685/3600 ~= 2.97 hours.&lt;/p&gt;
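
&lt;p&gt;This is consistent with the other counters: checked/run_time = 190095126/10685 ~= 17790 objects/sec, which matches the reported average_speed.&lt;/p&gt;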

&lt;p&gt;As a workaround, auto_scrub has been disabled (echo 0 &amp;gt; /proc/fs/lustre/osd-ldiskfs/ptmp2-MDT0000/auto_scrub).&lt;/p&gt;
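
&lt;p&gt;For reference, a minimal sketch of toggling the same setting through lctl; the osd-ldiskfs.*.auto_scrub parameter name mirrors the /proc path above and is assumed to be available in this release:&lt;/p&gt;

&lt;pre&gt;# Check the current setting (1 = automatic OI scrub enabled).
lctl get_param osd-ldiskfs.ptmp2-MDT0000.auto_scrub

# Disable automatic triggering.
lctl set_param osd-ldiskfs.ptmp2-MDT0000.auto_scrub=0

# Re-enable once an impact window is acceptable.
lctl set_param osd-ldiskfs.ptmp2-MDT0000.auto_scrub=1&lt;/pre&gt;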

&lt;p&gt;We have since upgraded to Lustre 2.4.3 with the patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4554&quot; title=&quot;OI scrub always runs on ldiskfs MDS start up&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4554&quot;&gt;&lt;del&gt;LU-4554&lt;/del&gt;&lt;/a&gt;. The customer would like to enable the auto_scrub feature in order to get a consistent OI table, but cannot accept such an impact on the production systems.&lt;/p&gt;

&lt;p&gt;According to the &quot;OI Scrub and inode Iterator Solution Architecture&quot; document, clients can access the MDT while OI Scrub is running. Apart from FID-to-path operations and accessing the parent of a non-directory child, all other operations behave as normal.&lt;/p&gt;</description>
                <environment>RHEL6</environment>
        <key id="25343">LU-5265</key>
            <summary>Lustre clients hang while OI_Scrub is running</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="bruno.travouillon">Bruno Travouillon</reporter>
                        <labels>
                    </labels>
                <created>Fri, 27 Jun 2014 15:00:11 +0000</created>
                <updated>Wed, 27 Aug 2014 17:23:26 +0000</updated>
                            <resolved>Wed, 27 Aug 2014 17:23:26 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="87692" author="bruno.travouillon" created="Fri, 27 Jun 2014 15:02:58 +0000"  >&lt;p&gt;Top on the MDS while OI_Scrub was running&lt;/p&gt;</comment>
                            <comment id="87730" author="pjones" created="Fri, 27 Jun 2014 19:46:31 +0000"  >&lt;p&gt;Fan Yong&lt;/p&gt;

&lt;p&gt;Could you please advise on this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="87764" author="yong.fan" created="Mon, 30 Jun 2014 00:59:48 +0000"  >&lt;p&gt;During the OI scrub rebuilding the OI files, if the client accesses the system with name-based RPC, such as lookup, then it will not be affected. But if the client sends FID-based RPC to the MDS and related FID mapping has not been rebuilt yet, it will get -EINPROGRESS until related FID mapping has been rebuilt, the worst case is that the application has to wait until OI scrub finished. The case of FID-based RPC usually happens for old connected client that caches the FID on client-side before the upgrading or before the MDS file-level backup/restore. For the new connected client, the FID-based RPC will always be after name-based RPC (except for FID-to-path), so the new connected client will not be affected.&lt;/p&gt;

&lt;p&gt;So your case above is normal behaviour. Since your system has already run the OI scrub, the inconsistent entries should have been fixed already, so even if you enable &quot;auto_scrub&quot;, the OI scrub should not be triggered again unless it finds some new inconsistency (very rare). On the other hand, even while the OI scrub is rebuilding the OI files, not all FIDs are affected: if an application opens a file for read/write whose FID is not cached on the client, or whose FID mapping has already been rebuilt, that application is not affected by the OI scrub.&lt;/p&gt;

&lt;p&gt;So please tell me whether your system often hits OI mapping failures (which trigger the OI scrub) or not. If not, enabling &quot;auto_scrub&quot; will be fine. Otherwise, the OI scrub cannot rebuild the OI files completely, and there must be some other hidden bug.&lt;/p&gt;</comment>
                            <comment id="88574" author="bruno.travouillon" created="Wed, 9 Jul 2014 07:57:11 +0000"  >&lt;p&gt;Thanks for you clear answer.&lt;/p&gt;

&lt;p&gt;However, can you tell me how to check if OI scrub is triggered while auto_scrub is off?&lt;/p&gt;

&lt;p&gt;In osd_fid_lookup(), the LCONSOLE message &quot;trigger OI scrub by RPC for DFID&quot; is only printed when auto_scrub is on.&lt;/p&gt;

&lt;p&gt;Should I check on clients&apos; consoles?&lt;/p&gt;</comment>
                            <comment id="88791" author="yong.fan" created="Fri, 11 Jul 2014 02:35:44 +0000"  >&lt;p&gt;If auto_scrub is disabled, then the OI scrub will NOT be triggered automatically even though some inconsistency is detected during the normal processing. So you can NOT find the message about OI scrub auto running on the MDS. But under such case, the administrator can trigger OI scrub manually via &quot;lctl lfsck_start&quot;.&lt;/p&gt;

&lt;p&gt;The OI scrub is server-side work; in all cases, the client will NOT print any message.&lt;/p&gt;</comment>
                            <comment id="89342" author="bruno.travouillon" created="Thu, 17 Jul 2014 14:46:35 +0000"  >&lt;p&gt;Understood. We should enable the auto_scrub by the beginning of September.&lt;/p&gt;

&lt;p&gt;We will then be able to monitor the OI mapping failures and open a new ticket if we hit any issues.&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="15272" name="mds_top_oi_scrub" size="2055" author="bruno.travouillon" created="Fri, 27 Jun 2014 15:02:58 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwq3j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14692</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>