<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:43:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4554] OI scrub always runs on ldiskfs MDS start up</title>
                <link>https://jira.whamcloud.com/browse/LU-4554</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are running Lustre 2.4.0-21chaos (see &lt;a href=&quot;http://github.com/chaos/lustre&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://github.com/chaos/lustre&lt;/a&gt;), and most likely of particular interest are these two patches that we are carrying:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3934&quot; title=&quot;Directories gone missing after 2.4 update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3934&quot;&gt;&lt;del&gt;LU-3934&lt;/del&gt;&lt;/a&gt; scrub: detect upgraded from 1.8 correctly&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3420&quot; title=&quot;OI scrubbing could not automatically engage after restoring a secondary MDT from a (file-level) backup&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3420&quot;&gt;&lt;del&gt;LU-3420&lt;/del&gt;&lt;/a&gt; scrub: trigger OI scrub properly&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We now find that, at least on the ldiskfs MDS nodes, OI scrub runs on &lt;em&gt;every&lt;/em&gt; start up of the MDS.  The console message looks something like this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2014-01-28 09:27:28 sumom-mds1 login: LustreError: 0-0: lsc-MDT0000: trigger OI scrub by RPC for [0x7e4d2310f09:0x2ddf:0x0], rc = 0 [1]&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Given the frequency of MDS reboots required lately for other bugs (i.e. often), OI scrub is running far too much.&lt;/p&gt;</description>
                <environment></environment>
        <key id="22907">LU-4554</key>
            <summary>OI scrub always runs on ldiskfs MDS start up</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jamesanunez">James Nunez</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>mn4</label>
                    </labels>
                <created>Tue, 28 Jan 2014 18:40:56 +0000</created>
                <updated>Wed, 23 Apr 2014 15:08:34 +0000</updated>
                            <resolved>Tue, 11 Feb 2014 22:21:50 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                    <fixVersion>Lustre 2.6.0</fixVersion>
                    <fixVersion>Lustre 2.5.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>11</watches>
                                                                            <comments>
                            <comment id="75791" author="pjones" created="Tue, 28 Jan 2014 19:02:53 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please consult with Fan Yong and respond on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="75817" author="morrone" created="Tue, 28 Jan 2014 23:21:58 +0000"  >&lt;p&gt;We need to increase the priority on this ticket.  The OI scrub is introducing a lot of production problems for us.&lt;/p&gt;

&lt;p&gt;First of all, while OI scrub is running, many client nodes hang while holding the mdc lock.  Apparently they are doing an operation that triggers osd_fid_lookup() on the server.  The FID lookup fails and the server decides to start OI scrub (or OI scrub is already running) and returns EINPROGRESS to the client.&lt;/p&gt;

&lt;p&gt;Unfortunately, the client is now pretty much unusable to anyone until the OI scrub completes.&lt;/p&gt;

&lt;p&gt;Furthermore, it would appear that OI scrub will not respond to the stop command.  I&apos;m told that running &quot;lctl lfsck_stop -M ls5-MDT0000&quot; returns &quot;operation already in progress&quot;.&lt;/p&gt;</comment>
                            <comment id="75826" author="nedbass" created="Wed, 29 Jan 2014 01:45:14 +0000"  >&lt;p&gt;I&apos;ll note that in the &lt;tt&gt;oi_scrub&lt;/tt&gt; file, the &lt;tt&gt;Updated&lt;/tt&gt; count is 0 and the &lt;tt&gt;Success&lt;/tt&gt; count is 5.  From the current scan position relative to the number of inodes used, I think it is nearing completion.  So it seems the OI scrub is not finding anything to repair, yet certain client requests trigger the scrub.  From server-side debug logs, it seems that the trigger is happening here in &lt;tt&gt;osd_fid_lookup()&lt;/tt&gt;: &lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt; 383         &lt;span class=&quot;code-comment&quot;&gt;/* Search order: 3. OI files. */&lt;/span&gt;                                        
 384         result = osd_oi_lookup(info, dev, fid, id, &lt;span class=&quot;code-keyword&quot;&gt;true&lt;/span&gt;);                       
 385         &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (result == -ENOENT) {                                                
 386                 &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!fid_is_norm(fid) || fid_is_on_ost(info, dev, fid) ||       
 387                     !ldiskfs_test_bit(osd_oi_fid2idx(dev,fid),                  
 388                                       sf-&amp;gt;sf_oi_bitmap))                        
 389                         GOTO(out, result = 0);                                  
 390                                                                                 
 391                 &lt;span class=&quot;code-keyword&quot;&gt;goto&lt;/span&gt; trigger;                                                   
 392         }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
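
&lt;p&gt;In other words, a lookup that fails with -ENOENT only reaches the &lt;tt&gt;trigger&lt;/tt&gt; label when the FID is a normal FID, is not an OST object, and the bit for its OI file is set in the &quot;recreated&quot; bitmap.  A minimal stand-alone sketch of that decision (the modulo mapping and the &lt;tt&gt;FID_SEQ_NORMAL&lt;/tt&gt; bound stand in for the real &lt;tt&gt;osd_oi_fid2idx()&lt;/tt&gt; and FID helpers, and the &lt;tt&gt;fid_is_on_ost()&lt;/tt&gt; check is omitted):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Sketch only: models the trigger decision in osd_fid_lookup(). */
static int should_trigger_scrub(unsigned long long f_seq,
                                const unsigned long long *oi_bitmap,
                                unsigned oi_count)
{
        unsigned idx;

        if (f_seq &amp;lt; FID_SEQ_NORMAL)   /* IGIF and special FIDs never trigger */
                return 0;
        idx = f_seq % oi_count;           /* stand-in for osd_oi_fid2idx() */
        return (oi_bitmap[idx / 64] &amp;gt;&amp;gt; (idx % 64)) &amp;amp; 1;
}&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;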

&lt;p&gt;There are no debug log entries from &lt;tt&gt;osd_iget()&lt;/tt&gt; so it must be jumping past the &lt;tt&gt;iget:&lt;/tt&gt; goto label, which means it must take the path above.&lt;/p&gt;</comment>
                            <comment id="75831" author="yong.fan" created="Wed, 29 Jan 2014 03:12:30 +0000"  >&lt;p&gt;The FID &lt;span class=&quot;error&quot;&gt;&amp;#91;0x7e4d2310f09:0x2ddf:0x0&amp;#93;&lt;/span&gt; is a normal FID, so according to the above logic, the condition:&lt;/p&gt;

&lt;p&gt;&quot;(ldiskfs_test_bit(osd_oi_fid2idx(dev,fid), sf-&amp;gt;sf_oi_bitmap))&quot; should be true.&lt;/p&gt;

&lt;p&gt;That means the OSD thinks the OI files are being re-created, so it cannot tell whether the lookup failed because the object does not exist or because the related OI mapping has not been re-inserted yet.&lt;/p&gt;

&lt;p&gt;So would you please show the oi_scrub file under /proc so we can check whether the system is really re-creating the OI files? Thanks!&lt;/p&gt;</comment>
                            <comment id="75833" author="morrone" created="Wed, 29 Jan 2014 03:41:58 +0000"  >&lt;p&gt;Here is the oi_scrub output from one of the MDS nodes that was rebooted on our open network this morning:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;&amp;gt; cat osd-ldiskfs/lsc-MDT0000/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 1
status: completed
flags:
param:
time_since_last_completed: 28001 seconds
time_since_latest_start: 36680 seconds
time_since_last_checkpoint: 28001 seconds
latest_start_position: 12
last_checkpoint_position: 991133697
first_failure_position: N/A
checked: 170974399
updated: 0
failed: 0
prior_updated: 0
noscrub: 2284
igif: 12381456
success_count: 12
run_time: 8678 seconds
average_speed: 19702 objects/sec
real-time_speed: N/A
current_position: N/A
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="75841" author="yong.fan" created="Wed, 29 Jan 2014 10:17:09 +0000"  >&lt;p&gt;Then it is strange; the code should not jump from that place. Is there any (-1 level) debug log from when the OI scrub was triggered?&lt;/p&gt;</comment>
                            <comment id="75872" author="nedbass" created="Wed, 29 Jan 2014 17:52:27 +0000"  >&lt;p&gt;Fan Yong, the -1 debug log is from a classified system, so I can&apos;t send it, but if you have specific questions about it I can look for you.  While the scrub was in progress, the flags field only had &apos;auto&apos;.  An example FID from that system that followed the &quot;trigger&quot; path was &lt;span class=&quot;error&quot;&gt;&amp;#91;0x1a89082ad98:0x4d:0x0&amp;#93;&lt;/span&gt;.&lt;/p&gt;</comment>
                            <comment id="75874" author="nedbass" created="Wed, 29 Jan 2014 18:09:25 +0000"  >&lt;p&gt;I also notice &lt;tt&gt;osd_fid_lookup()&lt;/tt&gt; starts the scrub using &lt;tt&gt;osd_scrub_start(dev)&lt;/tt&gt;, which only enables the flag &lt;tt&gt;SS_AUTO&lt;/tt&gt;.  So even though &lt;tt&gt;(ldiskfs_test_bit(osd_oi_fid2idx(dev,fid), sf-&amp;gt;sf_oi_bitmap))&lt;/tt&gt; is true, (unless I misunderstand something) we would not see the &quot;recreated&quot; flag in  &lt;tt&gt;oi_scrub&lt;/tt&gt;.&lt;/p&gt;</comment>
                            <comment id="75945" author="nedbass" created="Thu, 30 Jan 2014 21:42:56 +0000"  >&lt;p&gt;I peeked at the /OI_scrub file while an auto-scrub was running.  It showed that bit 0 was set in sf-&amp;gt;sf_oi_bitmap.  This is wrong, because the OI already exists and OI_scrub has already run to completion several times.&lt;/p&gt;

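&lt;p&gt;For context, a freshly formatted ldiskfs MDT has multiple OI containers named oi.16.0, oi.16.1, and so on, while a filesystem upgraded from Lustre 1.8 carries a single container named just oi.16.  A hypothetical sketch of the two naming layouts (illustrative only, not the Lustre code):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Illustrative only: how the two on-disk layouts name OI containers. */
static void oi_name(char *buf, size_t len, unsigned nr, int single_oi)
{
        if (single_oi)
                snprintf(buf, len, &quot;oi.16&quot;);        /* upgraded-from-1.8 layout */
        else
                snprintf(buf, len, &quot;oi.16.%u&quot;, nr); /* oi.16.0, oi.16.1, ... */
}&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
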
&lt;p&gt;I think I see the problem in &lt;a href=&quot;https://github.com/chaos/lustre/blob/2.4.0-19chaos/lustre/osd-ldiskfs/osd_oi.c#L299&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;&lt;tt&gt;osd_oi_table_open()&lt;/tt&gt;&lt;/a&gt;. Note the format string assumes the OI containers have names like oi.16.0, oi.16.1, and so on.  However, for our upgraded filesystems we have only one OI container named oi.16.  So &lt;tt&gt;osd_oi_open()&lt;/tt&gt; returns &lt;tt&gt;ENOENT&lt;/tt&gt; and we proceed to set the &quot;recreated&quot; bit in the bitmap.&lt;/p&gt;</comment>
                            <comment id="75952" author="nedbass" created="Thu, 30 Jan 2014 23:15:06 +0000"  >&lt;p&gt;Patch for master: &lt;a href=&quot;http://review.whamcloud.com/#/c/9067/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9067/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="76078" author="adilger" created="Mon, 3 Feb 2014 17:15:33 +0000"  >&lt;p&gt;James, Lai is on holiday this week. Could you please cherry-pick this patch to b2_4 and b2_5 once it has landed to master. This can now be done directly in Gerrit. Please also add the &quot;Lustre-change:&quot; and &quot;Lustre-commit:&quot; tags to the commit messages as described on &lt;a href=&quot;https://wiki.hpdd.intel.com/display/PUB/Commit+Comments&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://wiki.hpdd.intel.com/display/PUB/Commit+Comments&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="76280" author="jamesanunez" created="Wed, 5 Feb 2014 17:33:41 +0000"  >&lt;p&gt;This patch hasn&apos;t landed to master yet, but I created a b2_5 and b2_4 patch at:&lt;/p&gt;

&lt;p&gt;b2_4 - &lt;a href=&quot;http://review.whamcloud.com/#/c/9140/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9140/&lt;/a&gt;&lt;br/&gt;
b2_5 - &lt;a href=&quot;http://review.whamcloud.com/#/c/9139/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9139/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="76784" author="pjones" created="Tue, 11 Feb 2014 22:21:50 +0000"  >&lt;p&gt;Landed for 2.5.1 and 2.6&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwdsf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12442</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>