<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:22:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2142] &quot;lctl lfsck_start&quot; should start a scrub</title>
                <link>https://jira.whamcloud.com/browse/LU-2142</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Running &quot;lctl lfsck_start -M {fsname}-MDT0000&quot; should start a scrub, unless one is already running.  However, if the scrub was previously run and completed (leaving &lt;tt&gt;last_checkpoint_position == inode_count&lt;/tt&gt;), it appears a new scrub will &lt;em&gt;not&lt;/em&gt; be run because the start position is not reset at the end of the previous lfsck run or the start of the new run:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;latest_start_position: 143392770
last_checkpoint_position: 143392769
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It makes sense to restart the scrub at the last checkpoint position if it didn&apos;t complete for some reason, but if &lt;tt&gt;latest_start_position &amp;gt;= inode_count&lt;/tt&gt; then the start position should be reset to start again.  Both Cliff and I were confused by the current behaviour, and it took us a while to determine that &quot;-r&quot; was needed, and I expect that most users will have the same problem.  The &quot;-r&quot; option should only be needed in case the admin has to handle some unusual condition where a previous scrub was interrupted, but a new full scrub is desired.&lt;/p&gt;</description>
                <environment></environment>
        <key id="16319">LU-2142</key>
            <summary>&quot;lctl lfsck_start&quot; should start a scrub</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Wed, 10 Oct 2012 22:03:16 +0000</created>
                <updated>Mon, 22 Oct 2012 16:51:50 +0000</updated>
                            <resolved>Fri, 19 Oct 2012 02:19:34 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.3.0</fixVersion>
                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="46359" author="yong.fan" created="Wed, 10 Oct 2012 23:19:38 +0000"  >&lt;p&gt;If the OI scrub scanning policy is adjusted as above, we need to consider more cases. For example:&lt;/p&gt;

&lt;p&gt;The last OI scrub scan finished at ino# 100,000. If some new file is then created, its ino# may be larger than the last finished position, such as 100,001; it may also reuse some deleted inode, so its ino# may be smaller than the last finished position, such as 50,001. In that case, if the system admin runs OI scrub again, the behaviour differs: in the former case it continues scanning from 100,000 and finishes at ino# 100,001; in the latter case it resets the scan to the beginning of the device and re-scans the whole MDT device. So from the sysadmin&apos;s view, the OI scrub behaviour becomes unpredictable. I do not think that is expected.&lt;/p&gt;

&lt;p&gt;So I suggest using &quot;-r&quot; explicitly to reset the scanning position. If someone wants to re-run OI scrub before the former instance has finished, he/she can stop the current OI scrub explicitly with &quot;lctl lfsck_stop&quot; first, then run OI scrub again with &quot;lctl lfsck_start -r&quot;. I do not think that is too much trouble.&lt;/p&gt;</comment>
                            <comment id="46388" author="adilger" created="Thu, 11 Oct 2012 12:20:46 +0000"  >&lt;p&gt;Patch for b2_3 is &lt;a href=&quot;http://review.whamcloud.com/4252&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4252&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="46419" author="yong.fan" created="Thu, 11 Oct 2012 21:40:41 +0000"  >&lt;p&gt;Patch for master:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,4250&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4250&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="46496" author="adilger" created="Fri, 12 Oct 2012 19:52:02 +0000"  >&lt;p&gt;I tested this patch by hand (on master, where it was landed after b2_3 where I assumed it had been tested), but it doesn&apos;t appear to have fixed &lt;tt&gt;lctl lfsck_start&lt;/tt&gt; to actually run a scrub when asked.  It now reports &quot;Started LFSCK&quot; every time:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# lctl lfsck_start -M testfs-MDT0000 -s 4
Started LFSCK on the MDT device testfs-MDT0000.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But it doesn&apos;t actually seem to run a scrub (&lt;tt&gt;-s 4&lt;/tt&gt; to make the scrub slow enough to watch):&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;time_since_last_completed: 5 seconds
time_since_latest_start: 5 seconds
time_since_last_checkpoint: 5 seconds
latest_start_position: 50002
last_checkpoint_position: 50001
success_count: 17
run_time: 32 seconds
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It resets the start time, but not latest_start_position or the run time, so the scrub takes zero seconds to &quot;finish&quot; but doesn&apos;t actually do anything.  Running with the &quot;-r&quot; option &lt;em&gt;does&lt;/em&gt; seem to start a full scrub:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;time_since_last_completed: 88 seconds
time_since_latest_start: 10 seconds
time_since_last_checkpoint: 10 seconds
latest_start_position: 11
last_checkpoint_position: N/A
run_time: 10 seconds
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But I would think that &lt;tt&gt;lctl lfsck_start&lt;/tt&gt; should actually &lt;b&gt;start&lt;/b&gt; a scrub, like the command is called, instead of only doing so if &lt;tt&gt;-r&lt;/tt&gt; is given.  If there is already a scrub running, it should continue to run, but if one is not running a new full scrub should be started...&lt;/p&gt;

&lt;p&gt;Seems the patch isn&apos;t quite working yet.&lt;/p&gt;</comment>
                            <comment id="46765" author="yong.fan" created="Fri, 19 Oct 2012 02:19:34 +0000"  >&lt;p&gt;The issue has been fixed as Andreas suggested.&lt;/p&gt;</comment>
                            <comment id="46805" author="adilger" created="Sat, 20 Oct 2012 04:26:22 +0000"  >&lt;p&gt;Fan Yong, was there another patch landed? It seemed in my testing that this didn&apos;t actually fix the problem. As previously stated, it appears that LFSCK is started, but since the starting inode is not reset, LFSCK immediately exits without doing anything...&lt;/p&gt;

&lt;p&gt;Running &quot;lfsck -r&quot; does appear to actually run a check, but &quot;lfsck&quot; by itself does not appear to start a new scrub.&lt;/p&gt;</comment>
                            <comment id="46822" author="yong.fan" created="Sun, 21 Oct 2012 04:19:21 +0000"  >&lt;p&gt;This is the output from my own test against the latest master branch (top ID I2ff03a611267292d0cd6a465c1eb14023516234b), containing the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2142&quot; title=&quot;&amp;quot;lctl lfsck_start&amp;quot; should start a scrub&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2142&quot;&gt;&lt;del&gt;LU-2142&lt;/del&gt;&lt;/a&gt; (ID I5b8e9ee51ccbf95ed131b963389c4ecfb92b9035):&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
name: OI scrub
magic: 0x4c5fd252
oi_files: 64
status: init
flags:
param:
time_since_last_completed: N/A
time_since_latest_start: N/A
time_since_last_checkpoint: N/A
latest_start_position: N/A
last_checkpoint_position: N/A
first_failure_position: N/A
checked: 0
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 0
success_count: 0
run_time: 0 seconds
average_speed: 0 objects/sec
real-time_speed: N/A
current_position: N/A
[root@RHEL6-nasf-CSW tests]# ../utils/lctl lfsck_start -M lustre-MDT0000
Started LFSCK on the MDT device lustre-MDT0000.
[root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
name: OI scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 3 seconds
time_since_latest_start: 3 seconds
time_since_last_checkpoint: 3 seconds
latest_start_position: 11
last_checkpoint_position: 100001
first_failure_position: N/A
checked: 206
updated: 0
failed: 0
prior_updated: 0
noscrub: 38
igif: 168
success_count: 1
run_time: 0 seconds
average_speed: 206 objects/sec
real-time_speed: N/A
current_position: N/A
[root@RHEL6-nasf-CSW tests]# 
[root@RHEL6-nasf-CSW tests]# 
[root@RHEL6-nasf-CSW tests]# 
[root@RHEL6-nasf-CSW tests]# ../utils/lctl lfsck_start -M lustre-MDT0000
Started LFSCK on the MDT device lustre-MDT0000.
[root@RHEL6-nasf-CSW tests]# cat /proc/fs/lustre/osd-ldiskfs/lustre-MDT0000/oi_scrub 
name: OI scrub
magic: 0x4c5fd252
oi_files: 64
status: completed
flags:
param:
time_since_last_completed: 1 seconds
time_since_latest_start: 1 seconds
time_since_last_checkpoint: 1 seconds
latest_start_position: 11
last_checkpoint_position: 100001
first_failure_position: N/A
checked: 206
updated: 0
failed: 0
prior_updated: 0
noscrub: 0
igif: 206
success_count: 2
run_time: 0 seconds
average_speed: 206 objects/sec
real-time_speed: N/A
current_position: N/A
[root@RHEL6-nasf-CSW tests]# 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;As you can see, repeatedly running OI scrub via &quot;lctl lfsck_start&quot; repeatedly triggers OI scrub, as we expect. The condition to re-trigger OI scrub is that the former OI scrub has completed (&quot;status: completed&quot;).&lt;/p&gt;

&lt;p&gt;You can judge whether the OI scrub was re-triggered by checking the &quot;checked:&quot; item: if it is &quot;0&quot;, it was not re-triggered; otherwise, it was.&lt;/p&gt;

&lt;p&gt;On the other hand, the OI scrub may skip inodes newly created since the last OI scrub run (only once), so the &quot;checked:&quot; item may not match the real allocated inode count.&lt;/p&gt;

&lt;p&gt;So Andreas, would you please describe in detail which operations you did to reproduce the issue? Then I can analyze what happened. Thanks!&lt;/p&gt;</comment>
                            <comment id="46848" author="adilger" created="Mon, 22 Oct 2012 16:51:50 +0000"  >&lt;p&gt;Fan Yong, you are correct. This patch fixes the problem. I must have been testing on my system after rebuilding, but not reloading the modules. &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv9z3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5150</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>