<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:00:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13304] add testing for LFSCK DRYRUN</title>
                <link>https://jira.whamcloud.com/browse/LU-13304</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I noticed in a patch review that there are places in &lt;tt&gt;lfsck_layout.c&lt;/tt&gt; (and probably others) where &lt;tt&gt;lb_param &amp;amp; LPF_DRYRUN&lt;/tt&gt; is not being checked before modifying the filesystem. For example, &lt;tt&gt;lfsck_layout_ins_dangling_rec()&lt;/tt&gt;, &lt;tt&gt;lfsck_layout_del_dangling_rec()&lt;/tt&gt;, &lt;tt&gt;lfsck_layout_refill_lovea()&lt;/tt&gt;, &lt;tt&gt;__lfsck_layout_update_pfid()&lt;/tt&gt;, &lt;tt&gt;lfsck_layout_recreate_parent()&lt;/tt&gt;, etc.  While it is possible that those functions are skipped at a higher level when &lt;tt&gt;DRYRUN&lt;/tt&gt; is set, it would be prudent to add checks and skip any filesystem-modifying code at the level where those changes are being done, maybe with a &lt;tt&gt;CDEBUG(D_LFSCK, ...)&lt;/tt&gt; message so that the potential fix appears in the debug log (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9548&quot; title=&quot;No debug info from lctl set_param debug=+lfsck shows up with lfsck dry-run&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9548&quot;&gt;LU-9548&lt;/a&gt;).  That makes it very clear to the reader/developer that there is no chance those modifications would be done.&lt;/p&gt;

&lt;p&gt;We should substantially improve the testing of DRYRUN during LFSCK to make sure that it is handled correctly, and that LFSCK does not change the filesystem when it is asked to only do a check.&lt;/p&gt;

&lt;p&gt;In addition to adding checks for DRYRUN in all of the modifying functions (even if this is redundant), some suggestions to detect problems during testing would include:&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;run each test with &quot;lctl lfsck_start --dryrun&quot; first, then verify the problem was not fixed until the second (non-DRYRUN) run&lt;/li&gt;
	&lt;li&gt;set &quot;&lt;tt&gt;lctl set_param printk=+lfsck&lt;/tt&gt;&quot; during sanity-lfsck.sh and grab the D_LFSCK output from all nodes during each subtest, then use &quot;sed&quot; to drop run-unique output such as timestamps, PIDs, and FIDs, and compare the output with &quot;expect&quot; to ensure that the right thing is being done for each subtest&lt;/li&gt;
	&lt;li&gt;add a &lt;tt&gt;lfsck_declare_trans_start()&lt;/tt&gt; wrapper for &lt;tt&gt;dt_declare_trans_start()&lt;/tt&gt; (and/or lfsck_trans_start() for dt_trans_start()) that prints and returns an error if called when DRYRUN is set. That &lt;em&gt;might&lt;/em&gt; be too much, if there are &quot;empty&quot; transactions created when DRYRUN is set, but it wouldn&apos;t be terrible to avoid that completely.&lt;/li&gt;
	&lt;li&gt;if the above is too broken, add &lt;tt&gt;lfsck_&amp;#42;&lt;/tt&gt; wrappers for all of the &lt;tt&gt;dt_&amp;#42;&lt;/tt&gt; modifying operations to return an error if called with DRYRUN set. That would handle the case of &lt;tt&gt;dt_trans_start()&lt;/tt&gt; being called when &lt;tt&gt;--dryrun&lt;/tt&gt; is set, but is &lt;b&gt;not&lt;/b&gt; as good a solution as avoiding it completely, since there is still overhead from calling &lt;tt&gt;dt_trans_start()&lt;/tt&gt; for empty transactions, and more danger for the filesystem if something is missed.&lt;/li&gt;
&lt;/ul&gt;
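&lt;p&gt;A minimal sketch of the log-normalization idea from the second suggestion above, written in Python instead of &quot;sed&quot; purely for illustration. The regular expressions and the function names (&lt;tt&gt;normalize_line()&lt;/tt&gt;, &lt;tt&gt;normalize_log()&lt;/tt&gt;, &lt;tt&gt;logs_match()&lt;/tt&gt;) are hypothetical, and the assumed shapes of the PID prefix and the timestamp would need to be checked against real D_LFSCK output before use.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re

# Hypothetical patterns for run-unique fields; real D_LFSCK output may need more.
FID_RE = re.compile(r"\[0x[0-9a-f]+:0x[0-9a-f]+:0x[0-9a-f]+\]")
# "28775:0:(" style prefix seen in console messages (assumed to start with the PID).
PID_RE = re.compile(r"\b\d+:\d+:\(")
# Assumption: timestamps appear as epoch seconds with a fractional part.
TIME_RE = re.compile(r"\b\d{9,}\.\d+\b")


def normalize_line(line):
    """Replace run-unique fields in one log line with fixed placeholders."""
    line = TIME_RE.sub("TIME", line)
    line = FID_RE.sub("[FID]", line)
    line = PID_RE.sub("PID:(", line)
    return line.strip()


def normalize_log(text):
    """Normalize every non-empty line of a log and return them as a list."""
    return [normalize_line(l) for l in text.splitlines() if l.strip()]


def logs_match(actual, expected):
    """True if two logs are identical once run-unique fields are dropped."""
    return normalize_log(actual) == normalize_log(expected)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With something like this, the output of a subtest could be captured once as a reference and compared against later runs without PID, FID, or timestamp differences causing false mismatches.&lt;/p&gt;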
</description>
                <environment></environment>
        <key id="58210">LU-13304</key>
            <summary>add testing for LFSCK DRYRUN</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="adilger">Andreas Dilger</reporter>
                        <labels>
                    </labels>
                <created>Fri, 28 Feb 2020 00:16:39 +0000</created>
                <updated>Wed, 8 Dec 2021 23:12:29 +0000</updated>
                                                                                <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="320353" author="adilger" created="Wed, 8 Dec 2021 23:12:29 +0000"  >&lt;p&gt;I hit a check in exactly this codepath while running &quot;&lt;tt&gt;lfsck -r -A -t all --dryrun&lt;/tt&gt;&quot; on a test filesystem.  The &lt;tt&gt;lfsck_trans_create()&lt;/tt&gt; wrapper was added in patch &lt;a href=&quot;https://review.whamcloud.com/37194&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/37194&lt;/a&gt; &quot;&lt;tt&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13124&quot; title=&quot;lfsck check for multiple linked file at OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13124&quot;&gt;&lt;del&gt;LU-13124&lt;/del&gt;&lt;/a&gt; scrub: check for multiple linked file&lt;/tt&gt;&quot; and was triggered during my run:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;CPU: 1 PID: 28775 Comm: lfsck  3.10.0-1160.21.1.el7_lustre.ddn13.x86_64 #1
Call Trace:
 dump_stack+0x19/0x1b
 lfsck_trans_create.part.54+0x6c/0x75 [lfsck]
 lfsck_namespace_trace_update+0xc2a/0xe10 [lfsck]
 lfsck_namespace_exec_oit+0xa1f/0xc50 [lfsck]
 lfsck_master_oit_engine+0x745/0x12d0 [lfsck]
 lfsck_master_engine+0x9de/0x1420 [lfsck]
 kthread+0xd1/0xe0
Lustre: testfs-MDT0001-osd: namespace LFSCK add flags for [0x240000402:0x10a1:0x0] in the trace file, flags 1, old 0, new 1: rc = -22
LustreError: 28775:0:(lfsck_internal.h:1553:lfsck_trans_create()) testfs-MDT0001-osd: transaction is being created in DRYRUN mode!
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This caused the system to be almost completely unusable because of the large number of stack traces being dumped.&lt;/p&gt;

&lt;p&gt;Two fixes are needed here:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;rate-limit the console messages/stack trace so that they are printed at most once every 60s&lt;/li&gt;
	&lt;li&gt;fix &lt;tt&gt;lfsck_namespace_trace_update()&lt;/tt&gt; to check the &lt;tt&gt;DRYRUN&lt;/tt&gt; flag before trying to fix anything&lt;/li&gt;
&lt;/ul&gt;
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00uhz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>