<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:20:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1866] LFSCK Phase 1.5 for FID-in-dirent and linkEA consistency</title>
                <link>https://jira.whamcloud.com/browse/LU-1866</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;In Lustre 2.x, when a file is created, its FID (File IDentifier) is stored as part of the name entry in the parent directory, which is called FID-in-dirent. With the FID-in-dirent, readdir on the MDT can fetch the FID from the directory page directly instead of having to get it from the object LMA (Lustre Metadata Attributes) extended attribute stored on the inode. As a result, traversing the directory (such as &quot;ls&quot;) with FID-in-dirent is much faster than having to access the FID from the LMA. Also at file creation time, the FID of the parent directory and the name of the file are stored in the linkEA extended attribute on the inode. With the linkEA, any given FID can be resolved back to a full path from the root directory to the target file. This is useful for ChangeLog-based applications such as &quot;lustre_rsync&quot;, and for generating error messages or performing POSIX-style pathname permission checks. Creating a hard link to a regular file also stores the corresponding FID-in-dirent and linkEA entries.&lt;/p&gt;

&lt;p&gt;Over the lifetime of an active filesystem, some FID-in-dirent and linkEA may become inconsistent or invalid as the result of on-disk corruption, after restoring from MDT file-level backup, or if the MDT filesystem was originally formatted under Lustre 1.8. Currently, if the MDT is upgraded from Lustre 1.8 or after the MDT is restored from a file-level backup, the MDT will be missing all the FID-in-dirent data, which will reduce the performance of readdir(3) on the MDT. Additionally, for an MDT upgraded from Lustre 1.8 the linkEA is also unavailable and the 2.x &quot;lctl fid2path&quot; functionality will not be available for those files.&lt;/p&gt;

&lt;p&gt;In LFSCK Phase 1.5 we will implement the functionality of verifying and rebuilding FID-in-dirent and linkEA for the single-MDT case. These additional operations are performed while the MDT is iterating over the object table for OI Scrub. LFSCK checks whether the FID-in-dirent name entry is consistent with the FID in the object LMA, repairs it if they do not match, and rebuilds it if the FID-in-dirent is missing. It also verifies that the name entry is correctly referenced by the object linkEA and that the object linkEA points back to a valid name entry. Unmatched or redundant linkEA entries will be removed, and missing linkEA entries will be added. In the case of Lustre 1.8 inodes with IGIF FIDs after an upgrade, it will store the IGIF FID into the LMA xattr on the inode, then in the FID-in-dirent and linkEA as it would for any 2.x FID.&lt;/p&gt;

&lt;p&gt;The LFSCK Phase III project was to handle the FID-in-dirent and linkEA verification. This included both local-MDT references and cross-MDT cases where the directory entry and the object are located on different MDTs. The LFSCK Phase 1.5 implementation of FID-in-dirent and linkEA consistency check/repair contains a significant part of the LFSCK Phase III work.&lt;/p&gt;

&lt;p&gt;Currently, the DNE project is underway. To make the LFSCK project less dependent on the DNE project, we prefer to split LFSCK Phase III into two parts: one for DNE cases and one for non-DNE cases. The non-DNE part will be handled in LFSCK Phase 1.5, and the DNE part will be handled after the DNE project is completed. Completing the LFSCK Phase 1.5 work earlier also benefits sites upgrading from Lustre 1.8.&lt;/p&gt;</description>
                <environment></environment>
        <key id="15842">LU-1866</key>
            <summary>LFSCK Phase 1.5 for FID-in-dirent and linkEA consistency</summary>
                <type id="2" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11311&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="yong.fan">nasf</reporter>
                        <labels>
                            <label>LFSCK</label>
                    </labels>
                <created>Sat, 8 Sep 2012 21:51:15 +0000</created>
                <updated>Wed, 18 Jun 2014 22:20:21 +0000</updated>
                            <resolved>Sat, 20 Jul 2013 06:05:29 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="49108" author="yong.fan" created="Wed, 12 Dec 2012 05:03:41 +0000"  >&lt;p&gt;First version implementation for LFSCK 1.5 without test cases yet:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,4807&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4807&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49516" author="tappro" created="Thu, 20 Dec 2012 23:06:36 +0000"  >&lt;p&gt;FYI, the 27z test failure with your patch:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;sanity test_27z: @@@@@@ FAIL: O/0/d16/240: no filter_fid info
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This is because you&apos;ve added an extra EA to all files, at least to the OST objects. As a result, the filter_fid EA no longer fits in the inode body and debugfs can&apos;t find it. I saw the same issue in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-838&quot; title=&quot;&amp;quot;lfs path2fid /mnt/lustre&amp;quot; (ROOT) returns inode number&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-838&quot;&gt;&lt;del&gt;LU-838&lt;/del&gt;&lt;/a&gt; when LMA was added to all files. We&apos;ve discussed this a bit with Andreas and Alex; I will forward the thread to you.&lt;/p&gt;

&lt;p&gt;I also tend to think that having filter_fid outside the inode body is still OK, so maybe we need to check this EA some other way than with debugfs?&lt;/p&gt;
</comment>
                            <comment id="49524" author="yong.fan" created="Fri, 21 Dec 2012 01:33:59 +0000"  >&lt;p&gt;In the long term, it is quite possible that more EAs will be introduced for the inode, so it would be better to find another suitable way to check the EAs.&lt;/p&gt;</comment>
                            <comment id="49618" author="yong.fan" created="Sun, 23 Dec 2012 02:20:45 +0000"  >&lt;p&gt;This patch can pass most sanity test cases, both sanity-scrub.sh and sanity-lfsck.sh work well against this version:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,4807,set7&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4807,set7&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49621" author="yong.fan" created="Sun, 23 Dec 2012 08:08:50 +0000"  >&lt;p&gt;LFSCK 1.5 functionality tests results.&lt;/p&gt;</comment>
                            <comment id="49637" author="yong.fan" created="Mon, 24 Dec 2012 05:06:07 +0000"  >&lt;p&gt;Except for some known conf-sanity issues, all other sanity tests can run.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#change,4807,set8&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,4807,set8&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49654" author="yong.fan" created="Mon, 24 Dec 2012 20:06:18 +0000"  >&lt;p&gt;Pass tests on Maloo (set 8)&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://maloo.whamcloud.com/test_sessions/d9c100a0-4df6-11e2-9dc7-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sessions/d9c100a0-4df6-11e2-9dc7-52540035b04c&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49665" author="tappro" created="Tue, 25 Dec 2012 04:31:45 +0000"  >&lt;p&gt;could you describe what was changed to pass tests? I see that not all files have dirdata now, right?&lt;/p&gt;</comment>
                            <comment id="49667" author="yong.fan" created="Tue, 25 Dec 2012 04:59:59 +0000"  >&lt;p&gt;Currently, IDIF objects have no FID-in-LMA, since they are not in OI files.&lt;/p&gt;</comment>
                            <comment id="49748" author="yong.fan" created="Fri, 28 Dec 2012 05:51:11 +0000"  >&lt;p&gt;This is the patch to be reviewed:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/5046&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5046&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4901&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4901&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4902&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4902&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4903&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4903&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4904&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4904&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4906&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4906&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4907&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4907&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4908&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4908&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4909&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4909&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4910&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4910&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4911&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4911&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4912&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4912&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4913&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4913&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/4914&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4914&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="49774" author="yong.fan" created="Fri, 28 Dec 2012 21:51:56 +0000"  >&lt;p&gt;LFSCK 1.5 functionality test results&lt;/p&gt;</comment>
                            <comment id="51109" author="yong.fan" created="Thu, 24 Jan 2013 11:56:55 +0000"  >&lt;p&gt;LFSCK 1.5 test plan (ldiskfs only)&lt;br/&gt;
****************&lt;/p&gt;

&lt;p&gt;1. Correctness&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;1.1) sanity-lfsck on Maloo with commit message &quot;Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-lfsck&quot;. All test cases should pass.&lt;/p&gt;

&lt;p&gt;1.2) sanity-scrub on Maloo with commit message &quot;Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-scrub&quot;. All test cases should pass.&lt;/p&gt;

&lt;p&gt;1.3) normal acc-sm tests on Maloo. All test cases should pass except for some known master failures.&lt;/p&gt;


&lt;p&gt;2. Performance&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;The file set to be tested should be generated with the following conditions:&lt;/p&gt;

&lt;p&gt;A) Create a single test root directory.&lt;/p&gt;

&lt;p&gt;B) Create N sub-directories under the test root directory.&lt;/p&gt;

&lt;p&gt;C) Under each sub-directory, create 100K objects, including 1K multiply-linked objects, 1K symlinks, and 8K empty directories; the rest are normal files.&lt;/p&gt;


&lt;p&gt;2.1) lfsck against a healthy 2.x MDT device for routine consistency checking.&lt;/p&gt;

&lt;p&gt;2.1.1) Create above test file set with Lustre-2.4.&lt;/p&gt;

&lt;p&gt;2.1.2) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 100, 200, 400, 800, 1600&lt;/p&gt;


&lt;p&gt;2.2) lfsck against a 2.x MDT device that has been restored from a file-level backup.&lt;/p&gt;

&lt;p&gt;2.2.1) Create above test file set with Lustre-2.4.&lt;/p&gt;

&lt;p&gt;2.2.2) Perform MDT file-level backup/restore.&lt;/p&gt;

&lt;p&gt;2.2.3) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 10, 20, 40, 80, 160&lt;/p&gt;


&lt;p&gt;2.3) lfsck against an MDT device that has been upgraded from 1.8.&lt;/p&gt;

&lt;p&gt;2.3.1) Create above test file set with Lustre-1.8&lt;/p&gt;

&lt;p&gt;2.3.2) Update the system to 2.4. Use &quot;tunefs --dirdata&quot; to enable FID-in-dirent on MDT.&lt;/p&gt;

&lt;p&gt;2.3.3) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 100, 200, 400, 800, 1600&lt;/p&gt;


&lt;p&gt;3. Create performance impact by lfsck&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;Measure how much the routine lfsck affects normal create performance. Generate the test file set as described in section 2 with N = 400.&lt;/p&gt;

&lt;p&gt;3.1) Run lfsck at full speed on the file set. At the same time, use C threads to create 10M files (mds-survey) in parallel. Each thread creates 10M / C files under its own private directory.&lt;/p&gt;

&lt;p&gt;3.2) Measure the create performance with different lfsck speed limits. The 3.1) result gives the highest lfsck speed under the create workload; call it S. Then repeat the test with lfsck speed limit = (1/4)S, (1/2)S, (3/4)S.&lt;/p&gt;


&lt;p&gt;4. Scale test&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;Run mdtest on Hyperion, with the routine lfsck running repeatedly in the background. We can inject some known failure stubs by setting fail_loc on the MDS, such as OBD_FAIL_FID_INDIR, OBD_FAIL_FID_INLMA, OBD_FAIL_LFSCK_LINKEA_MORE, OBD_FAIL_LFSCK_LINKEA_LESS, and so on, so that lfsck has something to repair during the check. There should be no failures reported.&lt;/p&gt;


&lt;p&gt;5. DNE support&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;LFSCK correctness verification under DNE mode; depends on &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2646&quot; title=&quot;add special flag in the lma of the agent inode&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2646&quot;&gt;&lt;del&gt;LU-2646&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;5.1) Set up the DNE environment with 2 MDSes.&lt;/p&gt;

&lt;p&gt;5.2) Generate the file set, including remote objects.&lt;/p&gt;

&lt;p&gt;5.3) Run lfsck on each MDS in parallel to check whether there are failures.&lt;/p&gt;


&lt;p&gt;6. Resource requirement.&lt;br/&gt;
----------------&lt;/p&gt;

&lt;p&gt;6.1) Test 1 can be done locally and on Maloo.&lt;/p&gt;

&lt;p&gt;6.2) Test 2 and 3 can be done on Toro with 1 fat node.&lt;/p&gt;

&lt;p&gt;6.3) Test 4 needs to be run on Hyperion. It would be better if someone could help with that.&lt;/p&gt;

&lt;p&gt;6.4) Test 5 can be done on Toro with 4 nodes.&lt;/p&gt;</comment>
                            <comment id="55795" author="paf" created="Mon, 8 Apr 2013 19:47:34 +0000"  >&lt;p&gt;nasf,&lt;br/&gt;
I&apos;m trying to make sure I understand the current status of this, as we at Cray are looking at starting some testing of upgrades from 1.8.x to 2.4.&lt;/p&gt;

&lt;p&gt;Do the current patches fully cover the intended functionality of Phase 1.5 and just need more testing?  If not, what functionality is still missing?&lt;/p&gt;</comment>
                            <comment id="56966" author="paf" created="Wed, 24 Apr 2013 18:42:11 +0000"  >&lt;p&gt;What happens if LFSCK -t namespace is done and dirdata has not been enabled on the MDS?&lt;/p&gt;</comment>
                            <comment id="57012" author="yong.fan" created="Thu, 25 Apr 2013 08:10:31 +0000"  >&lt;p&gt;Current master + the patch &lt;a href=&quot;http://review.whamcloud.com/#change,6078&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,6078&lt;/a&gt; can support upgrading from Lustre-1.8.x to Lustre-2.4 well.&lt;/p&gt;

&lt;p&gt;If &quot;lfsck -t namespace&quot; is run on a device formatted under Lustre 1.8.x without &quot;dirdata&quot; enabled, the directory structure is left unchanged; the other parts, such as IGIF-in-LMA, IGIF-in-OI, and linkEA for IGIF, are generated just as they would be with &quot;dirdata&quot;.&lt;/p&gt;</comment>
                            <comment id="62660" author="yong.fan" created="Sat, 20 Jul 2013 06:05:29 +0000"  >&lt;p&gt;All the patches for LFSCK 1.5 have been landed to Lustre-2.5&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="11494">LU-591</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="17091">LU-2646</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="15927">LUDOC-85</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="12984" name="LFSCK15_Demonstration_Milestone_Completion_r2.pdf" size="104472" author="rhenwood" created="Mon, 3 Jun 2013 18:12:44 +0000"/>
                            <attachment id="12985" name="LFSCK15_Implementation_Milestone_Completion.pdf" size="123754" author="rhenwood" created="Mon, 3 Jun 2013 18:13:17 +0000"/>
                            <attachment id="12123" name="sanity-lfsck_20121229.log" size="13599" author="yong.fan" created="Fri, 28 Dec 2012 21:51:56 +0000"/>
                    </attachments>
                <subtasks>
                            <subtask id="17431">LU-2742</subtask>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvigf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6640</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10002" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Story Points</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>55.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>