<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:01:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13489] Does LFSCK check on-disk information</title>
                <link>https://jira.whamcloud.com/browse/LU-13489</link>
                <project id="10000" key="LU">Lustre</project>
                <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I&apos;m a PhD student working on a Lustre reliability study. My group found that manually destroying the MDS or OSS layout can lead to a resource leak, meaning that part of the storage space or namespace becomes unusable by clients. This problem was discussed in the &apos;PFault&apos; paper published at ICS &apos;18, where the resource leak is caused by e2fsck changing the OST layout. However, I found several other ways to trigger the same issue; anything that destroys MDT-OST consistency will do. Here is a simple way to reproduce the scenario (a condensed command sketch follows the list):&lt;br/&gt;
 1. Create a 1 client + 1 MDS + 3 OSS cluster&lt;br/&gt;
 2. Write some files to Lustre on the client node, and check usage with &apos;lfs df -h&apos;&lt;br/&gt;
 3. Unmount the MDT directory on the MDS, reformat the MDT&apos;s disk partition, and remount. This step destroys consistency between the MDT and the OSTs&lt;br/&gt;
 4. Check the Lustre directory on the client node: the user files are no longer there, but &apos;lfs df -h&apos; shows that the space has not been released&lt;br/&gt;
 5. Run lfsck and &apos;lfs df -h&apos; again. However, lfsck didn&apos;t move the stale objects on the OSSs to &apos;/lost+found&apos;, and the storage-space leak is still there&lt;/p&gt;
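
&lt;p&gt;Condensed as shell commands (a sketch; the device, mount points, and MGS NID are from my cluster, and the workload path is hypothetical):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# client: write some data, then record usage
cp -r /some/data /lustrefs/        # hypothetical workload
lfs df -h /lustrefs

# MDS: destroy MDT-OST consistency by reformatting the MDT
umount /mdt
mkfs.lustre --fsname=lustre --mgsnode=192.168.1.7@tcp0 --mdt --index=0 --reformat /dev/sdb
mount.lustre /dev/sdb /mdt

# MDS: attempt repair with LFSCK
lctl lfsck_start -A -t all -o

# client: the files are gone, but the space is still consumed
ls /lustrefs/
lfs df -h /lustrefs&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;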

&lt;p&gt;I&apos;m not sure whether this is within the scope of lfsck&apos;s functionality, but I understand that lfsck&apos;s namespace phase is said to be able to remove orphan objects. This problem can potentially damage clusters, since on-disk object files can easily be removed by mis-operations, yet the resulting leak cannot be detected by lfsck.&lt;br/&gt;
 Thanks!&lt;/p&gt;

&lt;p&gt;Runzhou Han&lt;br/&gt;
 Dept. of Electrical &amp;amp; Computer Engineering&lt;br/&gt;
 Iowa State University&lt;/p&gt;</description>
                <environment>CentOS 7, ldiskfs</environment>
        <key id="58954">LU-13489</key>
            <summary>Does LFSCK check on-disk information</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="5" iconUrl="https://jira.whamcloud.com/images/icons/priorities/trivial.svg">Trivial</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="10000">Done</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="rzhan">Runzhou Han</reporter>
                        <labels>
                            <label>lfsck</label>
                    </labels>
                <created>Tue, 28 Apr 2020 23:33:56 +0000</created>
                <updated>Wed, 10 Jun 2020 23:26:12 +0000</updated>
                            <resolved>Wed, 10 Jun 2020 23:26:12 +0000</resolved>
                                    <version>Lustre 2.10.8</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>2</watches>
                                                                            <comments>
                            <comment id="268851" author="adilger" created="Wed, 29 Apr 2020 08:41:32 +0000"  >&lt;p&gt;Could you please provide more detail about how you are running LFSCK?  Are you using something like &lt;tt&gt;lctl lfsck_start -A -t all -o&lt;/tt&gt;&quot; to link the orphan objects into &lt;tt&gt;.../.lustre/lost+found/&lt;/tt&gt; so they can be recovered or removed?&lt;/p&gt;</comment>
                            <comment id="268911" author="rzhan" created="Wed, 29 Apr 2020 22:09:03 +0000"  >&lt;p&gt;Thank you for replying me!&lt;/p&gt;

&lt;p&gt;I didn&apos;t use any additional arguments at first and simply ran &apos;&lt;tt&gt;lctl lfsck_start&lt;/tt&gt;&apos;. I then tried &apos;&lt;tt&gt;lctl lfsck_start -A -t all -o&lt;/tt&gt;&apos;, but I think lfsck still didn&apos;t link the orphan objects into &lt;tt&gt;/.lustre/lost+found/&lt;/tt&gt;, if that means I should be able to see the orphan objects via &lt;tt&gt;ls&lt;/tt&gt; in Lustre&apos;s &lt;tt&gt;/lost+found&lt;/tt&gt; folder.&lt;/p&gt;
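
&lt;p&gt;To be concrete, here is what I ran (a sketch of the second attempt; the comments are my understanding of the options):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# MDS: -A = start on all targets, -t all = all LFSCK types,
# -o = handle orphan OST objects
lctl lfsck_start -A -t all -o

# client: where I expected the orphan objects to show up
ls /lustrefs/lost+found/&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;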

&lt;p&gt;The detailed operations and logs for reproducing the problem are posted below.&lt;/p&gt;

&lt;p&gt;After setting up the cluster, check the brand-new cluster&apos;s usage:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client0 pf_pfs_worker]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID        96.0M        1.7M       85.6M   2% /lustrefs[MDT:0]
lustre-OST0000_UUID       413.4M       13.2M      365.3M   3% /lustrefs[OST:0]
lustre-OST0001_UUID       413.4M       13.2M      365.3M   3% /lustrefs[OST:1]
lustre-OST0002_UUID       413.4M       13.2M      365.3M   3% /lustrefs[OST:2]

filesystem_summary:         1.2G       39.6M        1.1G   3% /lustrefs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then I ran some write workloads to age the cluster:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client0 pf_pfs_worker]# ./pfs_worker_cp.sh
[root@client0 pf_pfs_worker]# ./pfs_worker_age.sh
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Check the cluster usage again:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client0 pf_pfs_worker]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID        96.0M        1.7M       85.6M   2% /lustrefs[MDT:0]
lustre-OST0000_UUID       413.4M      136.9M      235.9M  37% /lustrefs[OST:0]
lustre-OST0001_UUID       413.4M      121.9M      250.9M  33% /lustrefs[OST:1]
lustre-OST0002_UUID       413.4M      119.5M      253.5M  32% /lustrefs[OST:2]

filesystem_summary:         1.2G      378.3M      740.3M  34% /lustrefs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This shows that 300+ MB of data has been written to the OSTs.&lt;/p&gt;

&lt;p&gt;On the MDS, unmount the MDT, reformat it, and mount it again:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds /]# umount /mdt
[root@mds /]# mkfs.lustre --fsname=lustre --mgsnode=192.168.1.7@tcp0 --mdt --index=0 --reformat /dev/sdb

Permanent disk data:
Target:     lustre:MDT0000
Index:      0
Lustre FS:  lustre
Mount type: ldiskfs
Flags:      0x61
              (MDT first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters: mgsnode=192.168.1.7@tcp
device size = 200MB
formatting backing filesystem ldiskfs on /dev/sdb
target name   lustre:MDT0000
4k blocks     51200
options       -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F
mkfs_cmd = mke2fs -j -b 4096 -L lustre:MDT0000 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,huge_file,flex_bg -E lazy_journal_init -F /dev/sdb 51200
Writing CONFIGS/mountdata
[root@mds /]# mount.lustre /dev/sdb /mdt
mount.lustre: mount /dev/sdb at /mdt failed: Address already in use
The target service&apos;s index is already in use. (/dev/sdb)
[root@mds /]# mount.lustre /dev/sdb /mdt
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Interestingly, Lustre refused the first remount attempt, but the second try worked.&lt;/p&gt;
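
&lt;p&gt;(Side note: if I read the manual correctly, the first failure is because the MGS still holds the registration for the old MDT0000, and reusing an already-registered index is supposed to be done with the &lt;tt&gt;--replace&lt;/tt&gt; flag. A sketch, untested on my cluster:)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# reformat a target that reuses an already-registered index
mkfs.lustre --fsname=lustre --mgsnode=192.168.1.7@tcp0 --mdt --index=0 \
            --replace --reformat /dev/sdb&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;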

&lt;p&gt;Then check the client-side Lustre directory and the cluster usage:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client0 pf_pfs_worker]# ls /lustrefs/
[root@client0 pf_pfs_worker]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID        96.0M        1.7M       85.6M   2% /lustrefs[MDT:0]
lustre-OST0000_UUID       413.4M      130.0M      245.5M  35% /lustrefs[OST:0]
lustre-OST0001_UUID       413.4M      127.5M      248.0M  34% /lustrefs[OST:1]
lustre-OST0002_UUID       413.4M      125.2M      250.3M  33% /lustrefs[OST:2]

filesystem_summary:         1.2G      382.8M      743.8M  34% /lustrefs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The user data is no longer visible to the client, but the storage space has not been released.&lt;/p&gt;
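
&lt;p&gt;To confirm that the stale objects are still physically on the OSTs, one can peek at the object directories on an OSS&apos;s ldiskfs backing device (a read-only sketch; the device name is specific to my setup, and &lt;tt&gt;O/0/dN&lt;/tt&gt; is where the OST objects live):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# OSS: list one of the object sub-directories without mounting ldiskfs
debugfs -c -R &apos;ls -l /O/0/d0&apos; /dev/sdc&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;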

&lt;p&gt;Try to fix this inconsistency with lfsck:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@mds /]# lctl lfsck_start -A -t all -o
Started LFSCK on the device lustre-MDT0000: scrub layout namespace
[root@mds /]# lctl lfsck_query
layout_mdts_init: 0
layout_mdts_scanning-phase1: 0
layout_mdts_scanning-phase2: 0
layout_mdts_completed: 1
layout_mdts_failed: 0
layout_mdts_stopped: 0
layout_mdts_paused: 0
layout_mdts_crashed: 0
layout_mdts_partial: 0
layout_mdts_co-failed: 0
layout_mdts_co-stopped: 0
layout_mdts_co-paused: 0
layout_mdts_unknown: 0
layout_osts_init: 0
layout_osts_scanning-phase1: 0
layout_osts_scanning-phase2: 0
layout_osts_completed: 3
layout_osts_failed: 0
layout_osts_stopped: 0
layout_osts_paused: 0
layout_osts_crashed: 0
layout_osts_partial: 0
layout_osts_co-failed: 0
layout_osts_co-stopped: 0
layout_osts_co-paused: 0
layout_osts_unknown: 0
layout_repaired: 285
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 0
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 1
namespace_mdts_failed: 0
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 0
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
namespace_osts_init: 0
namespace_osts_scanning-phase1: 0
namespace_osts_scanning-phase2: 0
namespace_osts_completed: 0
namespace_osts_failed: 0
namespace_osts_stopped: 0
namespace_osts_paused: 0
namespace_osts_crashed: 0
namespace_osts_partial: 0
namespace_osts_co-failed: 0
namespace_osts_co-stopped: 0
namespace_osts_co-paused: 0
namespace_osts_unknown: 0
namespace_repaired: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This shows that lfsck repaired 285 objects.&lt;/p&gt;
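
&lt;p&gt;For more detail than &lt;tt&gt;lfsck_query&lt;/tt&gt; gives, the per-target statistics can be read with &lt;tt&gt;lctl get_param&lt;/tt&gt; (a sketch; the parameter names follow my target names):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# MDS: detailed layout/namespace LFSCK state and repair counters
lctl get_param -n mdd.lustre-MDT0000.lfsck_layout
lctl get_param -n mdd.lustre-MDT0000.lfsck_namespace

# OSS: the OST-side view of the layout LFSCK
lctl get_param -n obdfilter.lustre-OST0000.lfsck_layout&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;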

&lt;p&gt;On the client node, check the cluster usage again:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client0 pf_pfs_worker]# lfs df -h
UUID                       bytes        Used   Available Use% Mounted on
lustre-MDT0000_UUID        96.0M        1.7M       85.6M   2% /lustrefs[MDT:0]
lustre-OST0000_UUID       413.4M      130.2M      245.4M  35% /lustrefs[OST:0]
lustre-OST0001_UUID       413.4M      127.5M      248.0M  34% /lustrefs[OST:1]
lustre-OST0002_UUID       413.4M      125.2M      250.3M  33% /lustrefs[OST:2]

filesystem_summary:         1.2G      382.9M      743.7M  34% /lustrefs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The storage space is still not released.&lt;/p&gt;

&lt;p&gt;Check the &lt;tt&gt;lost+found&lt;/tt&gt; directories on the OSTs&#8217; backing filesystems, but they are empty:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@oss0 /]# ls /ost0_bf/lost+found/
[root@oss1 /]# ls /ost1_bf/lost+found/
[root@oss2 /]# ls /ost2_bf/lost+found/
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="268940" author="adilger" created="Thu, 30 Apr 2020 04:24:50 +0000"  >&lt;p&gt;The files recovered by LFSCK would not be on the OSTs, but rather they would be re-attached into the filesystem namespace on the client nodes under the directory &lt;tt&gt;/lustrefs/.lustre/lost+found/MDT0000/&lt;/tt&gt;.&lt;/p&gt;

</comment>
                            <comment id="268941" author="adilger" created="Thu, 30 Apr 2020 04:27:21 +0000"  >&lt;p&gt;The &lt;tt&gt;lost+found&lt;/tt&gt; on the OST backing filesystems would be used by the local &lt;tt&gt;e2fsck&lt;/tt&gt; in case there is corruption of the underlying disk filesystem (e.g. directory &lt;tt&gt;O/0/d4&lt;/tt&gt; is corrupted).  In that case, after the local &lt;tt&gt;e2fsck&lt;/tt&gt; runs and puts orphan objects into the local &lt;tt&gt;lost+found&lt;/tt&gt;, then LFSCK OI_Scrub would detect this on restart and rebuild the &lt;tt&gt;O/0/d4&lt;/tt&gt; directory and restore the objects.&lt;/p&gt;</comment>
                            <comment id="269045" author="rzhan" created="Thu, 30 Apr 2020 21:28:01 +0000"  >&lt;p&gt;Thanks! This really solved my long term confusion.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                <customfield id="customfield_10003" key="com.atlassian.jira.plugin.system.customfieldtypes:float">
                        <customfieldname>Business Value</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10040" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic</customfieldname>
                        <customfieldvalues>
                                        <label>LFSCK</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>lfsck</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00z1j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>