<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5708] Cannot get rid of orphaned objects</title>
                <link>https://jira.whamcloud.com/browse/LU-5708</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After some testing and benchmarking of a fresh filesystem, we found 1.4TB worth of orphaned objects. What we did was mostly several runs of tar (kernel tree extraction) and the IOR benchmark.&lt;/p&gt;

&lt;p&gt;After deleting all temporary files from these tests, we end up with a Lustre filesystem containing just 108K worth of files:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@n0101 ~&amp;#93;&lt;/span&gt;# du -sh /mnt/lustre/lnec/&lt;br/&gt;
108K    /mnt/lustre/lnec/&lt;/p&gt;

&lt;p&gt;but according to df, 1.4TB are in use:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@n0101 ~&amp;#93;&lt;/span&gt;# df -h&lt;br/&gt;
Filesystem                             Size  Used Avail Use% Mounted on&lt;br/&gt;
...&lt;br/&gt;
10.4.0.102@o2ib:10.4.0.101@o2ib:/lnec  175T  1.4T  174T   1% /mnt/lustre/lnec&lt;/p&gt;

&lt;p&gt;Mounting one of the OSTs as ldiskfs and checking its contents, we find that it contains 228GB worth of objects:&lt;/p&gt;

&lt;p&gt;oss01:ost0# du -sh --total O/*&lt;br/&gt;
85G     O/0&lt;br/&gt;
136K    O/1&lt;br/&gt;
136K    O/10&lt;br/&gt;
144G    O/2&lt;br/&gt;
136K    O/200000003&lt;br/&gt;
228G    total&lt;/p&gt;

&lt;p&gt;We have 6 OSTs in total and all of them are in a comparable state, so they add up to 1.4TB of objects.&lt;/p&gt;

&lt;p&gt;Trying &apos;lctl lfsck_start&apos; on the MDT or OSTs doesn&apos;t change that.&lt;/p&gt;</description>
                <environment></environment>
        <key id="26860">LU-5708</key>
            <summary>Cannot get rid of orphaned objects</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="dmiter">Dmitry Eremin</assignee>
                                    <reporter username="omangold">Oliver Mangold</reporter>
                        <labels>
                    </labels>
                <created>Mon, 6 Oct 2014 09:15:35 +0000</created>
                <updated>Sat, 6 Jun 2015 00:50:28 +0000</updated>
                            <resolved>Sat, 6 Jun 2015 00:50:28 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="95698" author="pjones" created="Mon, 6 Oct 2014 12:05:55 +0000"  >&lt;p&gt;Dmitry&lt;/p&gt;

&lt;p&gt;Could you please help with this issue?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="95699" author="omangold" created="Mon, 6 Oct 2014 12:57:25 +0000"  >&lt;p&gt;Maybe I should mention that we did several runs of obdfilter-survey. Could that be the reason for the orphaned objects?&lt;/p&gt;</comment>
                            <comment id="95714" author="dmiter" created="Mon, 6 Oct 2014 16:10:18 +0000"  >&lt;p&gt;The used space is consumed by the journal and by directory sizes after many files are created on the OSTs. For example, after formatting a new Lustre file system you can see the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# ls -la /tmp/lustre-*
-rw------- 1 root root 204800000 Oct  6 20:00 /tmp/lustre-mdt1
-rw------- 1 root root 204800000 Oct  6 20:02 /tmp/lustre-ost1
-rw------- 1 root root 204800000 Oct  6 20:00 /tmp/lustre-ost2
# df -h
Filesystem        Size  Used Avail Use% Mounted on
/dev/loop1        147M   18M  120M  13% /mnt/mds1
/dev/loop2        184M   26M  148M  15% /mnt/ost1
/dev/loop3        184M   26M  148M  15% /mnt/ost2
vbox@tcp:/lustre  367M   51M  296M  15% /mnt/lustre

# mount -t ldiskfs -o loop /tmp/lustre-ost1 /mnt/ost1
# du -sh --total O/*
136K	O/0
136K	O/1
136K	O/10
136K	O/200000003
544K	total
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;You can also check a directory&apos;s size after creating and removing many files.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# mkdir test
# du -sh test
4.0K    test
# for i in $(seq 1 1000); do touch test/$i; done
# du -sh test
20K     test
# rm -rf test/*
# du -sh test
20K     test
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="95774" author="omangold" created="Tue, 7 Oct 2014 08:05:54 +0000"  >&lt;p&gt;@Dmitry: I don&apos;t understand. What&apos;s your point? That journals and stuff take a few MB, even on an empty filesystem?&lt;/p&gt;

&lt;p&gt;I lost 1.4TB and it&apos;s definitely from files on the OSTs, apparently all with sizes a multiple of 384MB:&lt;/p&gt;

&lt;p&gt;oss01:~# ls -lh /mnt/lustre/ost0/O/0/d*/*&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  1 23:31 /mnt/lustre/ost0/O/0/d0/4364448&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 384M Oct  1 23:41 /mnt/lustre/ost0/O/0/d0/4364480&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:26 /mnt/lustre/ost0/O/0/d0/4364864&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d0/4366816&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d0/4367200&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d0/4367232&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 3.0G Oct  1 23:21 /mnt/lustre/ost0/O/0/d10/4364426&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 384M Oct  1 23:41 /mnt/lustre/ost0/O/0/d10/4364458&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d10/4366794&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d10/4367210&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d1/1&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 3.0G Oct  1 23:45 /mnt/lustre/ost0/O/0/d11/4364491&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:05 /mnt/lustre/ost0/O/0/d11/4364779&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:21 /mnt/lustre/ost0/O/0/d11/4364843&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d11/4366795&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d11/4367211&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 3.0G Oct  1 23:21 /mnt/lustre/ost0/O/0/d12/4364428&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 384M Oct  1 23:41 /mnt/lustre/ost0/O/0/d12/4364460&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 3.0G Oct  1 23:45 /mnt/lustre/ost0/O/0/d12/4364492&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:05 /mnt/lustre/ost0/O/0/d12/4364780&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:21 /mnt/lustre/ost0/O/0/d12/4364844&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d12/4366796&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d12/4367212&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 3.0G Oct  1 23:45 /mnt/lustre/ost0/O/0/d13/4364493&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:05 /mnt/lustre/ost0/O/0/d13/4364781&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:21 /mnt/lustre/ost0/O/0/d13/4364845&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d13/4366797&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d13/4367213&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  1 23:58 /mnt/lustre/ost0/O/0/d1/4364513&lt;br/&gt;
&lt;del&gt;rw-rw-rw&lt;/del&gt; 1  500 nagiocmd 768M Oct  2 00:26 /mnt/lustre/ost0/O/0/d1/4364865&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d1/4366785&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d1/4366817&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d1/4367201&lt;br/&gt;
&lt;del&gt;rwSrwSrw&lt;/del&gt; 1 root root        0 Jan  1  1970 /mnt/lustre/ost0/O/0/d1/4367233&lt;br/&gt;
... more of the same ...&lt;/p&gt;</comment>
                            <comment id="95779" author="efocht" created="Tue, 7 Oct 2014 11:11:57 +0000"  >&lt;p&gt;Is there a simple way (from user space) to find, e.g., the FID a particular object (file) belongs to? What is the object ID of something like&lt;br/&gt;
 /mnt/lustre/ost0/O/0/d1/4364513 ?&lt;/p&gt;
</comment>
                            <comment id="95780" author="dmiter" created="Tue, 7 Oct 2014 11:53:28 +0000"  >&lt;p&gt;Ok. Can you do the following sequence of commands and attach results?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# e2fsck -v -f -n --mdsdb /tmp/mdsdb &amp;lt;MDS-device&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 &amp;lt;OST-device-0&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-1 &amp;lt;OST-device-1&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-2 &amp;lt;OST-device-2&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-3 &amp;lt;OST-device-3&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-4 &amp;lt;OST-device-4&amp;gt;
# e2fsck -v -f -n --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-5 &amp;lt;OST-device-5&amp;gt;
# lfsck -v -d --mdsdb /tmp/mdsdb --ostdb /tmp/ostdb-0 /tmp/ostdb-1 /tmp/ostdb-2 /tmp/ostdb-3 /tmp/ostdb-4 /tmp/ostdb-5 /mnt/lustre
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="95795" author="omangold" created="Tue, 7 Oct 2014 13:42:23 +0000"  >&lt;p&gt;Okay, lfsck seems to run through and claims to remove several objects (see log), but the files still seem to be there:&lt;/p&gt;

&lt;p&gt;mds02:~# df -h&lt;br/&gt;
Filesystem                               Size  Used Avail Use% Mounted on&lt;br/&gt;
10.4.0.102@o2ib0:10.4.0.101@o2ib0:/lnec  175T  1.4T  174T   1% /rw/mnt/lustre/lnec&lt;/p&gt;</comment>
                            <comment id="95796" author="dmiter" created="Tue, 7 Oct 2014 14:00:35 +0000"  >&lt;p&gt;I suppose the space is reserved by the removed data and will be reused later. Why is it so critical for you to see a small number of used blocks in the df output? By the way, what was the number for a freshly formatted FS?&lt;/p&gt;</comment>
                            <comment id="95823" author="green" created="Tue, 7 Oct 2014 17:14:27 +0000"  >&lt;p&gt;So your objects on OSTs might still be referenced by something on MDS, be it real files or not.&lt;/p&gt;

&lt;p&gt;You can use the ll_decode_filter_fid tool from the Lustre utils to see what the supposed parent object FID on the MDS is for an ldiskfs object, and then look it up there.&lt;/p&gt;</comment>
                            <comment id="95909" author="omangold" created="Wed, 8 Oct 2014 08:14:58 +0000"  >&lt;p&gt;I tried to resolve all objects with ll_decode_filter_fid. What I got was:&lt;/p&gt;

&lt;p&gt;1. lots of empty object files which apparently do not have a FID; ll_decode_filter_fid returns &apos;error reading fid: No data available&apos; for these&lt;br/&gt;
2. a bunch of non-empty files returning FIDs. These are the ones that seem to use up the disk space. I tried to resolve the FIDs with &apos;lfs fid2path&apos; and got&lt;br/&gt;
2a. most of these objects cannot be resolved; &apos;lfs fid2path&apos; returns &apos;error on FID xxx: Invalid argument&apos;&lt;br/&gt;
2b. a few which return a path to an actual existing file&lt;/p&gt;

&lt;p&gt;So how do I clean this up? Can I delete all files for cases (1) and (2a)?&lt;/p&gt;</comment>
                            <comment id="115761" author="jfc" created="Tue, 19 May 2015 00:38:00 +0000"  >&lt;p&gt;Hi Oliver,&lt;/p&gt;

&lt;p&gt;Can you tell us the name of the end customer on this ticket please?&lt;/p&gt;

&lt;p&gt;Is this still a relevant issue for you?&lt;/p&gt;

&lt;p&gt;Many thanks,&lt;br/&gt;
~ jfc.&lt;/p&gt;</comment>
                            <comment id="115791" author="omangold" created="Tue, 19 May 2015 06:35:07 +0000"  >&lt;p&gt;This was a problem we encountered on our own benchmark system. It is nothing urgent, but we thought to report it anyway, for you to know that there is an issue.&lt;/p&gt;</comment>
                            <comment id="117646" author="jfc" created="Sat, 6 Jun 2015 00:50:28 +0000"  >&lt;p&gt;Thanks Oliver,&lt;/p&gt;

&lt;p&gt;I&apos;m marking it as resolved/incomplete &amp;#8211; it will remain visible to all, and can still be searched on, if needed.&lt;br/&gt;
~ jfc.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="29325">LU-6414</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="15902" name="lfsck.log" size="45066" author="omangold" created="Tue, 7 Oct 2014 13:39:43 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwxun:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>15997</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>