<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:42:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4395] files created with non-existent objects</title>
                <link>https://jira.whamcloud.com/browse/LU-4395</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;KIT has run into an issue where the MDT is creating files with objects that do not exist. Some of the symptoms look similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4034&quot; title=&quot;Cannot allocate memory on clients with 2.4.X&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4034&quot;&gt;&lt;del&gt;LU-4034&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;On client:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@client scc]# touch tmp/gaga
touch: setting times of `tmp/gaga&apos;: No such file or directory
[root@client scc]# lfs getstripe tmp/gaga
tmp/gaga
lmm_stripe_count:   4
lmm_stripe_size:    1048576
lmm_layout_gen:     0
lmm_stripe_offset:  5
	obdidx		 objid		 objid		 group
	     5	      65948624	    0x3ee4bd0	             0
	    25	      66739551	    0x3fa5d5f	             0
	     9	      65922640	    0x3ede650	             0
	    24	      66084357	    0x3f05e05	             0
LustreError: 11-0: HC3WORK-OST0005-osc-ffff8804987dec00: Communicating 
with 172.26.3.138@o2ib, operation ldlm_enqueue failed with -12.
[root@mds2 perftest]# ls -al
ls: cannot access eaea: Cannot allocate memory
ls: cannot access gaga: Cannot allocate memory
total 12
drwxr-xr-x  3 er2341 scc  4096 Dec 12 21:40 .
drwx------ 10 er2341 scc  4096 Sep 19 16:16 ..
-rw-r--r--  1 root   root    0 Dec 12 21:41 e
-?????????  ? ?      ?       ?            ? eaea
-rw-r--r--  1 root   root    0 Dec 12 21:41 f
-?????????  ? ?      ?       ?            ? gaga
drwxr-xr-x  2 root   root 4096 Dec 12 21:40 tmp
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;on OSS:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec 12 22:25:05 oss1 kernel: : LustreError: 14167:0:(ldlm_resource.c:1165:ldlm_resource_get()) HC3WORK-OST0005: lvbo_init failed for resource 0x3ee4bd0:0x0: rc = -2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;One thing that&apos;s odd is that all the other OSTs on the system delete orphan objects around that object ID number, but not ost5:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;# echo $((0x3ee4bd0))
65948624

Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0002: deleting orphan objects from 0x0:66829746 to 0x0:66830014
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0006: deleting orphan objects from 0x0:66151265 to 0x0:66151535
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0000: deleting orphan objects from 0x0:66341886 to 0x0:66342155
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767207
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0003: deleting orphan objects from 0x0:66145109 to 0x0:66145379
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Another weird thing is that the OSTs seem to delete the same objects repeatedly:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 15:04:30 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
Dec  9 15:11:41 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
Dec 10 09:58:31 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767047
Dec 10 16:20:25 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767079
Dec 10 16:33:00 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767111
Dec 11 15:54:57 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767143
Dec 11 16:50:03 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767175
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767207
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
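
&lt;p&gt;A minimal sketch, in Python, of how the repeated ranges could be pulled out of log lines like these for comparison. The regular expression and the helper name parse are illustrative assumptions, not part of Lustre; the three sample messages are copied from the log above with the syslog prefix trimmed.&lt;/p&gt;

```python
import re

# Three of the messages shown above, trimmed to the Lustre part of the line.
lines = [
    "Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015",
    "Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767047",
    "Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767079",
]

# Assumed message layout: target name, then first and last object ID in decimal.
pattern = re.compile(r"(\S+): deleting orphan objects from 0x0:(\d+) to 0x0:(\d+)")


def parse(line):
    """Hypothetical helper: return (target, first_id, last_id) or None."""
    m = pattern.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


ranges = [parse(line) for line in lines]
starts = sorted({r[1] for r in ranges})  # distinct first IDs across all runs
ends = [r[2] for r in ranges]            # last ID of each run, in log order
print(starts, ends)
```

&lt;p&gt;For these sample lines the first ID is identical on every run, while the last ID grows from one run to the next.&lt;/p&gt;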

&lt;p&gt;The filesystem was put back into production by disabling the OSTs that have this symptom. Are there any suggestions for what to look at in order to further debug this issue? Any logs we should get?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Kit&lt;/p&gt;</description>
                <environment></environment>
        <key id="22507">LU-4395</key>
            <summary>files created with non-existent objects</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="kitwestneat">Kit Westneat</reporter>
                        <labels>
                    </labels>
                <created>Wed, 18 Dec 2013 18:08:27 +0000</created>
                <updated>Wed, 29 Jan 2014 15:05:14 +0000</updated>
                            <resolved>Wed, 29 Jan 2014 15:05:14 +0000</resolved>
                                    <version>Lustre 2.4.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="73795" author="kitwestneat" created="Wed, 18 Dec 2013 20:41:12 +0000"  >&lt;p&gt;I&apos;ve been looking at the prealloc info on the mds and the LAST_ID on the OST:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;[root@mds1 ~]# grep . /proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/pre*
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_last_id:65948661
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_last_seq:0x100050000
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_next_id:65948632
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_next_seq:0x100050000
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_reserved:0
/proc/fs/lustre/osc/HC3WORK-OST0005-osc-MDT0000/prealloc_status:-19

# debugfs -c -R &apos;dump O/0/LAST_ID /tmp/LAST_ID&apos; /dev/mapper/ost_HC3WORK_5
# hexdump -e &apos;1/8 &quot;%d\n&quot;&apos; /tmp/LAST_ID
65988995
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;So it looks like the OST&apos;s LAST_ID is significantly larger than what the MDT sees. What could be causing that? A corrupted last_rcvd file on the MDT?&lt;/p&gt;
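
&lt;p&gt;For reference, a rough Python equivalent of the hexdump step above. It assumes the LAST_ID file holds a single 8-byte little-endian integer, which is what the hexdump format string reads on an x86 host; the function name decode_last_id is made up for this sketch, and the bytes are generated from the value printed above rather than read from a real file.&lt;/p&gt;

```python
def decode_last_id(raw):
    """Interpret the first 8 bytes as an unsigned little-endian integer."""
    return int.from_bytes(raw[:8], "little")


# Round-trip the value reported above instead of reading /tmp/LAST_ID.
sample = (65988995).to_bytes(8, "little")
print(decode_last_id(sample))
```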

</comment>
                            <comment id="73818" author="pjones" created="Wed, 18 Dec 2013 22:31:14 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="73823" author="niu" created="Thu, 19 Dec 2013 04:09:26 +0000"  >&lt;p&gt;The LAST_ID on the OST is much larger than the last_id on the MDT. Because the difference is greater than OST_MAX_PRECREATE (20000), the orphan cleanup wasn&apos;t triggered on ost5.&lt;/p&gt;
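
&lt;p&gt;The arithmetic can be sketched as follows, using the two values posted earlier in this ticket. This is only an illustration of the comparison, not the actual Lustre code path; the helper name cleanup_skipped is hypothetical.&lt;/p&gt;

```python
import operator

# Values posted earlier in this ticket.
ost_last_id = 65988995     # LAST_ID dumped from the OST with debugfs and hexdump
mdt_last_id = 65948661     # prealloc_last_id the MDT reports for OST0005
OST_MAX_PRECREATE = 20000  # limit mentioned above


def cleanup_skipped(ost_id, mdt_id, limit=OST_MAX_PRECREATE):
    """Hypothetical helper: True when the ID gap is greater than the limit."""
    return operator.gt(ost_id - mdt_id, limit)


gap = ost_last_id - mdt_last_id
print(gap, cleanup_skipped(ost_last_id, mdt_last_id))
```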

&lt;p&gt;I&apos;m not sure which LAST_ID is corrupted. Was there any abnormal message on ost5 or the MDT before this happened? That could give us a clue as to how this happened.&lt;/p&gt;

&lt;p&gt;What&apos;s the ID of the last physical object on ost5? Could you check?&lt;/p&gt;</comment>
                            <comment id="73872" author="kitwestneat" created="Thu, 19 Dec 2013 18:15:32 +0000"  >&lt;p&gt;The last physical object ID is 65988995, so it seems like a problem with the MDT. I&apos;ll upload the MDS and OSS logs. Do you think deleting the last_rcvd file on the MDT would fix the issue?&lt;/p&gt;</comment>
                            <comment id="73873" author="kitwestneat" created="Thu, 19 Dec 2013 18:19:59 +0000"  >&lt;p&gt;mds2 is acting as the client in this case.&lt;/p&gt;</comment>
                            <comment id="73905" author="niu" created="Fri, 20 Dec 2013 02:39:44 +0000"  >&lt;p&gt;Is the mds running 2.4? I see some 2.1 symbols in the mds log.&lt;/p&gt;

&lt;p&gt;I couldn&apos;t tell from the log how this happened. And yes, deleting the lov_objids file on the MDS, or just changing the bad value in lov_objids, may fix the problem.&lt;/p&gt;</comment>
                            <comment id="73945" author="kitwestneat" created="Fri, 20 Dec 2013 17:00:23 +0000"  >&lt;p&gt;I think it was upgraded on Dec 9, so the log may still contain 2.1.x symbols from before then.&lt;/p&gt;

&lt;p&gt;Do you know why it would keep deleting the same orphan objects? It seems like 65766910 is deleted at least 8 times.&lt;/p&gt;
</comment>
                            <comment id="74006" author="niu" created="Mon, 23 Dec 2013 02:24:25 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Do you know why it would keep deleting the same orphan objects? It seems like 65766910 is deleted at least 8 times.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;You mean the orphan deletion on ost4?&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Dec  9 15:04:30 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
...
Dec  9 15:11:41 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767015
...
Dec 10 09:58:31 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767047
...
Dec 10 16:20:25 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767079
...
Dec 10 16:33:00 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767111
...
Dec 11 15:54:57 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767143
...
Dec 11 16:50:03 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767175
...
Dec 12 15:01:02 oss1 kernel: : Lustre: HC3WORK-OST0004: deleting orphan objects from 0x0:65766910 to 0x0:65767207
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I think probably no file striped on ost4 was created, so orphan cleanup always starts from 65766910 (all objects beyond 65766910 are precreated objects).&lt;/p&gt;</comment>
                            <comment id="74033" author="kitwestneat" created="Mon, 23 Dec 2013 15:52:11 +0000"  >&lt;p&gt;Hmm, I think that is unlikely, as the filesystem is in production. I also don&apos;t understand why it would keep removing 65766910. Does it not keep track of orphans already deleted?&lt;/p&gt;</comment>
                            <comment id="74054" author="niu" created="Tue, 24 Dec 2013 02:00:22 +0000"  >&lt;p&gt;I mean that no files striped on ost4 were created during Dec 9 ~ Dec 12; is that possible? The deleted orphans are created again by precreation, which is why it has to delete orphans on each reboot.&lt;/p&gt;</comment>
                            <comment id="74670" author="green" created="Thu, 9 Jan 2014 18:12:00 +0000"  >&lt;p&gt;Kit, can you please tell us if Niu&apos;s idea might be right that ost4 did not see any create activity on it?&lt;/p&gt;</comment>
                            <comment id="75098" author="kitwestneat" created="Thu, 16 Jan 2014 17:11:21 +0000"  >&lt;p&gt;It&apos;s possible, but I think unlikely. The filesystem was in use then. &lt;/p&gt;

&lt;p&gt;We were able to remove the last_rcvd file from the MDT, but that didn&apos;t work. I just reread your previous comment and realized that you were talking about the lov_objids file, oops. I&apos;ll try to get another downtime to remove the lov_objids file. &lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Kit&lt;/p&gt;</comment>
                            <comment id="75849" author="kitwestneat" created="Wed, 29 Jan 2014 15:01:49 +0000"  >&lt;p&gt;Removing the lov_objids file seemed to fix the problem, so I think this ticket can be closed. &lt;/p&gt;</comment>
                            <comment id="75850" author="pjones" created="Wed, 29 Jan 2014 15:05:14 +0000"  >&lt;p&gt;ok thanks Kit!&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="13941" name="kern-mds1" size="2626785" author="kitwestneat" created="Thu, 19 Dec 2013 18:19:59 +0000"/>
                            <attachment id="13940" name="kern-mds2" size="2913206" author="kitwestneat" created="Thu, 19 Dec 2013 18:19:59 +0000"/>
                            <attachment id="13942" name="kern-oss1" size="939511" author="kitwestneat" created="Thu, 19 Dec 2013 18:19:59 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwblb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>12059</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>