<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:08:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7378] Error destroying object with RC115</title>
                <link>https://jira.whamcloud.com/browse/LU-7378</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are encountering an issue on our lustre-2.5.4+ servers where destroying objects fails with error -115 (RC115), after which the OI scrubber starts. We suspect that the OI scrubber may be causing some jobs to exceed their wallclock limits.&lt;/p&gt;

&lt;p&gt;I have attached some kernel logs, and below is the output from the OI scrubber:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@f1-oss1d8 f1-OST00bf]# cat /proc/fs/lustre/osd-ldiskfs/f1-OST00bf/oi_scrub
name: OI_scrub
magic: 0x4c5fd252
oi_files: 64
status: scanning
flags: inconsistent,auto
param:
time_since_last_completed: 316 seconds
time_since_latest_start: 17 seconds
time_since_last_checkpoint: 17 seconds
latest_start_position: 12
last_checkpoint_position: N/A
first_failure_position: 562883
checked: 1900449
updated: 0
failed: 8
prior_updated: 0
noscrub: 119
igif: 1
success_count: 13873
run_time: 18 seconds
average_speed: 105580 objects/sec
real-time_speed: 107345 objects/sec
current_position: 2082404
lf_scanned: 0
lf_reparied: 0
lf_failed: 0
[root@f1-oss1d8 f1-OST00bf]#
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
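&lt;p&gt;You can parse this output instead of reading it by eye to track the scrub across repeated reads. Below is a minimal Python sketch. It assumes the &quot;key: value&quot; layout shown above. parse_oi_scrub and scrub_summary are hypothetical helpers, not Lustre tools.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
def parse_oi_scrub(text):
    """Parse "key: value" lines from an oi_scrub proc file into a dict.

    Lines without a colon (such as a bare shell prompt) are skipped;
    keys with no value, like "param:", map to an empty string.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def scrub_summary(info):
    """Return (still_scanning, flags, failed_count) for a parsed oi_scrub dict."""
    flags = [flag for flag in info.get("flags", "").split(",") if flag]
    return info.get("status") == "scanning", flags, int(info.get("failed", "0"))


# A subset of the oi_scrub output shown above.
sample = """name: OI_scrub
status: scanning
flags: inconsistent,auto
param:
first_failure_position: 562883
failed: 8
run_time: 18 seconds"""

print(scrub_summary(parse_oi_scrub(sample)))  # (True, ['inconsistent', 'auto'], 8)
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;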

&lt;p&gt;Thanks,&lt;br/&gt;
Dustin &lt;/p&gt;</description>
                <environment></environment>
        <key id="32985">LU-7378</key>
            <summary>Error destroying object with RC115</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="yong.fan">nasf</assignee>
                                    <reporter username="dustb100">Dustin Leverman</reporter>
                        <labels>
                    </labels>
                <created>Tue, 3 Nov 2015 20:05:18 +0000</created>
                <updated>Wed, 16 Mar 2016 14:56:24 +0000</updated>
                            <resolved>Thu, 3 Dec 2015 23:42:22 +0000</resolved>
                                    <version>Lustre 2.5.3</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>9</watches>
                                                                            <comments>
                            <comment id="132653" author="yujian" created="Wed, 4 Nov 2015 19:19:51 +0000"  >&lt;p&gt;In kernel logs, the following error messages kept occurring:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Oct 14 16:30:12 f1-oss1d8 kernel: [2604418.085966] LustreError: 10030:0:(ost_handler.c:1777:ost_blocking_ast()) Error -2 syncing data on lock cancel
Oct 14 16:33:07 f1-oss1d8 kernel: [2604593.041236] Lustre: f1-OST00bf-osd: trigger OI scrub by RPC for [0x100000000:0x33d61e5:0x0], rc = 0 [1]
Oct 14 16:33:07 f1-oss1d8 kernel: [2604593.065422] Lustre: Skipped 3 previous similar messages
Oct 14 16:33:07 f1-oss1d8 kernel: [2604593.076384] LustreError: 99512:0:(ofd_obd.c:1096:ofd_destroy()) f1-OST00bf: error destroying object [0x100000000:0x33d61e5:0x0]: -115
Oct 14 16:33:07 f1-oss1d8 kernel: [2604593.125308] LustreError: 99512:0:(ofd_obd.c:1096:ofd_destroy()) Skipped 11 previous similar messages
Oct 14 16:33:11 f1-oss1d8 kernel: [2604597.299596] LustreError: 38769:0:(osd_compat.c:598:osd_obj_update_entry()) f1-OST00bf-osd: the FID [0x100000000:0x2fef612:0x0] is used by two objects: 6037283/2343154970 562883/2343154970
Oct 14 16:33:11 f1-oss1d8 kernel: [2604597.348347] LustreError: 38769:0:(osd_compat.c:598:osd_obj_update_entry()) Skipped 15 previous similar messages
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The &quot;-115&quot; (-EINPROGRESS) error comes from ofd_destroy -&amp;gt; ofd_destroy_by_fid -&amp;gt; ofd_object_find -&amp;gt; &#8230; -&amp;gt; osd_object_init -&amp;gt; osd_fid_lookup():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                        triggered = &lt;span class=&quot;code-keyword&quot;&gt;true&lt;/span&gt;;
                        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (thread_is_running(&amp;amp;scrub-&amp;gt;os_thread)) {
                                result = -EINPROGRESS;
                        } &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (!dev-&amp;gt;od_noscrub) {
                                result = osd_scrub_start(dev);
                                LCONSOLE_WARN(&lt;span class=&quot;code-quote&quot;&gt;&quot;%.16s: trigger OI scrub by RPC &quot;&lt;/span&gt;
                                              &lt;span class=&quot;code-quote&quot;&gt;&quot;&lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; &quot;&lt;/span&gt;DFID&lt;span class=&quot;code-quote&quot;&gt;&quot;, rc = %d [1]\n&quot;&lt;/span&gt;,
                                              osd_name(dev), PFID(fid), result);
                                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (result == 0 || result == -EALREADY)
                                        result = -EINPROGRESS;
                                &lt;span class=&quot;code-keyword&quot;&gt;else&lt;/span&gt;
                                        result = -EREMCHG;
                        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
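&lt;p&gt;The return codes in these messages are negated Linux errno values, as the -EINPROGRESS assignments above show. Below is a minimal Python sketch that translates them with the standard errno module. errno_name is a hypothetical helper, and the numeric values in this ticket are Linux-specific.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import errno
import os


def errno_name(rc):
    """Return the symbolic errno name for a kernel-style return code.

    Kernel code, Lustre included, returns -errno on failure, so the
    sign is dropped before the lookup.
    """
    return errno.errorcode.get(abs(rc), "UNKNOWN")


# Return codes seen in this ticket (numeric values as on Linux).
for rc in (-2, -22, -115):
    print(rc, errno_name(rc), os.strerror(abs(rc)))
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;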

&lt;p&gt;The &quot;FID is used by two objects&quot; error comes from osd_obj_update_entry():&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (lu_fid_eq(fid, oi_fid)) {
                CERROR(&lt;span class=&quot;code-quote&quot;&gt;&quot;%s: the FID &quot;&lt;/span&gt;DFID&lt;span class=&quot;code-quote&quot;&gt;&quot; is used by two objects: &quot;&lt;/span&gt;
                       &lt;span class=&quot;code-quote&quot;&gt;&quot;%u/%u %u/%u\n&quot;&lt;/span&gt;, osd_name(osd), PFID(fid),
                       oi_id-&amp;gt;oii_ino, oi_id-&amp;gt;oii_gen,
                       id-&amp;gt;oii_ino, id-&amp;gt;oii_gen);
                GOTO(out, rc = -EEXIST);
        }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="132671" author="yujian" created="Wed, 4 Nov 2015 21:21:10 +0000"  >&lt;p&gt;Hi Dustin,&lt;/p&gt;

&lt;p&gt;Could you please run the following command on the MDT device to see the pathnames of the two inodes that use the same FID?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;debugfs -c -R &quot;ncheck 6037283 562883&quot; /dev/{mdtdev}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="132681" author="ezell" created="Wed, 4 Nov 2015 22:58:16 +0000"  >&lt;p&gt;Hi Jian-&lt;/p&gt;

&lt;p&gt;I think those are OST inode numbers, so I ran the command there:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@f1-oss1d8 ~]# debugfs -c -R &quot;ncheck 6037283 562883&quot; /dev/mapper/f1-ddn1d-l53
debugfs 1.42.12.wc1 (15-Sep-2014)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
Inode   Pathname
562883  /O/0/d4/54354404
6037283 /O/0/d18/50263570
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@f1-oss1d8 ~]# debugfs -c -R &quot;stat /O/0/d4/54354404&quot; /dev/mapper/f1-ddn1d-l53 | grep fid
debugfs 1.42.12.wc1 (15-Sep-2014)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
  lma: fid=[0x100000000:0x2fef612:0x0] compat=8 incompat=0
  fid = &quot;a8 6e 00 bc f6 02 00 02 00 00 00 00 02 00 00 00 &quot; (16)
  fid: parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2
[root@f1-oss1d8 ~]# debugfs -c -R &quot;stat /O/0/d18/50263570&quot; /dev/mapper/f1-ddn1d-l53 | grep fid
debugfs 1.42.12.wc1 (15-Sep-2014)
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps
  lma: fid=[0x100000000:0x2fef612:0x0] compat=8 incompat=0
  fid = &quot;a8 6e 00 bc f6 02 00 02 00 00 00 00 02 00 00 00 &quot; (16)
  fid: parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That&apos;s where I hit a dead end:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@lfs-mgmt01.ncrc.gov ~]# lfs fid2path /lustre/f1 &apos;[0x100000000:0x2fef612:0x0]&apos;
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x100000000:0x2fef612:0x0]: Invalid argument
[root@lfs-mgmt01.ncrc.gov ~]# lfs fid2path /lustre/f1 &apos;[0x20002f6bc006ea8:0x0:0x0]&apos;
ioctl err -22: Invalid argument (22)
fid2path: error on FID [0x20002f6bc006ea8:0x0:0x0]: Invalid argument
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="132690" author="yujian" created="Wed, 4 Nov 2015 23:38:02 +0000"  >&lt;p&gt;Hi Matt,&lt;/p&gt;

&lt;p&gt;Could you please refer to the steps in &lt;a href=&quot;https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#dbdoclet.50438194_30872&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.html#dbdoclet.50438194_30872&lt;/a&gt; ?&lt;/p&gt;</comment>
                            <comment id="132692" author="ezell" created="Wed, 4 Nov 2015 23:59:49 +0000"  >&lt;p&gt;Hi Jian-&lt;/p&gt;

&lt;p&gt;This file system was formatted with Lustre 2.4, so we shouldn&apos;t have any IGIF files.  According to step 2, &lt;b&gt;fid2path&lt;/b&gt; should have been able to find it, correct?&lt;/p&gt;

&lt;p&gt;~Matt&lt;/p&gt;</comment>
                            <comment id="132697" author="yujian" created="Thu, 5 Nov 2015 01:25:22 +0000"  >&lt;p&gt;Yes, Matt. Let me ask for help.&lt;/p&gt;

&lt;p&gt;Hi Nasf,&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;</comment>
                            <comment id="132715" author="yong.fan" created="Thu, 5 Nov 2015 07:54:58 +0000"  >&lt;blockquote&gt;
&lt;p&gt;[root@lfs-mgmt01.ncrc.gov ~]# lfs fid2path /lustre/f1 &apos;[0x20002f6bc006ea8:0x0:0x0]&apos;&lt;br/&gt;
ioctl err -22: Invalid argument (22)&lt;br/&gt;
fid2path: error on FID [0x20002f6bc006ea8:0x0:0x0]: Invalid argument&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;If &quot;/lustre/f1&quot; is the correct mount point, then the FID [0x20002f6bc006ea8:0x0:0x0] is invalid: the first valid lu_fid::f_oid for an MDT-object&apos;s FID is 1, not 0. So the PFID EA of the OST-object is wrong, and it is expected that you cannot locate the related MDT-object via such an invalid FID.&lt;/p&gt;
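&lt;p&gt;The zero f_oid can also be read directly from the raw 16-byte &quot;fid&quot; xattr that debugfs dumps. Below is a Python sketch. It assumes the little-endian lu_fid layout (64-bit f_seq, 32-bit f_oid, 32-bit f_ver), with f_ver holding the stripe index. Under that assumption it reproduces the parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2 line printed by debugfs. decode_pfid is a hypothetical helper.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
def decode_pfid(hex_dump):
    """Decode the 16-byte "fid" xattr dump printed by debugfs.

    Assumed layout, little-endian: u64 f_seq, u32 f_oid, u32 f_ver,
    where f_ver carries the stripe index shown by debugfs.
    """
    raw = bytes.fromhex(hex_dump)
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    seq = int.from_bytes(raw[0:8], "little")
    oid = int.from_bytes(raw[8:12], "little")
    stripe = int.from_bytes(raw[12:16], "little")
    return seq, oid, stripe


seq, oid, stripe = decode_pfid("a8 6e 00 bc f6 02 00 02 00 00 00 00 02 00 00 00")
print(f"parent=[{seq:#x}:{oid:#x}:0x0] stripe={stripe}")
# parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2
if oid == 0:
    # MDT-object FIDs start at f_oid 1, so this parent FID cannot be valid.
    print("invalid parent FID: f_oid is 0")
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;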

&lt;p&gt;On the other hand, your case happened while destroying the OST-object, which means the MDT-object had already been removed. So even if the PFID EA were correct, you still could not locate the removed MDT-object.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[root@f1-oss1d8 ~]# debugfs -c -R &quot;stat /O/0/d4/54354404&quot; /dev/mapper/f1-ddn1d-l53 | grep fid&lt;br/&gt;
debugfs 1.42.12.wc1 (15-Sep-2014)&lt;br/&gt;
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps&lt;br/&gt;
  lma: fid=[0x100000000:0x2fef612:0x0] compat=8 incompat=0&lt;br/&gt;
  fid = &quot;a8 6e 00 bc f6 02 00 02 00 00 00 00 02 00 00 00 &quot; (16)&lt;br/&gt;
  fid: parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2&lt;br/&gt;
[root@f1-oss1d8 ~]# debugfs -c -R &quot;stat /O/0/d18/50263570&quot; /dev/mapper/f1-ddn1d-l53 | grep fid&lt;br/&gt;
debugfs 1.42.12.wc1 (15-Sep-2014)&lt;br/&gt;
/dev/mapper/f1-ddn1d-l53: catastrophic mode - not reading inode or group bitmaps&lt;br/&gt;
  lma: fid=[0x100000000:0x2fef612:0x0] compat=8 incompat=0&lt;br/&gt;
  fid = &quot;a8 6e 00 bc f6 02 00 02 00 00 00 00 02 00 00 00 &quot; (16)&lt;br/&gt;
  fid: parent=[0x20002f6bc006ea8:0x0:0x0] stripe=2&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;In the log, only the FID [0x100000000:0x2fef612:0x0] was reported as a conflicting FID mapping. That FID should be mapped to /O/0/d18/50263570 (inode/generation 6037283/2343154970). But for some unknown reason, /O/0/d4/54354404 (562883/2343154970) also claims to be mapped to the FID [0x100000000:0x2fef612:0x0]. The latter mapping is wrong.&lt;/p&gt;
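&lt;p&gt;For the two mappings above, the OI entry name follows directly from the FID. The file name is f_oid in decimal, and the subdirectory is d(f_oid mod 32). For example, 0x2fef612 = 50263570 and 50263570 mod 32 = 18. Below is a Python sketch of that mapping. It assumes sequence 0x100000000 lives under /O/0 and that there are 32 d* subdirectories. It only reproduces the paths in this ticket and is not a general implementation of the Lustre OI layout. ost_object_path and fid_from_path are hypothetical helpers.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
OST_SUBDIR_COUNT = 32  # assumption: objects are spread over d0 .. d31


def parse_fid(fid):
    """Split a "[seq:oid:ver]" FID string into three integers."""
    seq, oid, ver = (int(part, 16) for part in fid.strip("[]").split(":"))
    return seq, oid, ver


def ost_object_path(fid):
    """Map a [0x100000000:oid:0x0] FID to its /O/0/dN/OID path."""
    seq, oid, ver = parse_fid(fid)
    if seq != 0x100000000 or ver != 0:
        raise ValueError(f"only [0x100000000:oid:0x0] FIDs are handled: {fid}")
    return f"/O/0/d{oid % OST_SUBDIR_COUNT}/{oid}"


def fid_from_path(path):
    """Inverse of ost_object_path: rebuild the FID from the file name."""
    oid = int(path.rsplit("/", 1)[1])
    return f"[0x100000000:{oid:#x}:0x0]"


print(ost_object_path("[0x100000000:0x2fef612:0x0]"))  # /O/0/d18/50263570
print(fid_from_path("/O/0/d4/54354404"))  # [0x100000000:0x33d61e4:0x0]
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Applied to inode 562883, the name /O/0/d4/54354404 encodes [0x100000000:0x33d61e4:0x0], while its LMA claims [0x100000000:0x2fef612:0x0].&lt;/p&gt;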

&lt;p&gt;In fact, the OI entry /O/0/d4/54354404 corresponds to the FID [0x100000000:0x33d61e4:0x0]. Since the OST-object corresponding to that FID is being destroyed anyway, the simplest solution is to mount the OST as ldiskfs and then remove /O/0/d4/54354404 directly.&lt;/p&gt;</comment>
                            <comment id="133235" author="dustb100" created="Wed, 11 Nov 2015 14:54:01 +0000"  >&lt;p&gt;Thank you for the information, Nasf. This will be our plan, but we need to schedule an outage with the customer. We will let you know the result.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dustin &lt;/p&gt;</comment>
                            <comment id="135079" author="dustb100" created="Thu, 3 Dec 2015 13:31:47 +0000"  >&lt;p&gt;Nasf, &lt;br/&gt;
      We took a downtime this morning and removed the object you identified above. We are now getting new duplicate FID errors in the logs. It looks like 15 messages were being suppressed, so we have more objects to destroy. The one that you helped with above is no longer generating messages. We will disable log suppression and identify the objects that need to be destroyed using the procedure we followed above.&lt;/p&gt;
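&lt;p&gt;Once messages are no longer suppressed, the remaining conflicts can be collected from the kernel log rather than by hand. Below is a Python sketch. It assumes the osd_obj_update_entry() message format quoted earlier in this ticket. find_duplicate_fids is a hypothetical helper, and DEVICE stands for the OST block device.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
```python
import re

# Text printed by osd_obj_update_entry(), for example:
#   f1-OST00bf-osd: the FID [0x100000000:0x2fef612:0x0] is used by two
#   objects: 6037283/2343154970 562883/2343154970
# Groups: OSD name, FID, then ino/gen from oi_id and ino/gen from id.
DUP_FID_RE = re.compile(
    r"(\S+): the FID (\[[0-9a-fx:]+\]) is used by two objects: "
    r"(\d+)/(\d+) (\d+)/(\d+)"
)


def find_duplicate_fids(log_lines):
    """Return {(osd_name, fid): (oi_ino, other_ino)} for each conflict found."""
    conflicts = {}
    for line in log_lines:
        match = DUP_FID_RE.search(line)
        if match:
            osd, fid, oi_ino, _oi_gen, ino, _gen = match.groups()
            conflicts[(osd, fid)] = (int(oi_ino), int(ino))
    return conflicts


sample = [
    "Oct 14 16:33:11 f1-oss1d8 kernel: [2604597.299596] LustreError: "
    "38769:0:(osd_compat.c:598:osd_obj_update_entry()) f1-OST00bf-osd: "
    "the FID [0x100000000:0x2fef612:0x0] is used by two objects: "
    "6037283/2343154970 562883/2343154970",
    "Oct 14 16:33:11 f1-oss1d8 kernel: [2604597.348347] LustreError: "
    "38769:0:(osd_compat.c:598:osd_obj_update_entry()) "
    "Skipped 15 previous similar messages",
]

for (osd, fid), (oi_ino, ino) in find_duplicate_fids(sample).items():
    # DEVICE is a placeholder for the OST block device.
    print(f'{osd} {fid}: debugfs -c -R "ncheck {oi_ino} {ino}" DEVICE')
```
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Keying on (OSD, FID) collapses repeated reports of the same conflict, so each inode pair is passed to ncheck only once.&lt;/p&gt;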

&lt;p&gt;Thank you for your help, I think it is okay to close this ticket. &lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dustin &lt;/p&gt;</comment>
                            <comment id="135177" author="yujian" created="Thu, 3 Dec 2015 23:42:22 +0000"  >&lt;p&gt;Thank you, Dustin. Let me close this ticket.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="35290">LU-7867</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="19515" name="lustrekernel-20151014" size="132472" author="dustb100" created="Tue, 3 Nov 2015 20:05:18 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxs5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>