<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:29:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9825] Multiple errors on OST/MDS </title>
                <link>https://jira.whamcloud.com/browse/LU-9825</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Running soak with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7899&quot; title=&quot;osd_xattr_set() to batch actual EA update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7899&quot;&gt;&lt;del&gt;LU-7899&lt;/del&gt;&lt;/a&gt; patch, we are seeing multiple repeated errors that are not tied to any system halt.&lt;br/&gt;
The first error, on an OST, appeared after an OST failover during an LFSCK run (LFSCK exceeded its timeout and was aborted).&lt;br/&gt;
There is a network hiccup and the simul job dies:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;sys-recov.today:/scratch/logs/syslog/soak-6.log:Aug  3 22:14:38 soak-6 kernel: Lustre: soaked-OST0016: Recovery over after 0:05, of 35 clients 35 recovered and 0 were evicted.
sys-recov.today:/scratch/logs/syslog/soak-6.log:Aug  3 22:14:49 soak-6 kernel: Lustre: soaked-OST0004: Recovery over after 0:06, of 35 clients 35 recovered and 0 were evicted.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;/scratch/logs/syslog/soak-5.log:Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 13 previous similar messages
/scratch/logs/syslog/soak-5.log:Aug  3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0003: can&apos;t get bonus, rc = -17
/scratch/logs/syslog/soak-2.log:Aug  3 22:18:07 soak-2 kernel: LustreError: 24114:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 534 previous similar messages
/scratch/logs/syslog/soak-2.log:Aug  3 22:18:07 soak-2 kernel: LustreError: 24114:0:(osd_index.c:224:__osd_xattr_load_by_oid()) soaked-OST0012: can&apos;t get bonus, rc = -17
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
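&lt;p&gt;Lustre rate-limits repeated console messages, so a &quot;Skipped N previous similar messages&quot; line stands for N suppressed copies, and the printed lines understate how often __osd_xattr_load_by_oid() fired. A minimal Python sketch for tallying occurrences per host and function, assuming syslog lines in exactly the format quoted above; tally() is a hypothetical helper, not a Lustre tool:&lt;/p&gt;

```python
import re
from collections import Counter

# "(file.c:line:function())" location tag printed on each LustreError line.
LOC_RE = re.compile(r"\(([\w.]+):(\d+):(\w+)\(\)\)")
# Rate-limit summary printed in place of suppressed duplicates.
SKIP_RE = re.compile(r"Skipped (\d+) previous similar message")


def tally(lines):
    """Estimate occurrences per (host, function) from syslog lines.

    A normal message counts once; a "Skipped N previous similar
    messages" line adds the N copies that were not printed.
    """
    counts = Counter()
    for line in lines:
        loc = LOC_RE.search(line)
        if loc is None:
            continue  # no location tag (e.g. the "Recovery over" lines)
        # The host name is the last field before " kernel:".
        host = line.split(" kernel:")[0].split()[-1]
        skip = SKIP_RE.search(line)
        counts[(host, loc.group(3))] += int(skip.group(1)) if skip else 1
    return counts
```

&lt;p&gt;Applied to the four soak-5/soak-2 lines above, this counts 14 occurrences on soak-5 (13 skipped plus 1 printed) and 535 on soak-2. These are still lower bounds, since only an excerpt of each log is shown.&lt;/p&gt;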

&lt;p&gt;The second set of errors appears on the MDS:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;scratch/logs/syslog/soak-8.log:Aug  3 22:13:46 soak-8 kernel: LustreError: 4491:0:(client.c:3006:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff8807d76d5700 x1574746919153872/t167504869448(167504869448) o6-&amp;gt;soaked-OST0016-osc-MDT0000@192.168.1.107@o2ib:28/4 lens 664/400 e 0 to 0 dl 1501798434 ref 2 fl Interpret:R/4/0 rc -2/-2
/scratch/logs/syslog/soak-8.log:Aug  3 22:13:50 soak-8 kernel: LustreError: 4491:0:(client.c:3006:ptlrpc_replay_interpret()) @@@ status -2, old was 0  req@ffff8807f37a1500 x1574746919157120/t167504861127(167504861127) o6-&amp;gt;soaked-OST0004-osc-MDT0000@192.168.1.107@o2ib:28/4 lens 664/400 e 0 to 0 dl 1501798438 ref 2 fl Interpret:R/4/0 rc -2/-2
/scratch/logs/syslog/soak-8.log:Aug  3 22:13:56 soak-8 kernel: LustreError: 4939:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 1 previous similar message
/scratch/logs/syslog/soak-8.log:Aug  3 22:13:56 soak-8 kernel: LustreError: 4939:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:11 soak-8 kernel: LustreError: 11-0: soaked-OST0016-osc-MDT0000: operation ost_destroy to node 192.168.1.107@o2ib failed: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:33 soak-8 kernel: LustreError: 11-0: soaked-OST0004-osc-MDT0000: operation ost_create to node 192.168.1.107@o2ib failed: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:33 soak-8 kernel: LustreError: 4596:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST0004-osc-MDT0000: can&apos;t precreate: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:40 soak-8 kernel: LustreError: 11-0: soaked-OST000a-osc-MDT0000: operation ost_create to node 192.168.1.107@o2ib failed: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:40 soak-8 kernel: LustreError: 4609:0:(osp_precreate.c:619:osp_precreate_send()) soaked-OST000a-osc-MDT0000: can&apos;t precreate: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:14:54 soak-8 kernel: LustreError: 11-0: soaked-OST0010-osc-MDT0000: operation ost_statfs to node 192.168.1.107@o2ib failed: rc = -107
/scratch/logs/syslog/soak-8.log:Aug  3 22:22:14 soak-8 kernel: LustreError: 4692:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 23 previous similar messages
/scratch/logs/syslog/soak-8.log:Aug  3 22:22:14 soak-8 kernel: LustreError: 4692:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 416 actual 344.
/scratch/logs/syslog/soak-8.log:Aug  3 22:22:15 soak-8 kernel: LustreError: 4853:0:(mdt_lvb.c:163:mdt_lvbo_fill()) Skipped 25 previous similar messages
/scratch/logs/syslog/soak-8.log:Aug  3 22:22:15 soak-8 kernel: LustreError: 4853:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 416.
/scratch/logs/syslog/soak-8.log:Aug  3 22:22:28 soak-8 kernel: LustreError: 4769:0:(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344.
/scratch/logs/syslog/soak-8.log:Aug  3 22:25:20 soak-8 kernel: LustreError: 5065:0:(osp_object.c:582:osp_attr_get()) soaked-OST000b-osc-MDT0000:osp_attr_get update error [0x1000b0000:0x111cd18:0x0]: rc = -4
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
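&lt;p&gt;The rc values in these logs are negated Linux errno codes. A minimal Python sketch for decoding them with the standard errno module, assuming the servers run Linux (errno numbering such as 107 for ENOTCONN is platform-specific); decode_rc() and rc_values() are hypothetical helper names:&lt;/p&gt;

```python
import errno
import os
import re

# Matches the "rc = -N" suffix used by most LustreError lines above.
RC_RE = re.compile(r"rc = (-\d+)")


def decode_rc(rc):
    """Return (symbolic name, message) for a negative return code."""
    code = -rc
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def rc_values(line):
    """Return every "rc = -N" value on a log line, as ints."""
    return [int(v) for v in RC_RE.findall(line)]
```

&lt;p&gt;On Linux, decode_rc() maps -2 to ENOENT, -4 to EINTR, -17 to EEXIST and -107 to ENOTCONN. rc_values() only matches the &quot;rc = -N&quot; form, not the &quot;rc -2/-2&quot; field of the ptlrpc lines. The run of -107 (ENOTCONN) failures toward 192.168.1.107@o2ib between 22:14:11 and 22:14:54 is consistent with that node being unreachable during the OST failover.&lt;/p&gt;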

&lt;p&gt;The servers are not crashing, but these errors are new with the patch, so they may be of interest.&lt;/p&gt;</description>
                <environment>Soak cluster - testing &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7899&quot; title=&quot;osd_xattr_set() to batch actual EA update&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7899&quot;&gt;&lt;strike&gt;LU-7899&lt;/strike&gt;&lt;/a&gt; patch</environment>
        <key id="47646">LU-9825</key>
            <summary>Multiple errors on OST/MDS </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                    </labels>
                <created>Thu, 3 Aug 2017 22:55:46 +0000</created>
                <updated>Mon, 2 Oct 2017 17:42:57 +0000</updated>
                                            <version>Lustre 2.10.1</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                <comments>
                            <comment id="204424" author="bzzz" created="Fri, 4 Aug 2017 12:17:07 +0000"  >&lt;p&gt;I can reproduce messages like the following locally:&lt;br/&gt;
/scratch/logs/syslog/soak-5.log:Aug 3 22:17:08 soak-5 kernel: LustreError: 24771:0:(osd_index.c:224:__osd_xattr_load_by_oid()) Skipped 13 previous similar messages&lt;br/&gt;
This is the OI scrubber encountering objects that are being destroyed.&lt;br/&gt;
In my local testing this does not cause any test issues.&lt;/p&gt;</comment>
                            <comment id="204718" author="adilger" created="Mon, 7 Aug 2017 22:04:55 +0000"  >&lt;p&gt;The unexpected errors on the MDS are confusing for users, especially errors that appear during normal usage and do not indicate any kind of problem. &lt;/p&gt;

&lt;p&gt;Lai, the &quot;mdt_lvbo_fill() expected N actual M&quot; messages come from PFL and should be quieted (for 2.11 and 2.10.1), since they are expected for composite files.&lt;/p&gt;
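&lt;p&gt;Until those messages are quieted, they can be separated from the rest of a log when triaging. A minimal Python sketch, assuming the exact &quot;soaked-MDT0000: expected 968 actual 344.&quot; wording from the MDS log; lvbo_mismatch() and split_pfl_noise() are hypothetical helpers, and treating every such line as PFL noise is an assumption based on this comment, not a check of the file layout:&lt;/p&gt;

```python
import re

# mdt_lvbo_fill() size-mismatch message, e.g.
# "(mdt_lvb.c:163:mdt_lvbo_fill()) soaked-MDT0000: expected 968 actual 344."
LVBO_RE = re.compile(r"mdt_lvbo_fill\(\)\) (\S+): expected (\d+) actual (\d+)")


def lvbo_mismatch(line):
    """Return (target, expected, actual) for an lvbo size message, else None."""
    m = LVBO_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def split_pfl_noise(lines):
    """Split lines into (lvbo size mismatches, everything else).

    The "Skipped N previous similar messages" lines for mdt_lvbo_fill()
    carry no sizes, so they land in the second list.
    """
    noise, rest = [], []
    for line in lines:
        (noise if lvbo_mismatch(line) else rest).append(line)
    return noise, rest
```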

&lt;p&gt;Nasf, the __osd_xattr_load_by_oid() messages are from LFSCK and should be investigated. I don&apos;t know whether they indicate a problem.&lt;/p&gt;</comment>
                            <comment id="204735" author="yong.fan" created="Tue, 8 Aug 2017 02:17:07 +0000"  >&lt;p&gt;There are two callers of __osd_xattr_load_by_oid() in the current osd-zfs: osd_zfs_otable_it_next() and osd_get_fid_by_oid(). The former is LFSCK related. I am not sure which one triggered the failure in the test, but since LFSCK was running during the test, it is possible that LFSCK called __osd_xattr_load_by_oid() and caused the failure message. I checked the ZFS code; one suspect point is the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;int
dnode_hold_impl(objset_t *os, uint64_t object, int flag, int slots,
    void *tag, dnode_t **dnp)
{
...
        mutex_enter(&amp;amp;dn-&amp;gt;dn_mtx);
        type = dn-&amp;gt;dn_type;
        if (dn-&amp;gt;dn_free_txg ||
            ((flag &amp;amp; DNODE_MUST_BE_FREE) &amp;amp;&amp;amp; !refcount_is_zero(&amp;amp;dn-&amp;gt;dn_holds))) {
                mutex_exit(&amp;amp;dn-&amp;gt;dn_mtx);
                zrl_remove(&amp;amp;dnh-&amp;gt;dnh_zrlock);
                dbuf_rele(db, FTAG);
                return (type == DMU_OT_NONE ? ENOENT : EEXIST);
        }
...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It seems that the object whose xattr we are loading has been removed. In the current osd-zfs implementation we do NOT hold a reference on the object; instead, we load the xattr directly via the oid. So this is a normal race.&lt;/p&gt;
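&lt;p&gt;The race can be illustrated with a small Python model of only the quoted early-return branch of dnode_hold_impl(). It is a simplified sketch, not ZFS code: the Dnode class, the quoted_branch() name and the must_be_free boolean (standing in for the DNODE_MUST_BE_FREE flag test) are hypothetical, and the elided parts of the function are ignored. It shows that a dnode with a pending free (dn_free_txg set) that still carries its object type yields EEXIST, which osd-zfs would report as rc = -17, assuming the failing call reaches this branch and the osd layer negates the ZFS return code:&lt;/p&gt;

```python
import errno
from dataclasses import dataclass

DMU_OT_NONE = 0  # object type of an unallocated dnode in ZFS


@dataclass
class Dnode:
    dn_type: int
    dn_free_txg: int = 0  # nonzero while a free of this dnode is pending
    dn_holds: int = 0     # reference count of active holds


def quoted_branch(dn, must_be_free):
    """Mirror the quoted early return; 0 means "fall through".

    ZFS returns positive errno values; the Lustre osd-zfs layer is
    assumed to negate them, e.g. EEXIST becomes rc = -17.
    """
    if dn.dn_free_txg or (must_be_free and dn.dn_holds != 0):
        return errno.ENOENT if dn.dn_type == DMU_OT_NONE else errno.EEXIST
    return 0
```

&lt;p&gt;For example, quoted_branch(Dnode(dn_type=1, dn_free_txg=42), False) returns errno.EEXIST, while the same dnode with dn_type=DMU_OT_NONE returns errno.ENOENT; the values 1 and 42 are arbitrary.&lt;/p&gt;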

&lt;p&gt;On the other hand, our ZFS otable-based iteration, osd_zfs_otable_it_next(), ignores __osd_xattr_load_by_oid() failures. This means that if LFSCK triggered the __osd_xattr_load_by_oid() failure message, it is only a warning and will NOT affect the LFSCK run.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;static int osd_zfs_otable_it_next(const struct lu_env *env, struct dt_it *di)
{
...
                rc = __osd_xattr_load_by_oid(dev, it-&amp;gt;mit_pos, &amp;amp;nvbuf);
                if (unlikely(rc != 0))
                        continue;
...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="204741" author="bzzz" created="Tue, 8 Aug 2017 03:55:59 +0000"  >&lt;p&gt;There is no point in holding a reference to the object in this case, especially given that it would be the same call to dmu_bonus_hold().&lt;br/&gt;
I agree we should avoid error messages in the -ENOENT case at least: it is an object &lt;em&gt;being&lt;/em&gt; removed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="48550">LU-10055</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzhqf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>