<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:27:31 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16497] various lustre errors on clients and servers</title>
                <link>https://jira.whamcloud.com/browse/LU-16497</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&apos;re seeing quite a few errors on clients, OSSes, and the MDS.&lt;/p&gt;

&lt;p&gt;For example on clients:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Jan 16 09:53:09 juliet2 kernel: LustreError: 49499:0:(mdc_request.c:1441:mdc_read_page()) juliet-MDT0000-mdc-ffff99a3723aa800: [0x200001b3e:0x5f66:0x0] lock enqueue fails: rc = -4
Jan 16 21:30:41 juliet2 kernel: LustreError: 11-0: juliet-OST002a-osc-ffff99a3723aa800: operation ldlm_enqueue to node 10.29.22.93@tcp failed: rc = -107
Jan 16 21:30:41 juliet2 kernel: Lustre: juliet-OST002a-osc-ffff99a3723aa800: Connection to juliet-OST002a (at 10.29.22.93@tcp) was lost; in progress operations using this service will wait for recovery to complete
Jan 16 21:30:41 juliet2 kernel: LustreError: 167-0: juliet-OST002a-osc-ffff99a3723aa800: This client was evicted by juliet-OST002a; in progress operations using this service will fail.
Jan 16 21:30:41 juliet2 kernel: Lustre: 4193:0:(llite_lib.c:2762:ll_dirty_page_discard_warn()) juliet: dirty page discard: 10.29.22.90@tcp:/juliet/fid: [0x20002dd8a:0x16daa:0x0]/ may get corrupted (rc -108)
Jan 16 21:30:41 juliet2 kernel: Lustre: 4191:0:(llite_lib.c:2762:ll_dirty_page_discard_warn()) juliet: dirty page discard: 10.29.22.90@tcp:/juliet/fid: [0x20002dd8a:0x16cb1:0x0]/ may get corrupted (rc -108)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;OSS:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Jan 16 06:17:54 joss1 kernel: LustreError: 6496:0:(events.c:455:server_bulk_callback()) event type 3, status -5, desc ffff92ef5dbb3000
Jan 16 06:17:54 joss1 kernel: LustreError: 16260:0:(ldlm_lib.c:3363:target_bulk_io()) @@@ network error on bulk WRITE  req@ffff92f3dfcbb850 x1760556572171776/t0(0) o4-&amp;gt;bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:446/0 lens 488/448 e 0 to 0 dl 1673867911 ref 1 fl Interpret:/0/0 rc 0/0
Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Client bd9b8fe9-b80f-7114-7b35-663a8e9d48db (at 10.29.22.97@tcp) reconnecting
Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Connection restored to 3d01cce1-cfce-5103-0db6-32c1aa8f728c (at 10.29.22.97@tcp)
Jan 16 06:17:54 joss1 kernel: Lustre: juliet-OST0009: Bulk IO write error with bd9b8fe9-b80f-7114-7b35-663a8e9d48db (at 10.29.22.97@tcp), client will retry: rc = -110
Jan 16 06:17:54 joss1 kernel: Lustre: Skipped 1 previous similar message
Jan 16 06:17:54 joss1 kernel: LustreError: 16218:0:(ldlm_lib.c:3357:target_bulk_io()) @@@ Reconnect on bulk WRITE  req@ffff92eb76e54050 x1760556572184448/t0(0) o4-&amp;gt;bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:452/0 lens 488/448 e 0 to 0 dl 1673867917 ref 1 fl Interpret:/0/0 rc 0/0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;MDS:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Jan 16 19:52:10 jmds1 kernel: LustreError: 47609:0:(ldlm_lib.c:3357:target_bulk_io()) @@@ Reconnect on bulk READ  req@ffff995a6c544850 x1760579715652736/t0(0) o37-&amp;gt;bd9b8fe9-b80f-7114-7b35-663a8e9d48db@10.29.22.97@tcp:220/0 lens 448/440 e 1 to 0 dl 1673916760 ref 1 fl Interpret:/0/0 rc 0/0
Jan 19 12:11:29 jmds1 kernel: LustreError: 15481:0:(mgs_handler.c:282:mgs_revoke_lock()) MGS: can&apos;t take cfg lock for 0x736d61726170/0x3 : rc = -11
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Is it possible to give us an idea of what these errors might indicate (e.g. network issues, misconfiguration, load), so we can narrow down the focus of our investigation? Let us know what extra details (logs, cluster settings) you might need.&lt;/p&gt;</description>
                <environment></environment>
        <key id="74122">LU-16497</key>
            <summary>various lustre errors on clients and servers</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="cfaber">Colin Faber</assignee>
                                    <reporter username="dneg">Dneg</reporter>
                        <labels>
                    </labels>
                <created>Fri, 20 Jan 2023 15:52:38 +0000</created>
                <updated>Tue, 7 Mar 2023 15:01:27 +0000</updated>
                            <resolved>Tue, 7 Mar 2023 15:01:27 +0000</resolved>
                                                    <fixVersion>Lustre 2.15.2</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                                                                            <comments>
                            <comment id="359883" author="dneg" created="Fri, 20 Jan 2023 16:29:41 +0000"  >&lt;p&gt;The Lustre version is 2.12.8_6_g5457c37 on all OSSes, the MDS, and the clients (apart from one, which is running 2.12.6).&lt;/p&gt;</comment>
                            <comment id="359895" author="JIRAUSER17312" created="Fri, 20 Jan 2023 18:32:33 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;Can you please attach full logs around this incident? Thank you!&lt;/p&gt;

&lt;p&gt;(dmesg / syslogs, etc)&lt;/p&gt;</comment>
                            <comment id="360020" author="dneg" created="Mon, 23 Jan 2023 10:39:54 +0000"  >&lt;p&gt;Logs attached&lt;/p&gt;</comment>
                            <comment id="360088" author="dneg" created="Mon, 23 Jan 2023 18:17:29 +0000"  >&lt;p&gt;A bit more information. This cluster was upgraded to 2.12.8_6_g5457c37 to address issues seen in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15915&quot; title=&quot;/bin/rm: fts_read failed: Cannot send after transport endpoint shutdown&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15915&quot;&gt;&lt;del&gt;LU-15915&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16343&quot; title=&quot;soft lockups ptlrpcd&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16343&quot;&gt;&lt;del&gt;LU-16343&lt;/del&gt;&lt;/a&gt;. As a temporary mitigation, we lowered the LDLM LRU size from 10000 to 128. We recently set it back to 10000, and since then (around 19 Jan) evictions have started occurring as well. Logs attached.&lt;/p&gt;</comment>
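<!--
The mitigation described in the preceding comment (lowering the client LDLM lock LRU size from 10000 to 128, then restoring it) is typically applied with lctl. A minimal sketch, assuming a mounted Lustre client; the wildcard namespace glob follows the Lustre manual, but verify the exact parameter paths on your version:

```shell
# Show the current lock LRU size for every LDLM namespace on this node
lctl get_param ldlm.namespaces.*.lru_size

# Temporarily cap each namespace's LRU at 128 locks
# (set_param changes are not persistent across remount/reboot)
lctl set_param ldlm.namespaces.*.lru_size=128

# Restore the earlier static limit of 10000
lctl set_param ldlm.namespaces.*.lru_size=10000
```

Setting lru_size to 0 switches a namespace back to dynamic LRU sizing rather than a static limit.
-->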
                            <comment id="360092" author="JIRAUSER17312" created="Mon, 23 Jan 2023 18:27:14 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=dneg&quot; class=&quot;user-hover&quot; rel=&quot;dneg&quot;&gt;dneg&lt;/a&gt;&#160;&lt;/p&gt;

&lt;p&gt;Based on what I&apos;m seeing, this does look very similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14644&quot; title=&quot;IOR SSF PFL ill-formed I/O job aborted with EIO during automated FOFB testing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14644&quot;&gt;&lt;del&gt;LU-14644&lt;/del&gt;&lt;/a&gt;, which has been addressed in 2.15.0 and 2.12.10. Can you try upgrading and see if you still experience the issue?&lt;/p&gt;</comment>
                            <comment id="360200" author="dneg" created="Tue, 24 Jan 2023 16:44:11 +0000"  >&lt;p&gt;Thanks Colin. Looking at the Whamcloud public repos, the &apos;latest-2.12-release&apos; link points to 2.12.9-1. Does this include the required patches, or should we wait for 2.12.10 (and when will that be released)?&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;

&lt;p&gt;Campbell&lt;/p&gt;</comment>
                            <comment id="360205" author="pjones" created="Tue, 24 Jan 2023 16:53:17 +0000"  >&lt;p&gt;Campbell&lt;/p&gt;

&lt;p&gt;There are no current plans to issue a 2.12.10 release. What Colin means is that the fix has been merged to the branch, so if we ever did decide to do one, it would include this fix. Using the latest 2.15.x LTS release (2.15.2) would be the most expedient option.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="360218" author="dneg" created="Tue, 24 Jan 2023 17:57:01 +0000"  >&lt;p&gt;Ok, thanks Peter&lt;/p&gt;</comment>
                            <comment id="360484" author="dneg" created="Thu, 26 Jan 2023 14:34:54 +0000"  >&lt;p&gt;Quick question Peter: since we&apos;re upgrading from EL7 to EL8 as well as from 2.12 to 2.15, are there any special steps to take?&lt;/p&gt;</comment>
                            <comment id="360802" author="pjones" created="Mon, 30 Jan 2023 03:43:42 +0000"  >&lt;p&gt;The one tip I have heard is to be careful not to inadvertently reformat your OSTs; otherwise, this should be a standard OS upgrade. Based on the showing at the Lustre BOF at SC22, it seems that a number of people have already navigated this upgrade, so you could always poll lustre-discuss to see if other community members have any experiences to share.&lt;/p&gt;</comment>
                            <comment id="365073" author="JIRAUSER17312" created="Tue, 7 Mar 2023 15:01:27 +0000"  >&lt;p&gt;Resolving as this is fixed in 2.15.2.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="63959">LU-14644</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="47777" name="jmds1-messages-20230115.gz" size="2823303" author="dneg" created="Mon, 23 Jan 2023 10:39:28 +0000"/>
                            <attachment id="47801" name="jmds1-messages.gz" size="448939" author="dneg" created="Mon, 23 Jan 2023 18:17:40 +0000"/>
                            <attachment id="47785" name="joss1-messages-20230115.gz" size="2876873" author="dneg" created="Mon, 23 Jan 2023 10:39:29 +0000"/>
                            <attachment id="47803" name="joss1-messages.gz" size="424949" author="dneg" created="Mon, 23 Jan 2023 18:17:39 +0000"/>
                            <attachment id="47784" name="joss2-messages-20230115.gz" size="2911672" author="dneg" created="Mon, 23 Jan 2023 10:39:31 +0000"/>
                            <attachment id="47804" name="joss2-messages.gz" size="454019" author="dneg" created="Mon, 23 Jan 2023 18:17:40 +0000"/>
                            <attachment id="47783" name="joss3-messages-20230115.gz" size="2877484" author="dneg" created="Mon, 23 Jan 2023 10:39:30 +0000"/>
                            <attachment id="47782" name="joss4-messages-20230115.gz" size="2859549" author="dneg" created="Mon, 23 Jan 2023 10:39:31 +0000"/>
                            <attachment id="47781" name="joss5-messages-20230115.gz" size="2866444" author="dneg" created="Mon, 23 Jan 2023 10:39:30 +0000"/>
                            <attachment id="47780" name="joss6-messages-20230115.gz" size="2890275" author="dneg" created="Mon, 23 Jan 2023 10:39:30 +0000"/>
                            <attachment id="47800" name="juliet1-messages-20230115 (1).gz" size="3304041" author="dneg" created="Mon, 23 Jan 2023 18:17:44 +0000"/>
                            <attachment id="47778" name="juliet1-messages-20230115.gz" size="3304041" author="dneg" created="Mon, 23 Jan 2023 10:39:30 +0000"/>
                            <attachment id="47802" name="juliet1-messages.gz" size="523772" author="dneg" created="Mon, 23 Jan 2023 18:17:40 +0000"/>
                            <attachment id="47799" name="juliet2-messages-20230115 (1).gz" size="1877927" author="dneg" created="Mon, 23 Jan 2023 18:17:42 +0000"/>
                            <attachment id="47779" name="juliet2-messages-20230115.gz" size="1877927" author="dneg" created="Mon, 23 Jan 2023 10:39:29 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03aqf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>