<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:48:35 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11976] req wrong generation leading to I/O errors on 2.12 clients</title>
                <link>https://jira.whamcloud.com/browse/LU-11976</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Since we upgraded our clients to 2.12.0, our users have been reporting more I/O errors on Oak (2.10 servers) that seem to be related to the following LustreError messages:&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67bf646000 x1624748797937520/t0(0) o101-&amp;gt;oak-OST005f-osc-ffff8c809690880
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;NAMD job failing with I/O error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Info: Working in the current directory /oak/....../aqueous2opls/run2
...
FATAL ERROR: Error on write to binary file step6.6_equilibration.restart.vel: Input/output error
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Timestamp of NAMD file: Feb 18 19:25 cluster6.out&lt;/p&gt;

&lt;p&gt;The Lustre client shows a lot of these error messages, on different OSTs. These are all of the Oak-related log entries from one client (sh-106-64) that has generated I/O errors:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Feb 06 11:23:22 sh-106-64.int kernel: Lustre: Mounted oak-client
Feb 07 12:49:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1549572562/real 1549572562]  req@ffff8c6591f3e300 x16247483-
Feb 10 08:43:24 sh-106-64.int kernel: LustreError: 1287:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c66aaa35400 x1624748322417600/t0(0) o101-&amp;gt;oak-OST006d-osc-ffff8c8096908800@
Feb 13 10:55:06 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550084100/real 1550084100]  req@ffff8c7204ebe900 x16247484-
Feb 15 07:46:03 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550245557/real 1550245557]  req@ffff8c7fc423a400 x16247484-
Feb 15 09:17:12 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251026/real 1550251026]  req@ffff8c7fc423b600 x16247484-
Feb 15 09:21:23 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550251277/real 1550251277]  req@ffff8c7fc4238f00 x16247484-
Feb 15 10:09:28 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254162/real 1550254162]  req@ffff8c7fc423b000 x16247484-
Feb 15 10:22:01 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550254915/real 1550254915]  req@ffff8c7fc423bc00 x16247484-
Feb 15 13:28:05 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266079/real 1550266079]  req@ffff8c7fc4238c00 x16247484-
Feb 15 13:31:26 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266280/real 1550266280]  req@ffff8c7fc423ce00 x16247484-
Feb 15 13:38:07 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550266681/real 1550266681]  req@ffff8c7fc423e300 x16247484-
Feb 15 13:44:24 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550267058/real 1550267058]  req@ffff8c7fc423ad00 x16247484-
Feb 16 18:12:47 sh-106-64.int kernel: LustreError: 11-0: oak-OST0071-osc-ffff8c8096908800: operation ost_connect to node 10.0.2.109@o2ib5 failed: rc = -19
Feb 17 00:40:54 sh-106-64.int kernel: Lustre: 98431:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1550392848/real 1550392848]  req@ffff8c8020528300 x16247484-
Feb 18 19:25:59 sh-106-64.int kernel: LustreError: 397481:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67bf646000 x1624748797937520/t0(0) o101-&amp;gt;oak-OST005f-osc-ffff8c809690880
Feb 19 10:55:50 sh-106-64.int kernel: LustreError: 39073:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c77a92f2400 x1624748825028640/t0(0) o101-&amp;gt;oak-OST004f-osc-ffff8c8096908800
Feb 19 11:46:27 sh-106-64.int kernel: LustreError: 39926:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c8041fada00 x1624748825812992/t0(0) o101-&amp;gt;oak-OST0033-osc-ffff8c8096908800
Feb 19 14:14:03 sh-106-64.int kernel: LustreError: 48889:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c69c2e1bc00 x1624748827483728/t0(0) o101-&amp;gt;oak-OST0066-osc-ffff8c8096908800
Feb 19 14:36:38 sh-106-64.int kernel: LustreError: 56018:0:(client.c:1193:ptlrpc_import_delay_req()) @@@ req wrong generation:  req@ffff8c67d265e900 x1624748827749392/t0(0) o101-&amp;gt;oak-OST0071-osc-ffff8c8096908800
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
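&lt;p&gt;To see how these errors are spread across OSTs, the log can be tallied per target. The following is only a sketch: the function name count_wrong_generation is hypothetical, and it assumes that kernel log lines arrive on standard input and that target names have the form fsname-OSTxxxx, as in the excerpt above.&lt;/p&gt;

```shell
# Hypothetical helper (not part of Lustre): count 'req wrong generation'
# errors per OST. Reads kernel log lines on stdin.
# Assumption: target names look like fsname-OSTxxxx (four hex digits).
count_wrong_generation() {
    grep 'req wrong generation' \
        | grep -o '[a-z0-9]*-OST[0-9a-f]\{4\}' \
        | sort \
        | uniq -c
}
```

&lt;p&gt;For example, on a client that uses the systemd journal, journalctl -k | count_wrong_generation would print one count per affected OST.&lt;/p&gt;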
&lt;p&gt;Any ideas on how to troubleshoot this issue? Perhaps this is a 2.10/2.12 compatibility issue?&lt;br/&gt;
 Thanks,&lt;br/&gt;
 Stephane&lt;/p&gt;</description>
                <environment>Clients:2.12.0 Servers (Oak): 2.10.5</environment>
        <key id="54918">LU-11976</key>
            <summary>req wrong generation leading to I/O errors on 2.12 clients</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="sthiell">Stephane Thiell</reporter>
                        <labels>
                    </labels>
                <created>Tue, 19 Feb 2019 23:25:51 +0000</created>
                <updated>Tue, 19 Dec 2023 14:28:51 +0000</updated>
                            <resolved>Sat, 16 Mar 2019 08:48:38 +0000</resolved>
                                    <version>Lustre 2.12.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="242316" author="bzzz" created="Wed, 20 Feb 2019 08:19:00 +0000"  >&lt;p&gt;This looks like a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11951&quot; title=&quot;sanity: test_231a failure, idle disconnect &amp;amp; import generation disagreement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11951&quot;&gt;&lt;del&gt;LU-11951&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="242317" author="bzzz" created="Wed, 20 Feb 2019 08:24:50 +0000"  >&lt;p&gt;Please try disabling the idle connection feature on the clients:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
lctl set_param osc.*.idle_timeout=0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="242361" author="sthiell" created="Wed, 20 Feb 2019 17:33:40 +0000"  >&lt;p&gt;Thanks Alex! We have disabled the idle connection timeout feature on all clients for our pre-2.12 filesystems (Oak, based on Lustre 2.10 servers, and Regal, based on Lustre 2.8 servers). We decided to keep it enabled on our new 2.12 filesystem (fir) unless we see the same &quot;req wrong generation&quot; issues there.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;$ lctl get_param osc.*.idle_timeout
osc.fir-OST0000-osc-ffff92c50baf0800.idle_timeout=20
...
osc.fir-OST002f-osc-ffff92c50baf0800.idle_timeout=20
osc.oak-OST0000-osc-ffff92c50b2d3000.idle_timeout=0
...
osc.oak-OST0071-osc-ffff92c50b2d3000.idle_timeout=0
osc.regal-OST0000-osc-ffff92c50b2d1800.idle_timeout=0
...
osc.regal-OST006b-osc-ffff92c50b2d1800.idle_timeout=0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="242369" author="sthiell" created="Wed, 20 Feb 2019 19:14:53 +0000"  >&lt;p&gt;Actually, we also found many occurrences of &quot;req wrong generation&quot; errors on fir, so idle_timeout is now set to 0 on all filesystems.&lt;/p&gt;</comment>
                            <comment id="242530" author="sthiell" created="Fri, 22 Feb 2019 17:22:14 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;I confirm that since we disabled the idle connection feature (2 days ago), we have not seen any more occurrences of these &quot;req wrong generation&quot; errors on Sherlock. Thanks!&lt;/p&gt;</comment>
                            <comment id="244064" author="adilger" created="Sat, 16 Mar 2019 08:48:38 +0000"  >&lt;p&gt;Closing as a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11951&quot; title=&quot;sanity: test_231a failure, idle disconnect &amp;amp; import generation disagreement&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11951&quot;&gt;&lt;del&gt;LU-11951&lt;/del&gt;&lt;/a&gt;, which has a patch. I&apos;ve cherry-picked it to b2_12 as &lt;a href=&quot;https://review.whamcloud.com/34435&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34435&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="54837">LU-11951</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="54869">LU-11964</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00bun:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>