<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:19:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1797] MDS deactivates/reactivates OSTs</title>
                <link>https://jira.whamcloud.com/browse/LU-1797</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Running the SWL test suite (a mixture of IO jobs spread across all clients), I am seeing this on multiple OSTs.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;Aug 28 05:11:41 ehyperion-dit34 kernel: Lustre: lustre-OST000d: Client lustre-MDT0000-mdtlov_UUID (at 192.168.127.6@o2ib1) reconnecting
Aug 28 05:11:41 ehyperion-dit34 kernel: Lustre: lustre-OST000d: received MDS connection from 192.168.127.6@o2ib1
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: 3947:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request  sent has timed out &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; slow reply: [sent 1346155795/real 1346155795]  req@ffff880093a2d000 x1411378033913049/t0(0) o5-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346155901 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; recovery to complete
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 05:11:41 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans
Aug 28 05:11:55 ehyperion-rst6 kernel: LustreError: 3947:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc  req@ffff880093a2d000 x1411378033913049/t0(0) o5-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346155901 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
Aug 28 05:12:31 ehyperion-rst6 kernel: LustreError: 12495:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST000d_UUID failed at mds_lov_clear_orphans: -5
Aug 28 05:12:31 ehyperion-rst6 kernel: LustreError: 12495:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST000d_UUID sync failed -5, deactivating
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: 3948:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request  sent has timed out &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; slow reply: [sent 1346156476/real 1346156476]  req@ffff8801fa613400 x1411378034235068/t0(0) o400-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:28/4 lens 224/224 e 0 to 1 dl 1346156587 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; recovery to complete
Aug 28 05:23:07 ehyperion-dit34 kernel: Lustre: lustre-OST000d: Client lustre-MDT0000-mdtlov_UUID (at 192.168.127.6@o2ib1) reconnecting
Aug 28 05:23:07 ehyperion-dit34 kernel: Lustre: lustre-OST000d: received MDS connection from 192.168.127.6@o2ib1
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 05:23:07 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;In this case, the OST recovered without intervention. In other cases &apos;lctl --device NN recover&apos; has fixed the problem, producing this message:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Aug 28 07:56:04 ehyperion-dit33 kernel: Lustre: 7426:0:(llog_net.c:162:llog_receptor_accept()) changing the &lt;span class=&quot;code-keyword&quot;&gt;import&lt;/span&gt; ffff8802dbaaf800 - ffff8802cf6ff000
Aug 28 07:56:04 ehyperion-dit33 kernel: Lustre: 7426:0:(llog_net.c:162:llog_receptor_accept()) Skipped 1 previous similar message
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This does not appear to be causing SWL tests to fail at this time. &lt;/p&gt;</description>
                <environment>Lustre 2.2.93, LLNL/Hyperion</environment>
        <key id="15617">LU-1797</key>
            <summary>MDS deactivates/reactivates OSTs</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="green">Oleg Drokin</assignee>
                                    <reporter username="cliffw">Cliff White</reporter>
                        <labels>
                    </labels>
                <created>Tue, 28 Aug 2012 14:11:25 +0000</created>
                <updated>Thu, 9 Jan 2020 06:30:15 +0000</updated>
                            <resolved>Thu, 9 Jan 2020 06:30:15 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="43890" author="cliffw" created="Tue, 28 Aug 2012 14:26:48 +0000"  >&lt;p&gt;The OSS node does not report errors, nor does it inactivate connections. There are no client-side errors reported.&lt;/p&gt;</comment>
                            <comment id="43891" author="pjones" created="Tue, 28 Aug 2012 14:32:38 +0000"  >&lt;p&gt;Oleg&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="43895" author="green" created="Tue, 28 Aug 2012 15:59:45 +0000"  >&lt;p&gt;Hm, it almost looks like we go through del orphan, all is fine, we progress further into precreate, get disconnected there again for some reason, and then the check for the OSC in recovery fails and we bail out with EIO.&lt;br/&gt;
It would be useful to have lctl dk from the MDT when this happens (with +ha enabled in the debug level).&lt;/p&gt;

&lt;p&gt;The other, separate question is: how come the OST did not reply to the create request? Even if the OST is busy on the backend, at least early replies should still be flowing. It would be cool to get a matching lctl dk from the OST as well, but I guess under this kind of load the buffer rotates quite quickly and it would be useless.&lt;/p&gt;</comment>
                            <comment id="43896" author="cliffw" created="Tue, 28 Aug 2012 16:42:27 +0000"  >&lt;p&gt;Attached is lctl dk from a failure (lustre-OST0032)&lt;/p&gt;</comment>
                            <comment id="43909" author="cliffw" created="Tue, 28 Aug 2012 18:51:42 +0000"  >&lt;p&gt;I have uploaded lu-1797-1538.txt.gz to FTP. That should be an lctl dk with all the calls included.&lt;br/&gt;
Associated syslog errors:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: 3942:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request  sent has timed out &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; slow reply: [sent 1346193234/real 1346193234]  req@ffff880014297400 x1411378058874282/t0(0) o5-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346193340 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; recovery to complete
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: Skipped 1 previous similar message
Aug 28 15:35:40 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans
Aug 28 15:35:51 ehyperion-rst6 kernel: LustreError: 3942:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc  req@ffff880014297400 x1411378058874282/t0(0) o5-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1346193340 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
Aug 28 15:36:30 ehyperion-rst6 kernel: LustreError: 15560:0:(lov_obd.c:1063:lov_clear_orphans()) error in orphan recovery on OST idx 13/60: rc = -5
Aug 28 15:36:30 ehyperion-rst6 kernel: LustreError: 15560:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST000d_UUID failed at mds_lov_clear_orphans: -5
Aug 28 15:36:31 ehyperion-rst6 kernel: LustreError: 15560:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST000d_UUID sync failed -5, deactivating
Aug 28 15:38:20 ehyperion-rst6 kernel: cannot allocate a tage (1)
Aug 28 15:38:20 ehyperion-rst6 kernel: cannot allocate a tage (4)
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: 3942:0:(client.c:1920:ptlrpc_expire_one_request()) @@@ Request  sent has timed out &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; slow reply: [sent 1346193440/real 1346193440]  req@ffff8801f8cca800 x1411378059090858/t0(0) o400-&amp;gt;lustre-OST000d-osc-MDT0000@192.168.127.65@o2ib1:28/4 lens 224/224 e 0 to 1 dl 1346193546 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection to lustre-OST000d (at 192.168.127.65@o2ib1) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; recovery to complete
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: lustre-OST000d-osc-MDT0000: Connection restored to lustre-OST000d (at 192.168.127.65@o2ib1)
Aug 28 15:39:06 ehyperion-rst6 kernel: Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST000d_UUID now active, resetting orphans

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="43912" author="green" created="Tue, 28 Aug 2012 22:10:23 +0000"  >&lt;p&gt;Latest logs shed some light onto what&apos;s going on.&lt;/p&gt;

&lt;p&gt;The OST becomes very overloaded and creates start to take a long time (due to other disk activity, I guess).&lt;br/&gt;
The MDT sends a precreate request that eventually times out.&lt;br/&gt;
The MDT reconnects (which goes pretty fast, as long as we don&apos;t need to write anything to disk synchronously or read anything that&apos;s not cached).&lt;br/&gt;
After reconnecting, the MDT tries to repopulate its precreate pool and sends a precreate request again; there&apos;s a hard timeout for this activity at 1/2 obd_timeout (not converted to AT).&lt;br/&gt;
Once the timeout passes, the mdt_lov portion of the code gets an error, decides that the OST is in too bad a shape to use, and disables it (which might not be such a bad idea if it is overloaded).&lt;/p&gt;

&lt;p&gt;Later on, in the original set of logs for case #1, we see that a ping times out for some reason (network packet loss or something), which triggers a reconnect and clears the inactive flag.&lt;/p&gt;

&lt;p&gt;So in the end I imagine that for the &quot;fix&quot; we need to actually schedule a later reconnect (or even precreate) attempt for such busy OSTs and, once the load eases, enable them again. Some of the messages would need to be silenced as well.&lt;br/&gt;
This code is redone in 2.4, so it needs to be inspected separately. Also, create-on-write would fix this by eliminating precreates.&lt;/p&gt;
</comment>
                            <comment id="43987" author="pjones" created="Thu, 30 Aug 2012 10:46:15 +0000"  >&lt;p&gt;Dropping priority as this appears to be a long-existing issue &lt;/p&gt;</comment>
                            <comment id="44024" author="cliffw" created="Thu, 30 Aug 2012 23:29:07 +0000"  >&lt;p&gt;Could not replicate with 1-&amp;gt;3 OSS, 2-&amp;gt;24 OST. Returned to normal testing; in between parallel-scale runs, one OST dropped. Obtained lctl dk from the OST, but suspect it was too old.&lt;br/&gt;
However, this time both client and OSS errors were observed:&lt;br/&gt;
Client:&lt;/p&gt;

&lt;p&gt;LustreError: 26333:0:(mdc_request.c:1429:mdc_quotactl()) ptlrpc_queue_wait failed, rc: -114&lt;br/&gt;
OSS&lt;/p&gt;

&lt;p&gt;192.168.127.63: Lustre: DEBUG MARKER: == parallel-scale test iorfpp: iorfpp == 18:30:49 (1346376649)&lt;br/&gt;
192.168.127.63: LustreError: 9253:0:(ldlm_resource.c:1101:ldlm_resource_get()) lvbo_init failed for resource 165782: rc -2&lt;br/&gt;
192.168.127.63: LustreError: 9175:0:(ldlm_resource.c:1101:ldlm_resource_get()) lvbo_init failed for resource 165740: rc -2&lt;br/&gt;
192.168.127.63: LustreError: 9175:0:(ldlm_resource.c:1101:ldlm_resource_get()) Skipped 33 previous similar messages&lt;br/&gt;
192.168.127.63: LustreError: 9253:0:(ldlm_resource.c:1101:ldlm_resource_get()) Skipped 5 previous similar messages&lt;br/&gt;
192.168.127.63: Lustre: DEBUG MARKER: == parallel-scale parallel-scale.sh test complete, duration 6104 sec == 18:59:05 (1346378345)&lt;br/&gt;
192.168.127.63: Lustre: lustre-OST0033: Client lustre-MDT0000-mdtlov_UUID (at 192.168.127.6@o2ib1) reconnecting&lt;br/&gt;
192.168.127.63: Lustre: lustre-OST0033: received MDS connection from 192.168.127.6@o2ib1&lt;br/&gt;
192.168.127.63: Lustre: Skipped 1 previous similar message&lt;br/&gt;
192.168.127.63: Lustre: 9424:0:(llog_net.c:162:llog_receptor_accept()) changing the import ffff8802cc298800 - ffff8801c4c4a000&lt;br/&gt;
192.168.127.63: Lustre: 9424:0:(llog_net.c:162:llog_receptor_accept()) changing the import ffff8802cc298800 - ffff8801c4c4a000&lt;/p&gt;</comment>
                            <comment id="45783" author="cliffw" created="Sun, 30 Sep 2012 17:33:40 +0000"  >&lt;p&gt;We are hitting this bug again with the lustre-review 9573 build&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;hyperion-rst6 login: Lustre: 4171:0:(client.c:1917:ptlrpc_expire_one_request()) @@@ Request  sent has timed out &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; slow reply: [sent 1349040375/real 1349040375]  req@ffff88020629a000 x1414547145605102/t0(0) o5-&amp;gt;lustre-OST0034-osc-MDT0000@192.168.127.64@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1349040481 ref 1 fl Rpc:RXN/0/ffffffff rc 0/-1
Lustre: 4171:0:(client.c:1917:ptlrpc_expire_one_request()) Skipped 85 previous similar messages
Lustre: lustre-OST0034-osc-MDT0000: Connection to lustre-OST0034 (at 192.168.127.64@o2ib1) was lost; in progress operations using &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; service will wait &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; recovery to complete
Lustre: Skipped 6 previous similar messages
Lustre: lustre-OST0034-osc-MDT0000: Connection restored to lustre-OST0034 (at 192.168.127.64@o2ib1)
Lustre: Skipped 1 previous similar message
Lustre: MDS mdd_obd-lustre-MDT0000: lustre-OST0034_UUID now active, resetting orphans
Lustre: Skipped 19 previous similar messages
LustreError: 4171:0:(osc_create.c:169:osc_interpret_create()) @@@ Unknown rc -107 from async create: failing oscc  req@ffff88020629a000 x1414547145605102/t0(0) o5-&amp;gt;lustre-OST0034-osc-MDT0000@192.168.127.64@o2ib1:7/4 lens 432/432 e 0 to 1 dl 1349040481 ref 1 fl Interpret:RXN/0/ffffffff rc -107/-1
LustreError: 7487:0:(lov_obd.c:1063:lov_clear_orphans()) error in orphan recovery on OST idx 52/60: rc = -5
LustreError: 7487:0:(mds_lov.c:883:__mds_lov_synchronize()) lustre-OST0034_UUID failed at mds_lov_clear_orphans: -5
LustreError: 7487:0:(mds_lov.c:903:__mds_lov_synchronize()) lustre-OST0034_UUID sync failed -5, deactivating
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="51521" author="louveta" created="Thu, 31 Jan 2013 06:17:18 +0000"  >&lt;p&gt;One of our systems is showing this behaviour with a high reproduction rate: about one disconnection every 5 minutes for the past 2 days, and I have failed to find any evidence of high load on the OSSes or MDS.&lt;/p&gt;

&lt;p&gt;The filesystem is made of 540 OSTs equally distributed across 36 OSSes. The disconnected OSTs do not follow any particular pattern. The interconnect between nodes &amp;amp; servers doesn&apos;t report problems, and there aren&apos;t any messages in the log other than the ones reported in this LU.&lt;/p&gt;

&lt;p&gt;The pattern is always the same. One request expires for slow reply after obd_timeout / 2 (i.e. 200s for me), then a reconnect fails during orphan recovery. If I try to force a recover it works like a charm, even if the forced recover is scheduled immediately after the deactivation. This is surprising: I can accept that something went wrong at some point, and that obd_timeout / 2 seconds later it was still wrong and the reconnection failed, but I fail to convince myself that I am lucky enough for the problem to disappear one second later when I schedule a forced recover. Am I wrong?&lt;/p&gt;

&lt;p&gt;The site is running Lustre 2.1.3 + patches. We just upgraded the patch list last week to add &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1650&quot; title=&quot;crash of lustre clients in osc_req_attr_set() routine&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1650&quot;&gt;&lt;del&gt;LU-1650&lt;/del&gt;&lt;/a&gt; + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1039&quot; title=&quot;data corruption in check_set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1039&quot;&gt;&lt;del&gt;LU-1039&lt;/del&gt;&lt;/a&gt; + &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2170&quot; title=&quot;osc_extent_merge()) ASSERTION( cur-&amp;gt;oe_osclock == victim-&amp;gt;oe_osclock) while running racer&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2170&quot;&gt;&lt;del&gt;LU-2170&lt;/del&gt;&lt;/a&gt;, but we have another cluster running the same software which doesn&apos;t have this problem.&lt;br/&gt;
Looking backward, I have also found a trace of the same problem 3 months ago, so I tend to think that those patches are not responsible for this problem. At that time, it seems the cluster was rebooted and the problem disappeared.&lt;/p&gt;

&lt;p&gt;The site is classified, so exporting data is not that easy. What can I check to try to make progress?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;/p&gt;</comment>
                            <comment id="52034" author="louveta" created="Fri, 8 Feb 2013 09:51:17 +0000"  >&lt;p&gt;While I wasn&apos;t able to find any evidence of the OSTs being overloaded, and after having lost the connection to over 400 OSTs, we rebooted everything. Now everything is back to normal.&lt;br/&gt;
The only particularity is that we made a filesystem extension just before moving to production. I don&apos;t know if it makes any difference; it is just a fact.&lt;/p&gt;</comment>
                            <comment id="52840" author="green" created="Thu, 21 Feb 2013 18:02:29 +0000"  >&lt;p&gt;Well, I suspect the filesystem extension hints that you have a filesystem that is mostly full (inode-wise), so in order to satisfy precreates a lot of inode groups need to be traversed, which takes a lot of time as they are not cached.&lt;br/&gt;
An immediate reconnect succeeds because the same traversal is now faster, as all those blocks have already been read and are in cache (but they would be thrown out again later, under load, after some time).&lt;/p&gt;

&lt;p&gt;Extending the FS added some very empty inode groups, where a single block read allows an entire precreate request to be satisfied, which happens a lot faster.&lt;/p&gt;

&lt;p&gt;You can probably confirm this by looking at debugfs output; I suspect you will observe inode groups very sparsely populated with free inodes right up to about the pre-extension point.&lt;br/&gt;
df -i pre-extension would probably have shown those OSTs as very low on free inodes too?&lt;/p&gt;</comment>
                            <comment id="260836" author="adilger" created="Thu, 9 Jan 2020 06:30:15 +0000"  >&lt;p&gt;Close old bug&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11806" name="failure.txt.gz" size="1487171" author="cliffw" created="Tue, 28 Aug 2012 16:41:58 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvsmv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8543</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>