<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:23:29 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2232] LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG</title>
                <link>https://jira.whamcloud.com/browse/LU-2232</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Last night we had two OSSs panic at virtually the same time with an LBUG error being thrown. We updated our servers and clients from the 2.1.2-3chaos release to 2.1.2-4chaos within the past 2 days and had not experienced this issue with the previous release. Below is a sample of the console log from one of the servers. I have also captured all the console messages up until the system panicked and am attaching them. &lt;/p&gt;

&lt;p&gt;LustreError: 9044:0:(ost_handler.c:1673:ost_prolong_lock_one()) ASSERTION(lock-&amp;gt;l_export == opd-&amp;gt;opd_exp) failed&lt;br/&gt;
LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) ASSERTION(lock-&amp;gt;l_export == opd-&amp;gt;opd_exp) failed&lt;br/&gt;
LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG&lt;br/&gt;
Pid: 9120, comm: ll_ost_io_341&lt;/p&gt;

&lt;p&gt;Call Trace:&lt;br/&gt;
LustreError: 9083:0:(ost_handler.c:1673:ost_prolong_lock_one()) ASSERTION(lock-&amp;gt;l_export == opd-&amp;gt;opd_exp) failed&lt;br/&gt;
LustreError: 9083:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG&lt;br/&gt;
 &lt;span class=&quot;error&quot;&gt;&amp;#91;&amp;lt;ffffffffa0440895&amp;gt;&amp;#93;&lt;/span&gt; libcfs_debug_dumpstack+0x55/0x80 &lt;span class=&quot;error&quot;&gt;&amp;#91;libcfs&amp;#93;&lt;/span&gt;&lt;br/&gt;
Pid: 9083, comm: ll_ost_io_304&lt;/p&gt;</description>
                <environment>Dell R710 servers running TOSS-2.0-2 and DDN 10k storage.</environment>
        <key id="16430">LU-2232</key>
            <summary>LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="jamervi">Joe Mervini</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Thu, 25 Oct 2012 12:24:33 +0000</created>
                <updated>Mon, 24 Aug 2015 21:57:34 +0000</updated>
                            <resolved>Mon, 24 Aug 2015 21:57:34 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                    <version>Lustre 2.1.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>18</watches>
                                                                            <comments>
                            <comment id="46919" author="pjones" created="Thu, 25 Oct 2012 14:49:01 +0000"  >&lt;p&gt;Oleg is looking into this one&lt;/p&gt;</comment>
                            <comment id="46922" author="green" created="Thu, 25 Oct 2012 14:50:33 +0000"  >&lt;p&gt;Do you have a crash dump? Can you print the lock content by any chance?&lt;/p&gt;</comment>
                            <comment id="46925" author="jamervi" created="Thu, 25 Oct 2012 16:07:41 +0000"  >&lt;p&gt;Unfortunately, no. Our servers are diskless, so when a core is dumped it is lost on reboot. I have linked /var/crash to an NFS-mounted device, so hopefully if the LBUG occurs again it will write to that device before freezing.&lt;/p&gt;

&lt;p&gt;As far as printing the lock content, I&apos;m not sure what you mean. If you can tell me how to gather that information I will be happy to if I can.&lt;/p&gt;</comment>
                            <comment id="46953" author="green" created="Fri, 26 Oct 2012 08:41:05 +0000"  >&lt;p&gt;Disklessness does not matter.&lt;br/&gt;
You just need to configure kdump to dump kernel core to an nfs share (no need to mount it at runtime, the debug kernel will do it separately).&lt;br/&gt;
You will get the lock content out of the crash dump once you have it.&lt;br/&gt;
How repeatable is this?&lt;/p&gt;</comment>
                            <comment id="47606" author="jamervi" created="Thu, 8 Nov 2012 18:43:08 +0000"  >&lt;p&gt;We should probably close this issue. We decided to roll back to the previous version during a system outage last week to avoid running into the issue again in the short term. We have another system that is preproduction that is running the current version so we will monitor that and report back if we encounter the problem again. &lt;/p&gt;</comment>
                            <comment id="47615" author="pjones" created="Thu, 8 Nov 2012 19:49:07 +0000"  >&lt;p&gt;ok thanks Joe!&lt;/p&gt;</comment>
                            <comment id="68743" author="shadow" created="Thu, 10 Oct 2013 15:06:36 +0000"  >&lt;p&gt;Xyratex hit this bug in our own branch.&lt;/p&gt;</comment>
                            <comment id="76129" author="paf" created="Mon, 3 Feb 2014 20:57:58 +0000"  >&lt;p&gt;Cray just hit this in our 2.5 with SLES11SP3 clients on CentOS 6.4 servers, twice.&lt;/p&gt;

&lt;p&gt;Both times, it was immediately preceded by a &apos;double eviction&apos; of a particular client, two callback timers expiring within the same second for the same client:&lt;/p&gt;

&lt;p&gt;2014-02-01 13:44:35 LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 117s: evicting client at 54@gni1  ns: filter-esfprod-OST0004_UUID lock: ffff8803fa547980/0xd460be1680c1b564 lrc: 3/0,0 mode: PW/PW res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x424b478:0x0:0x0&amp;#93;&lt;/span&gt;.0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;4095) flags: 0x60000000010020 nid: 54@gni1 remote: 0xf4283d6d3305e0e5 expref: 2679 pid: 22955 timeout: 4994587934 lvb_type: 0&lt;br/&gt;
2014-02-01 13:44:35 LustreError: 0:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 117s: evicting client at 54@gni1  ns: filter-esfprod-OST0004_UUID lock: ffff8802c669d100/0xd460be1680c1b580 lrc: 3/0,0 mode: PW/PW res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0x424b479:0x0:0x0&amp;#93;&lt;/span&gt;.0 rrc: 2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;4095) flags: 0x60000000010020 nid: 54@gni1 remote: 0xf4283d6d3305e155 expref: 2679 pid: 22955 timeout: 4994587934 lvb_type: 0&lt;/p&gt;

&lt;p&gt;And the other instance of it (on a different system, also running Cray&apos;s 2.5):&lt;br/&gt;
LustreError: 3489:0:(ldlm_lockd.c:433:ldlm_add_waiting_lock()) Skipped 1 previous similar message&lt;br/&gt;
LustreError: 53:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 113s: evicting client at 62@gni&lt;br/&gt;
  ns: filter-scratch-OST0001_UUID lock: ffff880445551180/0xa9b58315ab2d4d0a lrc: 3/0,0 mode: PW/PW res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0xb5da7e1:0x0:0x0&amp;#93;&lt;/span&gt;.0 rrc:&lt;br/&gt;
2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x60000000010020 nid: 62@gni remote: 0x330a8b1d9fd47938&lt;br/&gt;
expref: 1024 pid: 16057 timeout: 4781830324 lvb_type: 0&lt;/p&gt;

&lt;p&gt;LustreError: 53:0:(ldlm_lockd.c:344:waiting_locks_callback()) ### lock callback timer expired after 113s: evicting client at 62@gni&lt;br/&gt;
  ns: filter-scratch-OST0001_UUID lock: ffff880387977bc0/0xa9b58315ab2d4d6c lrc: 3/0,0 mode: PW/PW res: &lt;span class=&quot;error&quot;&gt;&amp;#91;0xb5da7e6:0x0:0x0&amp;#93;&lt;/span&gt;.0 rrc:&lt;br/&gt;
2 type: EXT &lt;span class=&quot;error&quot;&gt;&amp;#91;0-&amp;gt;18446744073709551615&amp;#93;&lt;/span&gt; (req 0-&amp;gt;18446744073709551615) flags: 0x60000000010020 nid: 62@gni remote: 0x330a8b1d9fd47bfb&lt;br/&gt;
expref: 1030 pid: 16082 timeout: 4781830324 lvb_type: 0&lt;/p&gt;

&lt;p&gt;I&apos;m just speculating, but I suspect this is key.  &lt;/p&gt;

&lt;p&gt;For some reason, the nid_stats struct pointer in the obd_export is zero, so I wasn&apos;t able to confirm if this export was actually the double-evicted client.  (Perhaps this is always the case?)&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;Edited to clarify server and client versions.  Also, we&amp;#39;ve seen this several times since.  Not sure why we&amp;#39;re suddenly seeing it now, as indications are the underlying bug has been in the code for some time.&amp;#93;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="81153" author="paf" created="Tue, 8 Apr 2014 03:22:34 +0000"  >&lt;p&gt;Lai, Jodi,&lt;/p&gt;

&lt;p&gt;I&apos;m happy to see this bug re-opened, but I&apos;m curious why?  Was it seen again at Intel or a customer site?  Was it re-opened because of the Cray report?&lt;/p&gt;</comment>
                            <comment id="81171" author="laisiyao" created="Tue, 8 Apr 2014 14:32:18 +0000"  >&lt;p&gt;Patrick, this is seen again in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4844&quot; title=&quot;ost_prolong_lock_one()) ASSERTION( lock-&amp;gt;l_export == opd-&amp;gt;opd_exp )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4844&quot;&gt;&lt;del&gt;LU-4844&lt;/del&gt;&lt;/a&gt;, so it&apos;s better to re-open this one and mark &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4844&quot; title=&quot;ost_prolong_lock_one()) ASSERTION( lock-&amp;gt;l_export == opd-&amp;gt;opd_exp )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4844&quot;&gt;&lt;del&gt;LU-4844&lt;/del&gt;&lt;/a&gt; as a duplicate.&lt;/p&gt;</comment>
                            <comment id="81181" author="paf" created="Tue, 8 Apr 2014 15:46:27 +0000"  >&lt;p&gt;Thank you, Lai.&lt;/p&gt;

&lt;p&gt;Cray can provide a dump of an OST with this crash, if you think that would be helpful.  &lt;/p&gt;

&lt;p&gt;We&apos;re currently trying to replicate it with D_RPCTRACE enabled, so if we get a dump with that, we can send that over as well.  (We&apos;re working with Xyratex on this bug.)&lt;/p&gt;</comment>
                            <comment id="81211" author="paf" created="Tue, 8 Apr 2014 18:55:54 +0000"  >&lt;p&gt;We replicated this on an OSS with +rpctrace enabled.  I&apos;ve provided the dump to Xyratex, and it will be uploaded here in about 5 minutes:&lt;br/&gt;
ftp.whamcloud.com&lt;br/&gt;
uploads/LELUS-2232/LELUS-234_&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2232&quot; title=&quot;LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2232&quot;&gt;&lt;del&gt;LU-2232&lt;/del&gt;&lt;/a&gt;.1404081304.tar.gz&lt;/p&gt;</comment>
                            <comment id="81273" author="laisiyao" created="Wed, 9 Apr 2014 14:12:09 +0000"  >&lt;p&gt;The backtrace shows the LBUG is on the first ost_prolong_lock_one() in ost_prolong_locks(); IMO what happened is this:&lt;br/&gt;
1. the client did IO with a lock handle in the request.&lt;br/&gt;
2. the IO bulk transfer failed on the server, so no reply was sent to the client to let it resend.&lt;br/&gt;
3. lock cancellation timed out on the server, and the client was evicted.&lt;br/&gt;
4. the client reconnected and resent the previous IO request; however, the lock handle was obsolete, so the LASSERT was triggered. (This lock should have been replayed, but the request was simply resent, and there is no way to update the lock handle in a resent request.)&lt;/p&gt;

&lt;p&gt;I&apos;ll provide a patch that checks lock-&amp;gt;l_export against opd-&amp;gt;opd_exp rather than asserting, for the first ost_prolong_lock_one().&lt;/p&gt;
</comment>
                            <comment id="81374" author="laisiyao" created="Thu, 10 Apr 2014 14:49:39 +0000"  >&lt;p&gt;Patches are ready:&lt;br/&gt;
2.4: &lt;a href=&quot;http://review.whamcloud.com/#/c/9925/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9925/&lt;/a&gt;&lt;br/&gt;
2.5: &lt;a href=&quot;http://review.whamcloud.com/#/c/9926/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9926/&lt;/a&gt;&lt;br/&gt;
master: &lt;a href=&quot;http://review.whamcloud.com/#/c/9927/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9927/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="81762" author="green" created="Wed, 16 Apr 2014 19:29:06 +0000"  >&lt;p&gt;Question: if the client was evicted, the request resend should have failed because the server would reject it.&lt;br/&gt;
So I think there is something else happening that your explanation does not quite cover.&lt;/p&gt;</comment>
                            <comment id="81916" author="laisiyao" created="Fri, 18 Apr 2014 04:45:22 +0000"  >&lt;p&gt;I&apos;m afraid some requests were not aborted upon eviction on the client, so they were resent through the new connection. I&apos;ll run some tests to find more proof.&lt;/p&gt;</comment>
                            <comment id="83132" author="laisiyao" created="Sun, 4 May 2014 07:45:05 +0000"  >&lt;p&gt;I ran some tests but couldn&apos;t reproduce this failure, so I updated the 2.4/2.5 patch into a debug patch that prints the lock and request export addresses and crashes as before, so that we can dump these two exports to help the analysis.&lt;/p&gt;

&lt;p&gt;Could you apply the patch and try to reproduce it? If it crashes, please upload the crash dump.&lt;/p&gt;</comment>
                            <comment id="85516" author="green" created="Mon, 2 Jun 2014 21:05:08 +0000"  >&lt;p&gt;I think &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5116&quot; title=&quot;Race between resend and reply processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5116&quot;&gt;&lt;del&gt;LU-5116&lt;/del&gt;&lt;/a&gt; might be related here to explain the resend across eviction&lt;/p&gt;</comment>
                            <comment id="85517" author="pjones" created="Mon, 2 Jun 2014 21:13:33 +0000"  >&lt;p&gt;It would be a sensible approach to try both the fix from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5116&quot; title=&quot;Race between resend and reply processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5116&quot;&gt;&lt;del&gt;LU-5116&lt;/del&gt;&lt;/a&gt; and the diagnostic patch from &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2232&quot; title=&quot;LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2232&quot;&gt;&lt;del&gt;LU-2232&lt;/del&gt;&lt;/a&gt;. That way perhaps the issue would be fixed by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5116&quot; title=&quot;Race between resend and reply processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5116&quot;&gt;&lt;del&gt;LU-5116&lt;/del&gt;&lt;/a&gt; but if there is still a residual problem we will have fuller information to go forward on.&lt;/p&gt;</comment>
                            <comment id="85521" author="paf" created="Mon, 2 Jun 2014 21:49:13 +0000"  >&lt;p&gt;Lai - Sorry we didn&apos;t update this earlier.  Cray tried to reproduce this bug with your debug patch, and were unable to do so.  After that, we pulled your patch set 1 in to our Lustre version and haven&apos;t seen this bug since.&lt;/p&gt;</comment>
                            <comment id="86209" author="pjones" created="Tue, 10 Jun 2014 13:09:52 +0000"  >&lt;p&gt;Patrick&lt;/p&gt;

&lt;p&gt;To be clear, Cray could reliably reproduce this issue, then applied the diagnostic patch and could not? What code line was this on?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="86216" author="paf" created="Tue, 10 Jun 2014 14:57:54 +0000"  >&lt;p&gt;Peter,&lt;/p&gt;

&lt;p&gt;That&apos;s correct.  It was on b2_5.&lt;/p&gt;

&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;Patrick&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="86217" author="pjones" created="Tue, 10 Jun 2014 15:26:42 +0000"  >&lt;p&gt;thanks Patrick. Wow. That is odd.&lt;/p&gt;</comment>
                            <comment id="92339" author="paf" created="Mon, 25 Aug 2014 19:36:54 +0000"  >&lt;p&gt;Lai - Given that the code in this area was changed a fair bit in 2.6, do you think this bug is likely to still be present?&lt;/p&gt;</comment>
                            <comment id="92379" author="laisiyao" created="Tue, 26 Aug 2014 01:52:11 +0000"  >&lt;p&gt;Per Oleg&apos;s comment, this may be a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5116&quot; title=&quot;Race between resend and reply processing&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5116&quot;&gt;&lt;del&gt;LU-5116&lt;/del&gt;&lt;/a&gt;. I&apos;ve tried to verify it by adding some code and tests, but have not had enough time to finish. I will keep an eye on it, and for the moment we can mark it resolved.&lt;/p&gt;</comment>
                            <comment id="92490" author="paf" created="Tue, 26 Aug 2014 18:30:41 +0000"  >&lt;p&gt;Lai - Thanks for the answer.  We&apos;ll go with that and I&apos;ll let you know if Cray sees any issues.  (We&apos;re just moving in to internal deployment of 2.6.)&lt;/p&gt;</comment>
                            <comment id="115112" author="morrone" created="Tue, 12 May 2015 22:51:07 +0000"  >&lt;p&gt;The debug message hit on an MDS.  Here it is:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2015-04-24 09:00:42 Lustre: lsd-OST0000: haven&apos;t heard from client 6048f9e4-4557-a727-5b7d-dd9f7cb9c8b1 (at 192.168.116.115@o2ib5) in 227 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8805cbcd5000, cur 1429891242 expire 1429891092 last 1429891015
2015-04-24 09:14:05 LustreError: 0:0:(ldlm_lockd.c:355:waiting_locks_callback()) ### lock callback timer expired after 100s: evicting client at 172.19.1.85@o2ib100  ns: filter-lsd-OST0000_UUID lock: ffff880bf976a880/0x7e1b5086f6ebf35e lrc: 3/0,0 mode: PR/PR res: [0x2678a59:0x0:0x0].0 rrc: 266 type: EXT [0-&amp;gt;18446744073709551615] (req 468705280-&amp;gt;468717567) flags: 0x60000000000020 nid: 172.19.1.85@o2ib100 remote: 0xa9a53ce40d684d49 expref: 54562 pid: 6158 timeout: 5072055798 lvb_type: 0
2015-04-24 09:14:05 LustreError: 22433:0:(ldlm_lockd.c:2342:ldlm_cancel_handler()) ldlm_cancel from 172.19.1.85@o2ib100 arrived at 1429892045 with bad export cookie 9086945213692644542
2015-04-24 09:14:05 LustreError: 6684:0:(ldlm_lock.c:2448:ldlm_lock_dump_handle()) ### ### ns: filter-lsd-OST0000_UUID lock: ffff880d7abe3680/0x7e1b5086f3c622c3 lrc: 3/0,0 mode: PR/PR res: [0x24bf940:0x0:0x0].0 rrc: 3 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;18446744073709551615) flags: 0x40000000000000 nid: 172.19.1.85@o2ib100 remote: 0xa9a53ce40617fec7 expref: 54547 pid: 6183 timeout: 0 lvb_type: 1
2015-04-24 09:14:05 LustreError: 22433:0:(ldlm_lockd.c:2342:ldlm_cancel_handler()) Skipped 1 previous similar message
2015-04-24 09:18:04 LustreError: 6784:0:(ost_handler.c:1956:ost_prolong_locks()) ### LU-2232: lock export ffff880e2d590400 != req export ffff880d8a2eb400, this lock is obsolete!
2015-04-24 09:18:04  ns: filter-lsd-OST0000_UUID lock: ffff880bf976a880/0x7e1b5086f6ebf35e lrc: 4/0,0 mode: PR/PR res: [0x2678a59:0x0:0x0].0 rrc: 135 type: EXT [0-&amp;gt;18446744073709551615] (req 468705280-&amp;gt;468717567) flags: 0x60000000000020 nid: 172.19.1.85@o2ib100 remote: 0xa9a53ce40d684d49 expref: 35636 pid: 6158 timeout: 5072055798 lvb_type: 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;No assertion, though.  Lustre 2.5.3-5chaos.&lt;/p&gt;</comment>
                            <comment id="116300" author="laisiyao" created="Mon, 25 May 2015 02:26:13 +0000"  >&lt;p&gt;The log shows that after client eviction (and quite possibly reconnection), a stale lock handle was packed in the client&apos;s rw request.&lt;/p&gt;

&lt;p&gt;I added some code to try to simulate this, but failed. According to the code, after eviction the client&apos;s in-flight RPCs will be aborted and its locks cleaned up. A full debug log covering this recovery (at least on the client), captured when this error message is seen, would help move this forward.&lt;/p&gt;</comment>
                            <comment id="116482" author="gerrit" created="Wed, 27 May 2015 07:45:28 +0000"  >&lt;p&gt;Lai Siyao (lai.siyao@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/14950&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14950&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2232&quot; title=&quot;LustreError: 9120:0:(ost_handler.c:1673:ost_prolong_lock_one()) LBUG&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2232&quot;&gt;&lt;del&gt;LU-2232&lt;/del&gt;&lt;/a&gt; debug: print debug for prolonged lock&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1d21596de85b81bb3f98277f8f0b425368ccb187&lt;/p&gt;</comment>
                            <comment id="116487" author="laisiyao" created="Wed, 27 May 2015 08:19:40 +0000"  >&lt;p&gt;The updated debug patch will print both lock and request details, which can tell us whether this request is new or a resent/replayed request.&lt;/p&gt;</comment>
                            <comment id="119166" author="morrone" created="Sat, 20 Jun 2015 00:09:55 +0000"  >&lt;p&gt;I have pulled change 14950, Patch Set 1, into LLNL&apos;s local tree.  It is in the queue to go into the next TOSS release and eventually roll out into production.&lt;/p&gt;</comment>
                            <comment id="124960" author="marc@llnl.gov" created="Mon, 24 Aug 2015 21:35:14 +0000"  >&lt;p&gt;We have not seen any more occurrences of this error since we rolled out our 2.5.4-4chaos version into production.&lt;/p&gt;</comment>
                            <comment id="124963" author="pjones" created="Mon, 24 Aug 2015 21:57:34 +0000"  >&lt;p&gt;Thanks Marc!&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="23994">LU-4844</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="24845">LU-5116</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="11978" name="LBUG" size="15465" author="jamervi" created="Thu, 25 Oct 2012 12:24:33 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvatz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5290</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10023"><![CDATA[4]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>