<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:18:00 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-8488] LBUG in ptl_send_rpc(), request-&gt;rq_xid &lt;= imp-&gt;imp_known_replied_xid fails</title>
                <link>https://jira.whamcloud.com/browse/LU-8488</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We have had three clients crash pretty quickly when testing Lustre 2.8 clients with Lustre 2.5.5 servers.  The clients were using tag 2.8.0_0.0.llnlpreview.18, and the servers are using 2.5.5-8chaos (see the lustre-release-fe-llnl repo for each).&lt;/p&gt;

&lt;p&gt;They hit the following LBUG:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;2016-08-09 08:24:53 [569127.746044] Lustre: lcy-OST0004-osc-ffff882025188000: Connection restored to 10.1.1.175@o2ib9 (at 10.1.1.175@o2ib9)
2016-08-09 08:24:53 [569127.760391] Lustre: Skipped 7 previous similar messages
2016-08-09 08:27:04 [569258.797932] LustreError: 166-1: MGC10.1.1.169@o2ib9: Connection to MGS (at 10.1.1.169@o2ib9) was lost; in progress operations using this service will fail
2016-08-09 08:28:14 [569328.893397] Lustre: lcy-OST0004-osc-ffff882025188000: Connection to lcy-OST0004 (at 10.1.1.175@o2ib9) was lost; in progress operations using this service will wait for recovery to complete
2016-08-09 08:28:14 [569328.918048] Lustre: lcy-OST0004-osc-ffff882025188000: Connection restored to 10.1.1.175@o2ib9 (at 10.1.1.175@o2ib9)
2016-08-09 08:28:14 [569328.932430] Lustre: Skipped 1 previous similar message
2016-08-09 08:36:14 [569809.168166] Lustre: 130908:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470756919/real 1470756919]  req@ffff880d11cae600 x1541675108398336/t0(0) o8-&amp;gt;lquake-OST0003-osc-ffff882022b7f800@172.19.1.130@o2ib100:28/4 lens 520/544 e 0 to 1 dl 1470756974 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
2016-08-09 08:36:14 [569809.209141] Lustre: 130908:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 9 previous similar messages
2016-08-09 08:36:40 [569835.180706] Lustre: lcy-OST0008-osc-ffff882025188000: Connection to lcy-OST0008 (at 10.1.1.179@o2ib9) was lost; in progress operations using this service will wait for recovery to complete
2016-08-09 08:36:40 [569835.206029] Lustre: lcy-OST0008-osc-ffff882025188000: Connection restored to 10.1.1.179@o2ib9 (at 10.1.1.179@o2ib9)
2016-08-09 08:46:15 [570411.139216] Lustre: 130984:0:(client.c:2063:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1470757409/real 1470757409]  req@ffff881f9afbce00 x1541675108443980/t0(0) o10-&amp;gt;lcy-OST0001-osc-ffff882025188000@10.1.1.172@o2ib9:6/4 lens 560/432 e 3 to 1 dl 1470757575 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
2016-08-09 08:46:15 [570411.180184] Lustre: 130984:0:(client.c:2063:ptlrpc_expire_one_request()) Skipped 22 previous similar messages
2016-08-09 08:46:15 [570411.195473] LustreError: 130984:0:(niobuf.c:732:ptl_send_rpc()) @@@ xid: 1541675108443980, replied: 1541675108452731, list_empty:0
2016-08-09 08:46:15 [570411.195473]   req@ffff881f9afbce00 x1541675108443980/t0(0) o10-&amp;gt;lcy-OST0001-osc-ffff882025188000@10.1.1.172@o2ib9:6/4 lens 560/432 e 3 to 1 dl 1470757575 ref 1 fl Rpc:XS/2/ffffffff rc -11/-1
2016-08-09 08:46:15 [570411.238718] LustreError: 130984:0:(niobuf.c:733:ptl_send_rpc()) LBUG
2016-08-09 08:46:15 [570411.248645] Pid: 130984, comm: ptlrpcd_07_05
2016-08-09 08:46:15 [570411.256196] 
2016-08-09 08:46:15 [570411.256196] Call Trace:
2016-08-09 08:46:15 [570411.266553]  [&amp;lt;ffffffffa0aea7e3&amp;gt;] libcfs_debug_dumpstack+0x53/0x80 [libcfs]
2016-08-09 08:46:15 [570411.278045]  [&amp;lt;ffffffffa0aead85&amp;gt;] lbug_with_loc+0x45/0xc0 [libcfs]
2016-08-09 08:46:15 [570411.288199]  [&amp;lt;ffffffffa0ec853a&amp;gt;] ptl_send_rpc+0xa1a/0xdb0 [ptlrpc]
2016-08-09 08:46:15 [570411.298337]  [&amp;lt;ffffffffa0ec1209&amp;gt;] ptlrpc_check_set.part.21+0x1789/0x1d80 [ptlrpc]
2016-08-09 08:46:15 [570411.309879]  [&amp;lt;ffffffffa0ec185b&amp;gt;] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
2016-08-09 08:46:16 [570411.319667]  [&amp;lt;ffffffffa0eed03b&amp;gt;] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
2016-08-09 08:46:16 [570411.329345]  [&amp;lt;ffffffffa0eed3eb&amp;gt;] ptlrpcd+0x2bb/0x560 [ptlrpc]
2016-08-09 08:46:16 [570411.338362]  [&amp;lt;ffffffff810bd4b0&amp;gt;] ? default_wake_function+0x0/0x20
2016-08-09 08:46:16 [570411.347762]  [&amp;lt;ffffffffa0eed130&amp;gt;] ? ptlrpcd+0x0/0x560 [ptlrpc]
2016-08-09 08:46:16 [570411.356709]  [&amp;lt;ffffffff810a997f&amp;gt;] kthread+0xcf/0xe0
2016-08-09 08:46:16 [570411.364548]  [&amp;lt;ffffffff810a98b0&amp;gt;] ? kthread+0x0/0xe0
2016-08-09 08:46:16 [570411.372443]  [&amp;lt;ffffffff8165d658&amp;gt;] ret_from_fork+0x58/0x90
2016-08-09 08:46:16 [570411.380778]  [&amp;lt;ffffffff810a98b0&amp;gt;] ? kthread+0x0/0xe0
2016-08-09 08:46:16 [570411.388582] 
2016-08-09 08:46:16 [570411.392834] Kernel panic - not syncing: LBUG
2016-08-09 08:46:16 [570411.399778] CPU: 33 PID: 130984 Comm: ptlrpcd_07_05 Tainted: P           OE  ------------   3.10.0-327.22.2.1chaos.ch6.x86_64 #1
2016-08-09 08:46:16 [570411.416933] Hardware name: Penguin Computing Relion OCP1930e/S2600KPR, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-08-09 08:46:16 [570411.431470]  ffffffffa0b06e0f 000000003006f545 ffff880fb772bb48 ffffffff8164c6b4
2016-08-09 08:46:16 [570411.441962]  ffff880fb772bbc8 ffffffff816456af ffffffff00000008 ffff880fb772bbd8
2016-08-09 08:46:16 [570411.452453]  ffff880fb772bb78 000000003006f545 ffffffffa0f5a675 0000000000000246
2016-08-09 08:46:16 [570411.462912] Call Trace:
2016-08-09 08:46:16 [570411.467750]  [&amp;lt;ffffffff8164c6b4&amp;gt;] dump_stack+0x19/0x1b
2016-08-09 08:46:16 [570411.475583]  [&amp;lt;ffffffff816456af&amp;gt;] panic+0xd8/0x1e7
2016-08-09 08:46:16 [570411.482985]  [&amp;lt;ffffffffa0aeadeb&amp;gt;] lbug_with_loc+0xab/0xc0 [libcfs]
2016-08-09 08:46:16 [570411.491940]  [&amp;lt;ffffffffa0ec853a&amp;gt;] ptl_send_rpc+0xa1a/0xdb0 [ptlrpc]
2016-08-09 08:46:16 [570411.500942]  [&amp;lt;ffffffffa0ec1209&amp;gt;] ptlrpc_check_set.part.21+0x1789/0x1d80 [ptlrpc]
2016-08-09 08:46:16 [570411.511280]  [&amp;lt;ffffffffa0ec185b&amp;gt;] ptlrpc_check_set+0x5b/0xe0 [ptlrpc]
2016-08-09 08:46:16 [570411.520423]  [&amp;lt;ffffffffa0eed03b&amp;gt;] ptlrpcd_check+0x4eb/0x5e0 [ptlrpc]
2016-08-09 08:46:16 [570411.529433]  [&amp;lt;ffffffffa0eed3eb&amp;gt;] ptlrpcd+0x2bb/0x560 [ptlrpc]
2016-08-09 08:46:16 [570411.537780]  [&amp;lt;ffffffff810bd4b0&amp;gt;] ? wake_up_state+0x20/0x20
2016-08-09 08:46:16 [570411.545821]  [&amp;lt;ffffffffa0eed130&amp;gt;] ? ptlrpcd_check+0x5e0/0x5e0 [ptlrpc]
2016-08-09 08:46:16 [570411.554875]  [&amp;lt;ffffffff810a997f&amp;gt;] kthread+0xcf/0xe0
2016-08-09 08:46:16 [570411.562041]  [&amp;lt;ffffffff810a98b0&amp;gt;] ? kthread_create_on_node+0x140/0x140
2016-08-09 08:46:16 [570411.571021]  [&amp;lt;ffffffff8165d658&amp;gt;] ret_from_fork+0x58/0x90
2016-08-09 08:46:16 [570411.578700]  [&amp;lt;ffffffff810a98b0&amp;gt;] ? kthread_create_on_node+0x140/0x140
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="38740">LU-8488</key>
            <summary>LBUG in ptl_send_rpc(), request-&gt;rq_xid &lt;= imp-&gt;imp_known_replied_xid fails</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Tue, 9 Aug 2016 19:09:18 +0000</created>
                <updated>Wed, 20 Dec 2017 01:45:36 +0000</updated>
                            <resolved>Mon, 18 Dec 2017 14:05:21 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                                        <due></due>
                            <votes>1</votes>
                                    <watches>15</watches>
                                                                            <comments>
                            <comment id="161461" author="pjones" created="Wed, 10 Aug 2016 18:39:24 +0000"  >&lt;p&gt;Alex&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="162151" author="kilian" created="Wed, 17 Aug 2016 02:49:50 +0000"  >&lt;p&gt;We (Stanford Research Computing) have seen a very similar LBUG on a client (a single occurence so far):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Aug 15 20:55:06 sherlock-ln03 kernel: Lustre: regal-OST000a-osc-ffff880332e3b000: Connection restored to regal-OST000a (at 10.210.34.203@o2ib1)
Aug 15 20:55:06 sherlock-ln03 kernel: Lustre: Skipped 1 previous similar message
Aug 15 20:55:06 sherlock-ln03 kernel: LustreError: 20432:0:(niobuf.c:737:ptl_send_rpc()) @@@ xid: 1542242077009292, replied: 1542242077009307, list_empty:0
Aug 15 20:55:06 sherlock-ln03 kernel:  req@ffff880ed871c3c0 x1542242077009292/t0(0) o4-&amp;gt;regal-OST004f-osc-ffff880332e3b000@10.210.34.216@o2ib1:6/4 lens 488/448 e 0 to 0 dl 0 ref 2 fl Rpc:/0/ffffffff rc 0/-1
Aug 15 20:55:06 sherlock-ln03 kernel: LustreError: 20432:0:(niobuf.c:738:ptl_send_rpc()) LBUG
Aug 15 20:55:06 sherlock-ln03 kernel: Pid: 20432, comm: ptlrpcd_00_06
Aug 15 20:55:06 sherlock-ln03 kernel:
Aug 15 20:55:06 sherlock-ln03 kernel: Call Trace:
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa03d6895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa03d6e97&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07c1498&amp;gt;] ptl_send_rpc+0x408/0xe80 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07f80c4&amp;gt;] ? sptlrpc_req_refresh_ctx+0x154/0x910 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07c17d5&amp;gt;] ? ptl_send_rpc+0x745/0xe80 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07b5de1&amp;gt;] ptlrpc_send_new_req+0x511/0x9b0 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07ba280&amp;gt;] ptlrpc_check_set+0xa20/0x1ca0 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff8100ba4e&amp;gt;] ? common_interrupt+0xe/0x13
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07e8633&amp;gt;] ptlrpcd_check+0x3d3/0x610 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07b0416&amp;gt;] ? ptlrpc_set_next_timeout+0xc6/0x1a0 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07e8ae2&amp;gt;] ptlrpcd+0x272/0x4f0 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff810672b0&amp;gt;] ? default_wake_function+0x0/0x20
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffffa07e8870&amp;gt;] ? ptlrpcd+0x0/0x4f0 [ptlrpc]
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff810a0fce&amp;gt;] kthread+0x9e/0xc0
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff8100c28a&amp;gt;] child_rip+0xa/0x20
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff810a0f30&amp;gt;] ? kthread+0x0/0xc0
Aug 15 20:55:06 sherlock-ln03 kernel: [&amp;lt;ffffffff8100c280&amp;gt;] ? child_rip+0x0/0x20
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;That&apos;s using IEEL 3.0.0.0&lt;/p&gt;

&lt;p&gt;Cheers,&lt;br/&gt;
&amp;#8211; &lt;br/&gt;
Kilian&lt;/p&gt;</comment>
                            <comment id="167082" author="simmonsja" created="Fri, 23 Sep 2016 17:08:28 +0000"  >&lt;p&gt;We just hit it as well. If you need a dump we can provide it.&lt;/p&gt;</comment>
                            <comment id="169262" author="yujian" created="Wed, 12 Oct 2016 12:39:22 +0000"  >&lt;p&gt;Hi Alex,&lt;/p&gt;

&lt;p&gt;Could you please advise? Thank you.&lt;/p&gt;</comment>
                            <comment id="169424" author="parinay" created="Thu, 13 Oct 2016 11:58:23 +0000"  >&lt;p&gt;We have noticed similar crash&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: 47121:0:(client.c:2093:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1472152273/real 1472152273]  req@ffff8813bd6e0cc0 x1543375481248976/t0(0) o3-&amp;gt;lustre02-OST0006-osc-ffff8813c2ad6400@10.253.108.30@tcp:6/4 lens 488/432 e 0 to 1 dl 1472152289 ref 2 fl Rpc:X/0/ffffffff rc 0/-1
Lustre: lustre02-OST0006-osc-ffff8813c2ad6400: Connection to lustre02-OST0006 (at 10.253.108.30@tcp) was lost; in progress operations using this service will wait for recovery to complete
Lustre: 47110:0:(client.c:2093:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1472152290/real 1472152290]  req@ffff8805b92893c0 x1543375481250220/t0(0) o8-&amp;gt;lustre02-OST0006-osc-ffff8813c2ad6400@10.253.108.30@tcp:28/4 lens 520/544 e 0 to 1 dl 1472152310 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: lustre02-OST0006-osc-ffff8813c2ad6400: Connection restored to 10.253.108.31@tcp (at 10.253.108.31@tcp)
LustreError: 62105:0:(niobuf.c:750:ptl_send_rpc()) @@@ xid: 1543375481425516, replied: 1543375481435211, list_empty:0
  req@ffff8805f42e43c0 x1543375481425516/t0(0) o13-&amp;gt;lustre02-OST0006-osc-ffff8813c2ad6400@10.253.108.31@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1 
LustreError: 62094:0:(niobuf.c:751:ptl_send_rpc()) LBUG 
LustreError: 62099:0:(niobuf.c:751:ptl_send_rpc()) LBUG 
LustreError: 62088:0:(niobuf.c:751:ptl_send_rpc()) LBUG 
LustreError: 62086:0:(niobuf.c:751:ptl_send_rpc()) LBUG 
Pid: 62094, comm: ssexec

Call Trace:
 [&amp;lt;ffffffffa147f895&amp;gt;] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
 [&amp;lt;ffffffffa1480007&amp;gt;] lbug_with_loc+0x47/0xb0 [libcfs]
 [&amp;lt;ffffffffa17a9478&amp;gt;] ptl_send_rpc+0x3f8/0xe70 [ptlrpc]
 [&amp;lt;ffffffffa17e06a4&amp;gt;] ? sptlrpc_req_refresh_ctx+0x154/0x910 [ptlrpc]
 [&amp;lt;ffffffffa17a313c&amp;gt;] ptlrpc_check_set+0x17cc/0x1d90 [ptlrpc]
 [&amp;lt;ffffffffa17a39ea&amp;gt;] ptlrpc_set_wait+0x2ea/0x9c0 [ptlrpc]
 [&amp;lt;ffffffffa1798c20&amp;gt;] ? ptlrpc_interrupted_set+0x0/0x110 [ptlrpc]
 [&amp;lt;ffffffff81065df0&amp;gt;] ? default_wake_function+0x0/0x20
 [&amp;lt;ffffffffa1a997d4&amp;gt;] obd_statfs_rqset+0x144/0x560 [lustre]
 [&amp;lt;ffffffffa1a9b51b&amp;gt;] ll_statfs_internal+0x2bb/0x7f0 [lustre]
 [&amp;lt;ffffffffa1a9bae5&amp;gt;] ll_statfs+0x95/0x190 [lustre]
 [&amp;lt;ffffffff811bcdd4&amp;gt;] statfs_by_dentry+0x74/0xa0
 [&amp;lt;ffffffff811bcf0b&amp;gt;] vfs_statfs+0x1b/0xb0
 [&amp;lt;ffffffff811bd107&amp;gt;] user_statfs+0x47/0xb0
 [&amp;lt;ffffffff811bd20a&amp;gt;] sys_statfs+0x2a/0x50
 [&amp;lt;ffffffff8100b072&amp;gt;] system_call_fastpath+0x16/0x1b

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The code around the assertion (niobuf.c):&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;745         if (list_empty(&amp;amp;request-&amp;gt;rq_unreplied_list) ||
746             request-&amp;gt;rq_xid &amp;lt;= imp-&amp;gt;imp_known_replied_xid) {
747                 DEBUG_REQ(D_ERROR, request, &quot;xid: &quot;LPU64&quot;, replied: &quot;LPU64&quot;, &quot;
748                           &quot;list_empty:%d\n&quot;, request-&amp;gt;rq_xid,
749                           imp-&amp;gt;imp_known_replied_xid,
750                           list_empty(&amp;amp;request-&amp;gt;rq_unreplied_list));
751                 LBUG();
752         }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
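
&lt;p&gt;For illustration, a minimal stand-alone sketch of the invariant this check enforces (simplified user-space code with names borrowed from ptlrpc; not the kernel implementation): imp_known_replied_xid trails the smallest xid still awaiting a reply, so any request being sent must both sit on the unreplied list and carry an rq_xid above that bound.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;#include &amp;lt;assert.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;
#include &amp;lt;stdio.h&amp;gt;

/* Simplified stand-ins for the ptlrpc_request / obd_import fields. */
struct req {
        uint64_t    rq_xid;     /* assigned in send order */
        struct req *next;       /* unreplied list, kept sorted by rq_xid */
};

/* imp_known_replied_xid trails the smallest xid still unreplied:
 * everything at or below it is known to have been replied to. */
static uint64_t known_replied_xid(const struct req *unreplied_head)
{
        assert(unreplied_head != NULL); /* simplification: non-empty list */
        return unreplied_head-&amp;gt;rq_xid - 1;
}

int main(void)
{
        struct req r3 = { 103, NULL };
        struct req r2 = { 102, &amp;amp;r3 };
        struct req r1 = { 101, &amp;amp;r2 };

        /* With xids 101..103 unreplied, known_replied_xid is 100. */
        printf(&quot;known_replied_xid = %llu\n&quot;,
               (unsigned long long)known_replied_xid(&amp;amp;r1));

        /* The sanity check from ptl_send_rpc(): a request being sent must
         * still be on the list and must sort above known_replied_xid. */
        assert(r2.rq_xid &amp;gt; known_replied_xid(&amp;amp;r1));

        /* If a resend path rewrote rq_xid in place without repositioning
         * the request, the list order and this invariant would break. */
        return 0;
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;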

&lt;p&gt;From the Lustre logs:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;req@ffff881eaf9e90c0 x1543375481252856/t0(0) o13-&amp;gt;lustre02-OST0006-osc-ffff8813c2ad6400@10.253.108.31@tcp:7/4 lens 224/368 e 0 to 0 dl 0 ref 1 fl Rpc:/0/ffffffff rc 0/-1
 00000100:00020000:4.0F:1472152509.139389:0:62099:0:(niobuf.c:750:ptl_send_rpc()) @@@ xid: 1543375481269472, replied: 1543375481435211, list_empty:0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;From the crash utility:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; ptlrpc_request ffff881eaf9e90c0 | grep rq_xid
  rq_xid = 1543375481252856, 
crash&amp;gt;

crash&amp;gt; ptlrpc_request ffff881eaf9e90c0 | grep rq_import
  rq_import = 0xffff881a34ef2000, 
crash&amp;gt; obd_import  0xffff881a34ef2000 | grep imp_known_replied_xid
  imp_known_replied_xid = 1543375481435211, 
crash&amp;gt; 

crash&amp;gt; ptlrpc_request ffff881eaf9e90c0 | grep -A 3 cr_unreplied_list
      cr_unreplied_list = {
        next = 0xffff880e1f320b48, 
        prev = 0xffff881f38437548
      },
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;The issue was seen during compilebench runs from lustre/tests (1 out of 100 runs failed; the 87th run).&lt;/p&gt;

&lt;p&gt;Just FYI, I re-ran the tests with the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3534&quot; title=&quot;async update cross-MDTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3534&quot;&gt;&lt;del&gt;LU-3534&lt;/del&gt;&lt;/a&gt; &lt;a href=&quot;http://review.whamcloud.com/#/c/15421/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;patch&lt;/a&gt; and was unable to reproduce the failure seen here. I am not sure if that&apos;s the fix.&lt;/p&gt;</comment>
                            <comment id="169483" author="simmonsja" created="Thu, 13 Oct 2016 17:20:09 +0000"  >&lt;p&gt;No it does fix this. We are running 2.8.0 with this patch already merged and still saw this LBUG.&lt;/p&gt;</comment>
                            <comment id="169500" author="parinay" created="Thu, 13 Oct 2016 18:04:20 +0000"  >&lt;p&gt;James,&lt;br/&gt;
&amp;gt; No it does fix this. We are running 2.8.0 with this patch already merged and still saw this LBUG&lt;/p&gt;

&lt;p&gt;To understand this clearly,&lt;br/&gt;
the patch of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3534&quot; title=&quot;async update cross-MDTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3534&quot;&gt;&lt;del&gt;LU-3534&lt;/del&gt;&lt;/a&gt; helps fix the LBUG  seen here ? OR&lt;br/&gt;
even with the patch of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3534&quot; title=&quot;async update cross-MDTs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3534&quot;&gt;&lt;del&gt;LU-3534&lt;/del&gt;&lt;/a&gt; the LBUG reported here is still reproducible ?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="169596" author="niu" created="Fri, 14 Oct 2016 03:38:23 +0000"  >&lt;p&gt;All these failures happened on the request to old OST servers, so I think It&apos;s highly likely related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8193&quot; title=&quot;request mbits isn&amp;#39;t set properly for EINPROGRESS resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8193&quot;&gt;&lt;del&gt;LU-8193&lt;/del&gt;&lt;/a&gt;, before the fix of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8193&quot; title=&quot;request mbits isn&amp;#39;t set properly for EINPROGRESS resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8193&quot;&gt;&lt;del&gt;LU-8193&lt;/del&gt;&lt;/a&gt;, bulk resend will change request&apos;s rq_xid on unreplied list directly, that breaks the sanity check on known replied xid.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6808&quot; title=&quot;Interop 2.5.3&amp;lt;-&amp;gt;master sanity test_224c: Bulk IO write error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6808&quot;&gt;&lt;del&gt;LU-6808&lt;/del&gt;&lt;/a&gt; also has some fixes related to this (bulk request, connect to old server).&lt;/p&gt;</comment>
                            <comment id="212793" author="pjones" created="Fri, 3 Nov 2017 20:19:24 +0000"  >&lt;p&gt;LLNL have flagged this as important. Is the situation the same that this LBUG is only seen with 2.5.x servers and 2.8.x clients? And this happens pretty quickly in such a configuration?&lt;/p&gt;</comment>
                            <comment id="212924" author="ofaaland" created="Mon, 6 Nov 2017 21:59:51 +0000"  >&lt;p&gt;Recently, we have seen this with 2.5.x servers and 2.8.x clients.&#160; Our general file system tests produced it a couple of times/day, but I do not know the volume of file system tests that ran before it appears.&#160; They are a fairly small subset of our total test suite.&lt;/p&gt;</comment>
                            <comment id="213199" author="bzzz" created="Thu, 9 Nov 2017 05:00:14 +0000"  >&lt;p&gt;unfortunately I can&apos;t access lustre-release-fe-llnl repo, could someone please make a tarball and attach please?&lt;/p&gt;</comment>
                            <comment id="213222" author="pjones" created="Thu, 9 Nov 2017 13:34:07 +0000"  >&lt;p&gt;Alex&lt;/p&gt;

&lt;p&gt;Please connect with Minh to get the necessary permissions&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="213311" author="bzzz" created="Fri, 10 Nov 2017 06:57:39 +0000"  >&lt;p&gt;I&apos;ve ported &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6808&quot; title=&quot;Interop 2.5.3&amp;lt;-&amp;gt;master sanity test_224c: Bulk IO write error&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6808&quot;&gt;&lt;del&gt;LU-6808&lt;/del&gt;&lt;/a&gt;: &lt;a href=&quot;https://review.whamcloud.com/#/c/30018/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/30018/&lt;/a&gt; but still waiting for confirmation from autotest.&lt;/p&gt;</comment>
                            <comment id="213861" author="ofaaland" created="Thu, 16 Nov 2017 00:10:23 +0000"  >&lt;p&gt;Alex,&lt;/p&gt;

&lt;p&gt;It looks like something happened in the autotest system and you need to kick off testing again.&#160; There appear to be no test results.&lt;/p&gt;</comment>
                            <comment id="216578" author="pjones" created="Mon, 18 Dec 2017 14:05:21 +0000"  >&lt;p&gt;Patch ported and landed to 2.8 FE branch&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzyk87:</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>