<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:09:56 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7558] niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp; MSGHDR_AT_SUPPORT ...)</title>
                <link>https://jira.whamcloud.com/browse/LU-7558</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;/bgsys/logs/BGQ.sn/R04-ID-J00.log (among many others)&lt;/p&gt;

&lt;p&gt;LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) ASSERTION( (at_max == 0) || request-&amp;gt;rq_import-&amp;gt;imp_state != LUSTRE_IMP_FULL || (request-&amp;gt;rq_import-&amp;gt;imp_msghdr_flags &amp;amp; 0x1) || ! (request-&amp;gt;rq_import-&amp;gt;imp_connect_data.ocd_connect_flags &amp;amp; 0x1000000ULL) ) failed:&lt;br/&gt;
LustreError: 28558:0: (niobuf.c:721:ptl_send_rpc()) LBUG&lt;br/&gt;
Call Trace:&lt;br/&gt;
show_stack&lt;br/&gt;
libcfs_debug_dumpstack&lt;br/&gt;
lbug_with_loc&lt;br/&gt;
ptl_send_rpc&lt;br/&gt;
ptlrpc_send_new_req&lt;br/&gt;
ptlrpc_set_wait&lt;br/&gt;
ll_statfs_internal&lt;br/&gt;
ll_statfs&lt;br/&gt;
statfs_by_dentry&lt;br/&gt;
vfs_statfs&lt;br/&gt;
user_statfs&lt;br/&gt;
SyS_statfs&lt;br/&gt;
syscall_exit&lt;/p&gt;

&lt;p&gt;This occurred on many tens of I/O nodes; within the next 24 hours it occurred on many tens more, and it continues to occur.&lt;/p&gt;

&lt;p&gt;We have not seen this issue before.  The patch that introduced this assert was in the patch stack for our tag 2.5.4-1chaos, rolled out in April.  We do not know what has triggered it now.&lt;/p&gt;

&lt;p&gt;c389652 &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5528&quot; title=&quot;Race - connect vs resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5528&quot;&gt;&lt;del&gt;LU-5528&lt;/del&gt;&lt;/a&gt; ptlrpc: fix race between connect vs resend&lt;/p&gt;

&lt;p&gt;There are no crash dumps for these nodes, nor much in the console logs.&lt;/p&gt;

&lt;p&gt;Because several conditions were ASSERTed in a single statement, it is unknown which one failed.&lt;/p&gt;</description>
                <environment>BG/Q I/O nodes&lt;br/&gt;
lustre-client-ion-2.5.4-4chaos_2.6.32_504.8.2.bgq.3blueos.V1R2M3.bl2.2_1.ppc64.ppc64&lt;br/&gt;
</environment>
        <key id="33707">LU-7558</key>
            <summary>niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp; MSGHDR_AT_SUPPORT ...)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="tappro">Mikhail Pershin</assignee>
                                    <reporter username="ofaaland">Olaf Faaland</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Tue, 15 Dec 2015 17:48:32 +0000</created>
                <updated>Mon, 8 Apr 2019 14:07:59 +0000</updated>
                            <resolved>Wed, 16 Jan 2019 13:06:15 +0000</resolved>
                                    <version>Lustre 2.12.0</version>
                                    <fixVersion>Lustre 2.9.0</fixVersion>
                    <fixVersion>Lustre 2.13.0</fixVersion>
                    <fixVersion>Lustre 2.10.7</fixVersion>
                    <fixVersion>Lustre 2.12.1</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>14</watches>
                                                                            <comments>
                            <comment id="136381" author="ofaaland" created="Tue, 15 Dec 2015 17:49:38 +0000"  >&lt;p&gt;Patch stack running on those nodes:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;* ae35897 (tag: 2.5.4-4chaos) LU-2232 debug: print debug &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; prolonged lock
* 458c1f9 Revert &lt;span class=&quot;code-quote&quot;&gt;&quot;LU-2232 debug: lock handle in IO request may be stale&quot;&lt;/span&gt;
* f69e69e (tag: 2.5.4-3chaos) LU-6389 llite: restart &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; read/write &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; normal IO
* 1e8f823 Revert &lt;span class=&quot;code-quote&quot;&gt;&quot;LU-6389 llite: restart &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; read/write &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; normal IO&quot;&lt;/span&gt;
* 3028245 (tag: 2.5.4-2chaos) LU-6152 osd-zfs: ZFS large block compat
* fb6b94c Revert &lt;span class=&quot;code-quote&quot;&gt;&quot;LU-5053 ptlrpc: Add schedule point to ptlrpc_check_set()&quot;&lt;/span&gt;
* 0dc7041 Fix ldiskfs source autodetect &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; CentOS 6
* 42edddb LU-6536 llapi: lmm_stripe_count used unswabbed
* 3d25e65 LU-2182 llapi: implementation of &lt;span class=&quot;code-keyword&quot;&gt;new&lt;/span&gt; llapi_layout API
* fe5c3dd LLNL-0000 llapi: get OST count from proc
* c1c8672 LU-4107 build: fix lustre_user.h to C++ compatible
* f139e17 LU-5042 ldlm: delay filling resource&apos;s LVB upon replay
* 35e5a78 (tag: 2.5.4-1chaos) LU-6389 llite: restart &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt; read/write &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; normal IO
* a25d1d1 LU-1032 build: Honor --disable-modules option in spec file
* 01df569 LU-1032 build: Add Lustre DKMS spec file
* 1f47ba6 LU-6038 osd-zfs: sa_spill_alloc()/sa_spill_free() compat
* 16b79c6 LU-6038 osd-zfs: Avoid redefining KM_SLEEP
* 36eca32 LU-5326 libcfs: remove umode_t typedef
* 685c9ac LU-3353 ptlrpc: Suppress error message when imp_sec is freed
* 9ea4c83 LU-5984 obd: fix lastid file name in compat code
* c389652 LU-5528 ptlrpc: fix race between connect vs resend
* bba812b LU-5579 ldlm: re-sent enqueue vs lock destroy race
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="136382" author="ofaaland" created="Tue, 15 Dec 2015 17:54:19 +0000"  >&lt;p&gt;The console logs indicate that a pair of OSSs (a failover pair, in fact) had crashed and come back up.  This error occurred when they timed out of recovery and evicted the clients.  There&apos;s nothing else in the console log.  The nodes are unresponsive after they hit this LBUG, so I&apos;m not able to gather more information unless I can reproduce the situation while monitoring a node.  It is not clear right now how to do that without affecting the whole machine.&lt;/p&gt;

&lt;p&gt;It seems wrong to ASSERT anything to do with connection state without holding a lock on the import.  It is also not helpful to ASSERT several conditions collectively rather than individually, so that the bad state would be clearer.  I&apos;m therefore looking into changing the code to return failure instead.&lt;/p&gt;</comment>
                            <comment id="136386" author="ofaaland" created="Tue, 15 Dec 2015 18:19:43 +0000"  >&lt;p&gt;My statement about ASSERTing collectively was mistaken (the conditions are ORed, not ANDed).  But this still doesn&apos;t produce enough useful information; the patch will need to change that.&lt;/p&gt;

&lt;p&gt;I don&apos;t understand why they chose to ASSERT here instead of failing.  There is discussion of this ASSERT in the code review for the patch, but I don&apos;t see a rationale for crashing the node.&lt;/p&gt;</comment>
                            <comment id="136390" author="pjones" created="Tue, 15 Dec 2015 18:33:18 +0000"  >&lt;p&gt;Mike&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="136398" author="morrone" created="Tue, 15 Dec 2015 18:56:37 +0000"  >&lt;p&gt;This bug is taking out a large number of the I/O Nodes on Sequoia, our largest production supercomputer, so I bumped the severity to 1.&lt;/p&gt;</comment>
                            <comment id="136418" author="pjones" created="Tue, 15 Dec 2015 19:38:34 +0000"  >&lt;p&gt;Chris&lt;/p&gt;

&lt;p&gt;Just to acknowledge this - I am discussing with engineering&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="136426" author="jhammond" created="Tue, 15 Dec 2015 20:10:07 +0000"  >&lt;p&gt;Is 2.5.4-15chaos available somewhere? I do not see it on &lt;a href=&quot;https://github.com/chaos/lustre&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/chaos/lustre&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="136427" author="pjones" created="Tue, 15 Dec 2015 20:12:10 +0000"  >&lt;p&gt;John&lt;/p&gt;

&lt;p&gt;I can direct you to it&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="136428" author="pjones" created="Tue, 15 Dec 2015 20:22:47 +0000"  >&lt;p&gt;Olaf&lt;/p&gt;

&lt;p&gt;When did the last software update get applied to these nodes and which tag preceded it? Have any changes to the configuration been made in recent weeks?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="136434" author="morrone" created="Tue, 15 Dec 2015 21:23:19 +0000"  >&lt;p&gt;The nodes are only running 2.5.4-4chaos, the newer tags haven&apos;t made it to the BG/Q systems.&lt;/p&gt;</comment>
                            <comment id="136435" author="morrone" created="Tue, 15 Dec 2015 21:24:01 +0000"  >&lt;p&gt;No software changes have happened on these BG/Q clients in months.  No recent change on the servers either.&lt;/p&gt;</comment>
                            <comment id="136437" author="ofaaland" created="Tue, 15 Dec 2015 21:29:58 +0000"  >&lt;p&gt;Servers are at lustre-2.5.4-13chaos&lt;/p&gt;</comment>
                            <comment id="136521" author="hongchao.zhang" created="Wed, 16 Dec 2015 11:52:17 +0000"  >&lt;p&gt;A possible cause (though I&apos;m not sure about it!) is a race in setting the state of the obd_import.&lt;/p&gt;

&lt;p&gt;In &quot;ptlrpc_connect_import&quot;, imp_state is set to &quot;LUSTRE_IMP_CONNECTING&quot;.  If the check of imp_state in &quot;ptlrpc_import_recovery_state_machine&quot; has passed before that setting, imp_state could then be set to &quot;LUSTRE_IMP_FULL&quot; by the delayed reply to the last ping of recovery in &quot;signal_completed_replay&quot;.&lt;/p&gt;

&lt;p&gt;The race is:&lt;/p&gt;

&lt;pre&gt;thread 1:  ptlrpc_import_recovery_state_machine
           ...
           if (imp-&amp;gt;imp_state == LUSTRE_IMP_RECOVER) {
                   struct ptlrpc_connection *conn = imp-&amp;gt;imp_connection;

                   rc = ptlrpc_resend(imp);
                   if (rc)
                           GOTO(out, rc);

                   IMPORT_SET_STATE(imp, LUSTRE_IMP_FULL);
                   ptlrpc_activate_import(imp);
           ...&lt;/pre&gt;

&lt;p&gt;thread 2: ptlrpc_connect_import is called while ptlrpc_resend is running; imp_state is set to LUSTRE_IMP_CONNECTING and MSGHDR_AT_SUPPORT is cleared.&lt;/p&gt;

&lt;p&gt;thread 1: imp_state is then set to LUSTRE_IMP_FULL, and the LBUG is triggered.&lt;/p&gt;</comment>
                            <comment id="136546" author="tappro" created="Wed, 16 Dec 2015 15:53:16 +0000"  >&lt;p&gt;This assertion was added by the patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5528&quot; title=&quot;Race - connect vs resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5528&quot;&gt;&lt;del&gt;LU-5528&lt;/del&gt;&lt;/a&gt; ( c389652 &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5528&quot; title=&quot;Race - connect vs resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5528&quot;&gt;&lt;del&gt;LU-5528&lt;/del&gt;&lt;/a&gt; ptlrpc: fix race between connect vs resend ), so this situation is a result of that commit.  Either that patch is incomplete or the assertion is not correct.&lt;/p&gt;</comment>
                            <comment id="136558" author="ofaaland" created="Wed, 16 Dec 2015 16:24:43 +0000"  >&lt;p&gt;What is special about LUSTRE_IMP_FULL?  Why would the assertion check for that state, but none of the other import states like _EVICTED or _CLOSED?&lt;/p&gt;

&lt;p&gt;Also, there are many locations in the code within ptl_send_rpc() where something can cause the function to return an error.  Do you know why the person who wrote the patch chose to ASSERT instead of return an error on import state?&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="136563" author="tappro" created="Wed, 16 Dec 2015 16:41:48 +0000"  >&lt;p&gt;Olaf, it looks to me like the assertion was added to prevent situations similar to the one &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5528&quot; title=&quot;Race - connect vs resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5528&quot;&gt;&lt;del&gt;LU-5528&lt;/del&gt;&lt;/a&gt; fixed, but I agree that error handling with a detailed message is better. Is it possible to revert this patch for now? We are going to check the correctness of that assertion and replace it with error handling instead.&lt;/p&gt;</comment>
                            <comment id="136588" author="ofaaland" created="Wed, 16 Dec 2015 17:54:23 +0000"  >&lt;p&gt;Mikhail, yes, we could revert the patch.  I need to talk it over with Chris and the BG team.   It&apos;s trickier than it normally would be, because lots of people are out of the office next week and so it&apos;s difficult to handle anything new that comes up.&lt;/p&gt;

&lt;p&gt;Should we add any debug code?&lt;/p&gt;</comment>
                            <comment id="136602" author="tappro" created="Wed, 16 Dec 2015 19:30:37 +0000"  >&lt;p&gt;Olaf, let me think a bit; I will provide a patch with debug output instead of the assertion.&lt;/p&gt;</comment>
                            <comment id="136610" author="tappro" created="Wed, 16 Dec 2015 20:11:09 +0000"  >&lt;p&gt;I agree with Hongchao: this assertion failure is a result of the race he mentioned, and that race itself was introduced by the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5528&quot; title=&quot;Race - connect vs resend&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5528&quot;&gt;&lt;del&gt;LU-5528&lt;/del&gt;&lt;/a&gt; patch. It fixes the case where the import flags are set after the import state is set to LUSTRE_IMP_FULL, but it opens a new race when the flags are set before LUSTRE_IMP_FULL. Strictly speaking, the import flags and the import state should be changed together.&lt;/p&gt;</comment>
                            <comment id="136617" author="tappro" created="Wed, 16 Dec 2015 20:48:10 +0000"  >&lt;p&gt;Olaf, I think it is better to revert this commit.  Even if we remove that assertion, the import will still have wrong flags, causing all requests to behave wrongly, so the situation is no better than before that commit.  Meanwhile I will think about how to solve this properly; I don&apos;t see a quick solution so far.&lt;/p&gt;

&lt;p&gt;The possible solution I have in mind is to add a new boolean import field, imp_connected, which would indicate that the import has been reconnected (and the imp flags are already updated) even if it is not in the FULL state yet. Then ptlrpc_connect_import() would not try to initiate a new connect, by checking this flag (currently it checks the import state), and would not clear the flags of an import which is connected but not yet FULL. I am not sure this is the only solution; it has to be discussed.&lt;/p&gt;</comment>
                            <comment id="136622" author="ofaaland" created="Wed, 16 Dec 2015 21:19:15 +0000"  >&lt;p&gt;Mikhail, OK, we are reverting this patch for now.&lt;/p&gt;</comment>
                            <comment id="146954" author="morrone" created="Fri, 25 Mar 2016 18:40:39 +0000"  >&lt;p&gt;This is still broken on master and b2_8.  Since we hope to move to 2.8 in the near future, we would really like to see this fixed as soon as possible.&lt;/p&gt;</comment>
                            <comment id="147700" author="gerrit" created="Mon, 4 Apr 2016 11:03:09 +0000"  >&lt;p&gt;Mike Pershin (mike.pershin@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/19312&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19312&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; import: don&apos;t reconnect during connect interpret&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 02f60229e4abb25e7da20ebaebe7e7bca18e99c3&lt;/p&gt;</comment>
                            <comment id="147701" author="tappro" created="Mon, 4 Apr 2016 11:07:07 +0000"  >&lt;p&gt;The patch implements the approach I mentioned in the comment above: it blocks new connect requests while the current connect interpret is running, preventing the import connection flags from being reset.&lt;/p&gt;</comment>
                            <comment id="150777" author="gerrit" created="Mon, 2 May 2016 23:56:23 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/19312/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/19312/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; import: don&apos;t reconnect during connect interpret&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d10320bafdb942a8dbc5a8ba9176873134a5ffa3&lt;/p&gt;</comment>
                            <comment id="150865" author="jgmitter" created="Tue, 3 May 2016 17:47:22 +0000"  >&lt;p&gt;Landed to master for 2.9.0&lt;/p&gt;</comment>
                            <comment id="159387" author="shadow" created="Wed, 20 Jul 2016 19:19:38 +0000"  >&lt;p&gt;Mikhail,&lt;/p&gt;

&lt;p&gt;I think it is not a very good approach. It would be easier to use a simple change like&lt;br/&gt;
if (imp-&amp;gt;imp_state != LUSTRE_IMP_DISCONN) instead of adding a new flag.&lt;/p&gt;</comment>
                            <comment id="231549" author="green" created="Mon, 6 Aug 2018 21:27:59 +0000"  >&lt;p&gt;I am still hitting this regularly in my testing.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[16176.517655] LustreError: 2151:0:(client.c:1179:ptlrpc_import_delay_req()) @@@ invalidate in flight  req@ffff8802f4d5fc80 x1608074126206960/t0(0) o38-&amp;gt;lustre-MDT0000-mdc-ffff8802acac9800@0@lo:12/10 lens 520/544 e 0 to 0 dl 0 ref 1 fl Rpc:N/0/ffffffff rc 0/-1
[16178.021121] LustreError: 2309:0:(niobuf.c:782:ptl_send_rpc()) ASSERTION( (at_max == 0) || imp-&amp;gt;imp_state != LUSTRE_IMP_FULL || (imp-&amp;gt;imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT) || !(imp-&amp;gt;imp_connect_data.ocd_connect_flags &amp;amp; 0x1000000ULL) ) failed: 
[16178.032291] LustreError: 2309:0:(niobuf.c:782:ptl_send_rpc()) LBUG
[16178.033617] CPU: 1 PID: 2309 Comm: rm Kdump: loaded Tainted: P           OE  ------------   3.10.0-7.5-debug #1
[16178.036021] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[16178.037385] Call Trace:
[16178.038659]  [&amp;lt;ffffffff8176fc9a&amp;gt;] dump_stack+0x19/0x1b
[16178.042848]  [&amp;lt;ffffffffa01b37c2&amp;gt;] libcfs_call_trace+0x72/0x80 [libcfs]
[16178.044440]  [&amp;lt;ffffffffa01b384c&amp;gt;] lbug_with_loc+0x4c/0xb0 [libcfs]
[16178.046617]  [&amp;lt;ffffffffa05854e9&amp;gt;] ptl_send_rpc+0xb79/0xe80 [ptlrpc]
[16178.047953]  [&amp;lt;ffffffffa01b9f97&amp;gt;] ? libcfs_debug_msg+0x57/0x80 [libcfs]
[16178.049215]  [&amp;lt;ffffffffa0579d40&amp;gt;] ptlrpc_send_new_req+0x460/0xa70 [ptlrpc]
[16178.050257]  [&amp;lt;ffffffffa057e819&amp;gt;] ptlrpc_set_wait+0x289/0x790 [ptlrpc]
[16178.051188]  [&amp;lt;ffffffff810630a3&amp;gt;] ? kvm_clock_read+0x33/0x40
[16178.052007]  [&amp;lt;ffffffff810630b9&amp;gt;] ? kvm_clock_get_cycles+0x9/0x10
[16178.053405]  [&amp;lt;ffffffffa058abda&amp;gt;] ? lustre_msg_set_jobid+0x9a/0x110 [ptlrpc]
[16178.054486]  [&amp;lt;ffffffffa057ed9d&amp;gt;] ptlrpc_queue_wait+0x7d/0x220 [ptlrpc]
[16178.055636]  [&amp;lt;ffffffffa08a3ca7&amp;gt;] mdc_reint+0x57/0x160 [mdc]
[16178.056599]  [&amp;lt;ffffffffa08a4df6&amp;gt;] mdc_unlink+0x206/0x450 [mdc]
[16178.057700]  [&amp;lt;ffffffffa0500125&amp;gt;] lmv_unlink+0x5a5/0x950 [lmv]
[16178.059013]  [&amp;lt;ffffffffa146c764&amp;gt;] ? ll_i2gids+0x24/0xb0 [lustre]
[16178.060357]  [&amp;lt;ffffffffa1471b31&amp;gt;] ll_unlink+0x171/0x5e0 [lustre]
[16178.061621]  [&amp;lt;ffffffff8121d326&amp;gt;] vfs_unlink+0x106/0x190
[16178.062807]  [&amp;lt;ffffffff8121ff6e&amp;gt;] do_unlinkat+0x26e/0x2b0
[16178.063937]  [&amp;lt;ffffffff8178387b&amp;gt;] ? system_call_after_swapgs+0xc8/0x160
[16178.065792]  [&amp;lt;ffffffff8178386f&amp;gt;] ? system_call_after_swapgs+0xbc/0x160
[16178.068348]  [&amp;lt;ffffffff8178387b&amp;gt;] ? system_call_after_swapgs+0xc8/0x160
[16178.069707]  [&amp;lt;ffffffff8178386f&amp;gt;] ? system_call_after_swapgs+0xbc/0x160
[16178.070768]  [&amp;lt;ffffffff8178387b&amp;gt;] ? system_call_after_swapgs+0xc8/0x160
[16178.071896]  [&amp;lt;ffffffff8178386f&amp;gt;] ? system_call_after_swapgs+0xbc/0x160
[16178.075009]  [&amp;lt;ffffffff81220eab&amp;gt;] SyS_unlinkat+0x1b/0x40
[16178.076048]  [&amp;lt;ffffffff81783929&amp;gt;] system_call_fastpath+0x16/0x1b
[16178.077068]  [&amp;lt;ffffffff8178387b&amp;gt;] ? system_call_after_swapgs+0xc8/0x160
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It fails in multiple different tests too, so it is not tied to any particular test case.&lt;/p&gt;</comment>
                            <comment id="237443" author="gerrit" created="Mon, 26 Nov 2018 14:29:13 +0000"  >&lt;p&gt;Andriy Skulysh (c17819@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33718&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33718&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 994e02633ae373d454c20a9a6e5150d2d5312499&lt;/p&gt;</comment>
                            <comment id="240082" author="gerrit" created="Wed, 16 Jan 2019 07:06:38 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/33718/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33718/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: b1827ff1da829ae5f320a417217757221eedda5f&lt;/p&gt;</comment>
                            <comment id="240124" author="jgmitter" created="Wed, 16 Jan 2019 13:06:15 +0000"  >&lt;p&gt;Landed for 2.13.0&lt;/p&gt;</comment>
                            <comment id="242617" author="gerrit" created="Sat, 23 Feb 2019 17:57:39 +0000"  >&lt;p&gt;Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34290&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34290&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 54f405709b1bef208d02468601a067a005dbb8e0&lt;/p&gt;</comment>
                            <comment id="242679" author="gerrit" created="Mon, 25 Feb 2019 16:35:34 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/34293&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34293&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 9e4e28e244d0c2579ac5dc5f828be6dd77d708f6&lt;/p&gt;</comment>
                            <comment id="243209" author="gerrit" created="Sat, 2 Mar 2019 01:31:14 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34290/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34290/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: d5a51f0b718ecf6fca81e15c396e56141b62df6c&lt;/p&gt;</comment>
                            <comment id="245366" author="gerrit" created="Mon, 8 Apr 2019 06:27:30 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/34293/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/34293/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7558&quot; title=&quot;niobuf.c:721:ptl_send_rpc() LASSERT(AT_OFF || imp_state != LUSTRE_IMP_FULL || imp_msghdr_flags &amp;amp; MSGHDR_AT_SUPPORT ...)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7558&quot;&gt;&lt;del&gt;LU-7558&lt;/del&gt;&lt;/a&gt; ptlrpc: connect vs import invalidate race&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 69d1e9805172c3e8da59ad99f470831951253695&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="46607">LU-9628</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="26096">LU-5528</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxvxb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>