<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:33:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-17149] TBF: req_capsule_extend() ASSERTION( fmt-&gt;rf_fields[i].nr &gt;= old-&gt;rf_fields[i].nr )</title>
                <link>https://jira.whamcloud.com/browse/LU-17149</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We hit the following crash on Lustre 2.15.3 with the TBF NRS policy activated on the &quot;mdt&quot; service:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[892127.117400] LustreError: 8949:0:(layout.c:2467:req_capsule_extend()) ASSERTION( fmt-&amp;gt;rf_fields[i].nr &amp;gt;= old-&amp;gt;rf_fields[i].nr ) failed:
[892127.118895] LustreError: 8949:0:(layout.c:2467:req_capsule_extend()) LBUG
[892127.119727] Pid: 8949, comm: mdt03_008 4.18.0-477.13.1.el8_8.x86_64 #1 SMP Tue May 30 14:53:41 EDT 2023
[892127.120846] Call Trace TBD:
[892127.121216] [&amp;lt;0&amp;gt;] libcfs_call_trace+0x6f/0xa0 [libcfs]
[892127.121874] [&amp;lt;0&amp;gt;] lbug_with_loc+0x3f/0x70 [libcfs]
[892127.122485] [&amp;lt;0&amp;gt;] req_capsule_extend+0x174/0x1b0 [ptlrpc]
[892127.123422] [&amp;lt;0&amp;gt;] nrs_tbf_id_cli_set+0x1ee/0x2a0 [ptlrpc]
[892127.124165] [&amp;lt;0&amp;gt;] nrs_tbf_generic_cli_init+0x50/0x180 [ptlrpc]
[892127.124986] [&amp;lt;0&amp;gt;] nrs_tbf_res_get+0x1fe/0x430 [ptlrpc]
[892127.125670] [&amp;lt;0&amp;gt;] nrs_resource_get+0x6c/0xe0 [ptlrpc]
[892127.126382] [&amp;lt;0&amp;gt;] nrs_resource_get_safe+0x87/0xe0 [ptlrpc]
[892127.127126] [&amp;lt;0&amp;gt;] ptlrpc_nrs_req_initialize+0x58/0xb0 [ptlrpc]
[892127.127919] [&amp;lt;0&amp;gt;] ptlrpc_server_request_add+0x248/0xa20 [ptlrpc]
[892127.128771] [&amp;lt;0&amp;gt;] ptlrpc_server_handle_req_in+0x36a/0x8c0 [ptlrpc]
[892127.129607] [&amp;lt;0&amp;gt;] ptlrpc_main+0xb97/0x1530 [ptlrpc]
[892127.130284] [&amp;lt;0&amp;gt;] kthread+0x134/0x150
[892127.130826] [&amp;lt;0&amp;gt;] ret_from_fork+0x1f/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;ldlm_tbf_id_cli_set() tries to extend a request pill that has already been extended:&lt;br/&gt;
We have pill-&amp;gt;rc_fmt == RQF_LDLM_INTENT_GETATTR&lt;br/&gt;
And we try to do: req_capsule_extend(&amp;amp;req-&amp;gt;rq_pill, &amp;amp;RQF_LDLM_INTENT_BASIC);&lt;/p&gt;

&lt;p&gt;RQF_LDLM_INTENT_GETATTR has 7 fields:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; req_msg_field *ldlm_intent_getattr_client[] = {           
        &amp;amp;RMF_PTLRPC_BODY,                                                     
        &amp;amp;RMF_DLM_REQ,                                                         
        &amp;amp;RMF_LDLM_INTENT,                                                     
        &amp;amp;RMF_MDT_BODY,     &lt;span class=&quot;code-comment&quot;&gt;/* coincides with mds_getattr_name_client[] */&lt;/span&gt;     
        &amp;amp;RMF_CAPA1,                                                           
        &amp;amp;RMF_NAME,                                                            
        &amp;amp;RMF_FILE_SECCTX_NAME                                                 
};                                                              
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;RQF_LDLM_INTENT_BASIC has only 3 fields:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; req_msg_field *ldlm_intent_basic_client[] = { 
        &amp;amp;RMF_PTLRPC_BODY,                                         
        &amp;amp;RMF_DLM_REQ,                                             
        &amp;amp;RMF_LDLM_INTENT,                                         
};                                                                
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This has been possible since the patch: &lt;a href=&quot;https://review.whamcloud.com/45272&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45272&lt;/a&gt; (&quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15118&quot; title=&quot;There isn&amp;#39;t any free thread to process resend request&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15118&quot;&gt;&lt;del&gt;LU-15118&lt;/del&gt;&lt;/a&gt; ldlm: no free thread to process resend request&quot;)&lt;/p&gt;

&lt;p&gt;ldlm_enqueue_hpreq_check() is called before nrs_resource_get_safe(). For an LDLM_ENQUEUE request with the MSG_RESENT flag, it initializes the pill with RQF_LDLM_ENQUEUE in order to read RMF_DLM_REQ:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;&lt;/span&gt; ldlm_enqueue_hpreq_check(&lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; ptlrpc_request *req)                          
{                                                                                        
....                                                                                                                                                  
        if ((lustre_msg_get_flags(req-&amp;gt;rq_reqmsg) &amp;amp; (MSG_REPLAY|MSG_RESENT)) !=    
            MSG_RESENT)                                                            
                RETURN(0);                                                         
                                                                                   
        req_capsule_init(&amp;amp;req-&amp;gt;rq_pill, req, RCL_SERVER);                          
        req_capsule_set(&amp;amp;req-&amp;gt;rq_pill, &amp;amp;RQF_LDLM_ENQUEUE);                         
                                                        
....
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Then nrs_tbf_id_cli_set() is called twice in nrs_tbf_res_get():&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;o_cli_find(): nrs_tbf_id_cli_find()&lt;/li&gt;
	&lt;li&gt;o_cli_init(): nrs_tbf_id_cli_init()&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;After nrs_tbf_id_cli_find(): rc_fmt == RQF_LDLM_INTENT_GETATTR&lt;br/&gt;
So nrs_tbf_id_cli_init() -&amp;gt; nrs_tbf_id_cli_set() -&amp;gt; ldlm_tbf_id_cli_set() -&amp;gt; req_capsule_extend() will crash.&lt;/p&gt;

&lt;p&gt;This crash does not occur if rc_fmt was initially NULL because nrs_tbf_id_cli_set() restores the NULL pointer before returning:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
 &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-keyword&quot;&gt;&lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt;&lt;/span&gt; nrs_tbf_id_cli_set(&lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; ptlrpc_request *req ...
....
        req_capsule_init(&amp;amp;req-&amp;gt;rq_pill, req, RCL_SERVER); 
        if (req-&amp;gt;rq_pill.rc_fmt == &lt;span class=&quot;code-keyword&quot;&gt;NULL&lt;/span&gt;) {                
                req_capsule_set(&amp;amp;req-&amp;gt;rq_pill, fmt);      
                fmt_unset = &lt;span class=&quot;code-keyword&quot;&gt;true&lt;/span&gt;;                         
        }                                                 
....
       &lt;span class=&quot;code-comment&quot;&gt;/* restore it to the initialized state */&lt;/span&gt;        
       if (fmt_unset)                                   
               req-&amp;gt;rq_pill.rc_fmt = &lt;span class=&quot;code-keyword&quot;&gt;NULL&lt;/span&gt;;              
       &lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; rc;                                       
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
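<!--
A minimal user-space sketch of the sequence described above, not the Lustre code itself. Each format is reduced to its client field count: 3 and 7 come from the arrays quoted earlier, while the count of 2 for RQF_LDLM_ENQUEUE, the intermediate extend to RQF_LDLM_INTENT_BASIC in the first pass, and the names model_extend() and model_cli_set() are assumptions made for illustration. model_extend() returns -1 at the point where the real req_capsule_extend() would hit the LBUG.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: a format is reduced to its client field count. */
struct fmt {
	const char *name;
	int nr;
};

/* nr = 2 for ENQUEUE is an assumption; 3 and 7 match the quoted arrays. */
static const struct fmt FMT_ENQUEUE = { "RQF_LDLM_ENQUEUE", 2 };
static const struct fmt FMT_INTENT_BASIC = { "RQF_LDLM_INTENT_BASIC", 3 };
static const struct fmt FMT_INTENT_GETATTR = { "RQF_LDLM_INTENT_GETATTR", 7 };

struct pill {
	const struct fmt *rc_fmt;
};

/* Returns -1 where the real req_capsule_extend() would trip its LASSERT. */
static int model_extend(struct pill *pill, const struct fmt *fmt)
{
	assert(pill != NULL && pill->rc_fmt != NULL && fmt != NULL);
	if (fmt->nr < pill->rc_fmt->nr)
		return -1;
	pill->rc_fmt = fmt;
	return 0;
}

/* Hypothetical model of one nrs_tbf_id_cli_set() pass for an intent getattr. */
static int model_cli_set(struct pill *pill)
{
	int fmt_unset = 0;
	int rc;

	if (pill->rc_fmt == NULL) {
		pill->rc_fmt = &FMT_ENQUEUE;
		fmt_unset = 1;
	}

	rc = model_extend(pill, &FMT_INTENT_BASIC);
	if (rc == 0)
		rc = model_extend(pill, &FMT_INTENT_GETATTR);

	/* Only a format that this function set itself is undone. */
	if (fmt_unset)
		pill->rc_fmt = NULL;
	return rc;
}
```

Starting from a NULL rc_fmt, model_cli_set() can be called any number of times. Starting from a pill already set to the ENQUEUE format, the first call leaves rc_fmt at the 7-field format and the second call fails on the 3-field extend.
-->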

&lt;p&gt;&lt;b&gt;Reproducer&lt;/b&gt;&lt;br/&gt;
I was not able to reproduce the issue in a test environment, but it appeared when the server was heavily loaded. It occurs only for resent requests, not replays.&lt;/p&gt;

&lt;p&gt;The impacted versions are:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;&lt;b&gt;2.15.3&lt;/b&gt;&lt;/li&gt;
	&lt;li&gt;&lt;b&gt;master&lt;/b&gt; with clients running 2.15.3 or 2.12 (without the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16077&quot; title=&quot;Cannot use tbf to filter brw request per effective uid/gid, inode attr ids is used instead&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16077&quot;&gt;&lt;del&gt;LU-16077&lt;/del&gt;&lt;/a&gt; patch)&lt;/li&gt;
&lt;/ul&gt;
</description>
                <environment>&amp;quot;tbf&amp;quot; activated on &amp;quot;mdt&amp;quot; service</environment>
        <key id="78143">LU-17149</key>
            <summary>TBF: req_capsule_extend() ASSERTION( fmt-&gt;rf_fields[i].nr &gt;= old-&gt;rf_fields[i].nr )</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="eaujames">Etienne Aujames</assignee>
                                    <reporter username="eaujames">Etienne Aujames</reporter>
                        <labels>
                            <label>tbf</label>
                    </labels>
                <created>Wed, 27 Sep 2023 11:09:01 +0000</created>
                <updated>Fri, 3 Nov 2023 13:00:13 +0000</updated>
                            <resolved>Fri, 3 Nov 2023 13:00:13 +0000</resolved>
                                    <version>Lustre 2.15.3</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="387380" author="gerrit" created="Wed, 27 Sep 2023 12:11:27 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/52528&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/52528&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17149&quot; title=&quot;TBF: req_capsule_extend() ASSERTION( fmt-&amp;gt;rf_fields[i].nr &amp;gt;= old-&amp;gt;rf_fields[i].nr )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17149&quot;&gt;&lt;del&gt;LU-17149&lt;/del&gt;&lt;/a&gt; tbf: nrs_tbf_id_cli_set should not modify the fmt&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8f8512d8a3593d463a8216f7882d436305fb4bf3&lt;/p&gt;</comment>
                            <comment id="387732" author="delbaryg" created="Fri, 29 Sep 2023 11:39:42 +0000"  >&lt;p&gt;Crash analysis:&lt;br/&gt;
Interesting entries in the backtrace:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; bt -FF | grep -C2 RQF
    ffffa7f76706fa68: __crash_kexec+0x6a
 #1 [ffffa7f76706fa68] __crash_kexec at ffffffff939b564a
    ffffa7f76706fa70: 0000000000000000 RQF_LDLM_ENQUEUE
    ffffa7f76706fa80: [ffff8efcd2a43550:ptlrpc_cache] __func__.59802+0xa52
    ffffa7f76706fa90: ffffa7f76706fba0 [ffff8ef468350000:task_struct]
--
    ffffa7f76706fb70: 0000000000000000 c0000000ffff7fff
    ffffa7f76706fb80: [ffff8ef468350000:task_struct] 0000000000000065
--&amp;gt;&amp;gt;    ffffa7f76706fb90: [ffff8efcd2a43550:ptlrpc_cache] RQF_LDLM_ENQUEUE
    ffffa7f76706fba0: [ffff8efcd2a43180:ptlrpc_cache] libcfs_debug_dumplog_thread.cold.9
    ffffa7f76706fbb0: ffffa7f76706fc10 req_capsule_extend+0x174
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;As Etienne said, RQF_LDLM_ENQUEUE is set.&lt;/p&gt;

&lt;p&gt;Check rc_fmt:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; kmem ffff8efcd2a43550
CACHE             OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE  NAME
ffff8ef5223de840     1120       4777      6888    246    32k  ptlrpc_cache
  SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
  ffffd067a64a9000  ffff8efcd2a40000     0     28         28     0
  FREE / [ALLOCATED]
  [ffff8efcd2a43180]

      PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
ffffd067a64a90c0 992a43000 dead000000000400        0  0 17ffffc0000000

crash&amp;gt; struct ptlrpc_request ffff8efcd2a43180 | grep -A2 rq_peer
  rq_peer = {
    nid = 0xourclientnid,
    pid = 0x3039

crash&amp;gt; p ((struct ptlrpc_request*)0xffff8efcd2a43180)-&amp;gt;rq_pill
$11 = {
  rc_req = 0xffff8efcd2a43180,
--&amp;gt;&amp;gt;  rc_reqmsg = 0xffff8ef58015b8c8,
  rc_repmsg = 0x0,
  rc_req_swab_mask = 0x0,
  rc_rep_swab_mask = 0x0,
--&amp;gt;&amp;gt;  rc_fmt = 0xffffffffc12ab8e0 &amp;lt;RQF_LDLM_INTENT_GETATTR&amp;gt;,
  rc_loc = RCL_SERVER,
  rc_area = {{0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff}, {0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff}}
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Oops, rc_fmt is set to RQF_LDLM_INTENT_GETATTR.&lt;/p&gt;

&lt;p&gt;Go back to ptlrpc_body_v3 to check the MSG_RESENT flag too (thanks to Etienne for the crash assistance):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; x/500s 0xffff8ef58015b8c8
...
0xffff8ef58015b998:     &quot;robinhood.0&quot;
...
crash&amp;gt; struct ptlrpc_body_v3 -l ptlrpc_body_v3.pb_jobid 0xffff8ef58015b998 | grep flags
--&amp;gt;&amp;gt;  pb_flags = 0x2,
  pb_op_flags = 0x0,
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
&lt;span class=&quot;code-macro&quot;&gt;#define MSG_RESENT               0x0002 &lt;span class=&quot;code-comment&quot;&gt;/* was previously sent, no reply seen */&lt;/span&gt;&lt;/span&gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
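<!--
A minimal sketch of how the pb_flags value found in the dump relates to the check in ldlm_enqueue_hpreq_check() quoted in the issue description. MSG_RESENT is taken from the define above. The value of MSG_REPLAY and the helper name is_resent_not_replay() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSG_RESENT 0x0002 /* was previously sent, no reply seen */
#define MSG_REPLAY 0x0004 /* assumed value, not shown in this ticket */

/* The two flags are tested together, so they must be distinct bits. */
static_assert((MSG_RESENT & MSG_REPLAY) == 0, "flags must not overlap");

/*
 * True when the early pill initialization would be reached:
 * MSG_RESENT is set and MSG_REPLAY is clear.
 */
static bool is_resent_not_replay(uint32_t pb_flags)
{
	return (pb_flags & (MSG_REPLAY | MSG_RESENT)) == MSG_RESENT;
}
```

With pb_flags = 0x2 the helper returns true, which corresponds to a resent request that is not a replay.
-->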
&lt;p&gt;As Etienne said, this behavior is due to patch: &lt;a href=&quot;https://review.whamcloud.com/45272&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45272&lt;/a&gt; (&quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15118&quot; title=&quot;There isn&amp;#39;t any free thread to process resend request&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15118&quot;&gt;&lt;del&gt;LU-15118&lt;/del&gt;&lt;/a&gt; ldlm: no free thread to process resend request&quot;)&lt;/p&gt;</comment>
                            <comment id="387752" author="eaujames" created="Fri, 29 Sep 2023 13:15:37 +0000"  >&lt;p&gt;I have been able to reproduce the issue with 2.15.3 LTS (without any patch) in a test environment (VMs):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 23971:0:(layout.c:1898:req_capsule_set()) ASSERTION( pill-&amp;gt;rc_fmt == ((void *)0) || pill-&amp;gt;rc_fmt == fmt ) failed: 
LustreError: 23971:0:(layout.c:1898:req_capsule_set()) LBUG                                                                    
Pid: 23971, comm: mdt00_001 3.10.0-1160.59.1.el7.centos.plus.x86_64 #1 SMP Wed Feb 23 17:40:21 UTC 2022                        
Call Trace:
 [&amp;lt;ffffffffbab975b9&amp;gt;] dump_stack+0x19/0x1b
 [&amp;lt;ffffffffbab912c1&amp;gt;] panic+0xe8/0x21f
 [&amp;lt;ffffffffc0d4c4db&amp;gt;] lbug_with_loc+0x9b/0xa0 [libcfs]
 [&amp;lt;ffffffffc1423abd&amp;gt;] req_capsule_set+0x9d/0xa0 [ptlrpc]
 [&amp;lt;ffffffffc14610ba&amp;gt;] tgt_request_preprocess.isra.26+0xca/0x850 [ptlrpc]
 [&amp;lt;ffffffffc146243e&amp;gt;] tgt_request_handle+0x90e/0x19c0 [ptlrpc]
 [&amp;lt;ffffffffc0e307a6&amp;gt;] ? libcfs_nid2str_r+0x106/0x130 [lnet]
 [&amp;lt;ffffffffc140c853&amp;gt;] ptlrpc_server_handle_request+0x253/0xc30 [ptlrpc]
 [&amp;lt;ffffffffc140e4c4&amp;gt;] ptlrpc_main+0xbf4/0x15e0 [ptlrpc]
 [&amp;lt;ffffffffba4d4abe&amp;gt;] ? finish_task_switch+0x4e/0x1c0
 [&amp;lt;ffffffffc140d8d0&amp;gt;] ? ptlrpc_wait_event+0x5c0/0x5c0 [ptlrpc]
 [&amp;lt;ffffffffba4c5e61&amp;gt;] kthread+0xd1/0xe0
 [&amp;lt;ffffffffba4c5d90&amp;gt;] ? insert_kthread_work+0x40/0x40
 [&amp;lt;ffffffffbabaadf7&amp;gt;] ret_from_fork_nospec_begin+0x21/0x21
 [&amp;lt;ffffffffba4c5d90&amp;gt;] ? insert_kthread_work+0x40/0x40
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This time the entry already exists in the TBF hashmap (nrs_tbf_id_cli_set() is executed only once), so it crashes at the next req_capsule_set().&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; ptlrpc_request.rq_pill ffff914bf953ec00
  rq_pill = {
    rc_req = 0xffff914bf953ec00, 
    rc_reqmsg = 0xffff914bf15ad3d0, 
    rc_repmsg = 0x0, 
    rc_req_swab_mask = 0, 
    rc_rep_swab_mask = 0, 
    rc_fmt = 0xffffffffc15087a0 &amp;lt;RQF_LDLM_INTENT_OPEN&amp;gt;, 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;rc_fmt should be set to RQF_LDLM_ENQUEUE or NULL.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;crash&amp;gt; ptlrpc_body_v3 0xffff914bf15ad420
struct ptlrpc_body_v3 {
  pb_handle = {
    cookie = 15792526650769917007
  }, 
  pb_type = 4711,                                     ------&amp;gt; PTL_RPC_MSG_REQUEST
  pb_version = 262147, 
  pb_opc = 101,                                        ------&amp;gt; LDLM_ENQUEUE
  pb_status = 24424, 
  pb_last_xid = 1778373737838399, 
  pb_tag = 4, 
  pb_padding0 = 0, 
  pb_padding1 = 0, 
  pb_last_committed = 0, 
  pb_transno = 0, 
  pb_flags = 2,                                          ------&amp;gt; MSG_RESENT
  pb_op_flags = 0, 
  pb_conn_cnt = 19, 
  pb_timeout = 6, 
  pb_service_time = 0, 
  pb_limit = 0, 
  pb_slv = 0, 
  pb_pre_versions = {0, 0, 0, 0}, 
  pb_mbits = 1778373739752960, 
  pb_padding64_0 = 0, 
  pb_padding64_1 = 0, 
  pb_padding64_2 = 0, 
  pb_jobid = &quot;\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&quot;pb_flags = 2&quot; (MSG_RESENT) means that req_capsule_set() has been called inside ldlm_enqueue_hpreq_check().&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Reproducer&lt;/b&gt;&lt;br/&gt;
&quot;tbf uid&quot; enabled on the mdt service.&lt;br/&gt;
Two clients mounted on the VM:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client client]# while true; do printf &quot;%s\n&quot; toto{1..100}/toto{1..100} | xargs -P100 -I{} flock -x {}  touch {}; done
[root@client client2]# while true; do printf &quot;%s\n&quot; toto{1..100}/toto{1..100} | xargs -P100 -I{} flock -x {}  touch {}; done
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then force the client to recover:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@client ~]# lctl dl                                                                                    
  0 UP mgc MGC10.0.2.4@tcp f5da7520-e1a5-4ff4-a419-9408020da144 4                                           
  1 UP lov lustrefs-clilov-ffff9183fae23800 8898dba4-553b-4c57-80f4-4751185cf9a8 3                          
  2 UP lmv lustrefs-clilmv-ffff9183fae23800 8898dba4-553b-4c57-80f4-4751185cf9a8 4                          
  3 UP mdc lustrefs-MDT0000-mdc-ffff9183fae23800 8898dba4-553b-4c57-80f4-4751185cf9a8 4                     
....
[root@client ~]# lctl --device 3 recover
[root@client ~]# lctl --device 3 recover
[root@client ~]# lctl --device 3 recover
[root@client ~]# lctl --device 3 recover
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="391595" author="gerrit" created="Fri, 3 Nov 2023 04:05:42 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/52528/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/52528/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17149&quot; title=&quot;TBF: req_capsule_extend() ASSERTION( fmt-&amp;gt;rf_fields[i].nr &amp;gt;= old-&amp;gt;rf_fields[i].nr )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17149&quot;&gt;&lt;del&gt;LU-17149&lt;/del&gt;&lt;/a&gt; tbf: nrs_tbf_id_cli_set should not modify the fmt&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 855f3d03c21752c8d7136a8a9e48223ee3302512&lt;/p&gt;</comment>
                            <comment id="391634" author="gerrit" created="Fri, 3 Nov 2023 12:45:18 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/52974&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/52974&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-17149&quot; title=&quot;TBF: req_capsule_extend() ASSERTION( fmt-&amp;gt;rf_fields[i].nr &amp;gt;= old-&amp;gt;rf_fields[i].nr )&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-17149&quot;&gt;&lt;del&gt;LU-17149&lt;/del&gt;&lt;/a&gt; tbf: nrs_tbf_id_cli_set should not modify the fmt&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 87c09c807f72c62af6ed97f2b1ea83469d1bebbf&lt;/p&gt;</comment>
                            <comment id="391638" author="pjones" created="Fri, 3 Nov 2023 13:00:13 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="71649">LU-16077</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="66698">LU-15118</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>QoS-TBF</label>
            <label>mds</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03wvz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>