<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:08:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
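An illustrative full request, assuming the standard JIRA issue-xml view path (the path is not taken from this feed):
https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-14302/LU-14302.xml?field=key&field=summary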
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14302] obdfilter-survey test 2b hangs in case=netdisk</title>
                <link>https://jira.whamcloud.com/browse/LU-14302</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;obdfilter-survey test_2b started hanging on 02 OCT 2020, with logs at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/258c8761-9d71-4d82-8a5e-1c473e5a089c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/258c8761-9d71-4d82-8a5e-1c473e5a089c&lt;/a&gt;. So far, it looks like it only hangs for ZFS.&lt;/p&gt;

&lt;p&gt;Looking at the suite_log for the failure at &lt;a href=&quot;https://testing.whamcloud.com/test_sessions/296ca0ce-3daa-42aa-a92d-cdbdb7af75da&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sessions/296ca0ce-3daa-42aa-a92d-cdbdb7af75da&lt;/a&gt;, the last test run is for netdisk:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Unable to detect ip address for host: &apos;&apos;
+ NETTYPE=tcp thrlo=4 nobjhi=1 thrhi=4 size=1024 case=netdisk rslt_loc=/tmp targets=&quot;10.9.7.117:lustre-OST0000 10.9.7.117:lustre-OST0001 10.9.7.117:lustre-OST0002 10.9.7.117:lustre-OST0003 10.9.7.117:lustre-OST0004 10.9.7.117:lustre-OST0005 10.9.7.117:lustre-OST0006&quot; /usr/bin/obdfilter-survey
Fri Dec 25 12:00:08 UTC 2020 Obdfilter-survey for case=netdisk from trevis-201vm1.trevis.whamcloud.com
ost  7 sz  7340032K rsz 1024K obj    7 thr   28 write    5.28 [   0.00,    8.15] rewrite    4.88 [   0.00,    4.00] read 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We see some errors in the console logs. For client 1 (vm1), we see an error message from osc_init_grant():&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; [22622.025189] Lustre: DEBUG MARKER: == obdfilter-survey test 2b: Stripe F/S over the Network, async journal ============================== 12:00:01 (1608897601)
[25456.252361] LustreError: 200837:0:(osc_request.c:1043:osc_init_grant()) lustre-OST0003_osc: granted 368640 but already consumed 1048576
[26276.964898] sysrq: SysRq : Changing Loglevel
[26276.965724] sysrq: Loglevel set to 8
[26276.967609] sysrq: SysRq : Show State
[26276.968269]   task                        PC stack   pid father
[26276.969210] systemd         S    0     1      0 0x00000000
[26276.970088] Call Trace:
[26276.970548]  __schedule+0x2a6/0x700
[26276.971139]  schedule+0x38/0xa0
[26276.971677]  schedule_hrtimeout_range_clock+0xf7/0x110
[26276.972514]  ? ep_scan_ready_list.constprop.22+0x20c/0x230
[26276.973406]  ep_poll+0x3c8/0x410
[26276.973953]  ? wake_up_q+0x80/0x80
[26276.974534]  do_epoll_wait+0xb0/0xd0
[26276.975131]  __x64_sys_epoll_wait+0x1a/0x20
[26276.975821]  do_syscall_64+0x5b/0x1a0
[26276.976447]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[26276.977339] RIP: 0033:0x7f023c889257
[26276.977937] Code: Bad RIP value.
[26276.978495] RSP: 002b:00007ffc57bbe450 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[26276.979678] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f023c889257
[26276.980796] RDX: 0000000000000056 RSI: 00007ffc57bbe490 RDI: 0000000000000004
[26276.981913] RBP: 00007ffc57bbe490 R08: 0000000000000000 R09: 7465677261742e79
[26276.983033] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000056
[26276.984152] R13: 00000000ffffffff R14: 00007ffc57bbe490 R15: 0000000000000001
[26276.985270] kthreadd        S    0     2      0 0x80004000
[26276.986143] Call Trace:
[26276.986580]  __schedule+0x2a6/0x700
[26276.987166]  schedule+0x38/0xa0
[26276.987703]  kthreadd+0x2e8/0x300
[26276.988275]  ? kthread_create_on_cpu+0xa0/0xa0
[26276.989001]  ret_from_fork+0x35/0x40
[26276.989601] rcu_gp          I    0     3      2 0x80004000
[26276.990486] Call Trace:
[26276.990916]  __schedule+0x2a6/0x700
[26276.991509]  schedule+0x38/0xa0
[26276.992048]  rescuer_thread+0x29e/0x340
[26276.992686]  ? process_one_work+0x360/0x360
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
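
&lt;p&gt;For scale, the osc_init_grant() numbers decode as follows; reading the consumed value as one survey RPC is an observation from these logs, not a confirmed root cause:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;granted  =  368640 bytes =  360 KiB  (grant returned by the server at connect)
consumed = 1048576 bytes = 1024 KiB  (exactly one rsz=1024K RPC)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;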

&lt;p&gt;In the OSS (vm3) console log, we see some grant errors:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[22642.753377] Lustre: DEBUG MARKER: /usr/sbin/lctl set_param obdfilter.lustre-*.sync_journal=0
[22643.253588] LustreError: 366155:0:(tgt_grant.c:919:tgt_grant_alloc()) lustre-OST0000: client lustre-OST0000_osc_UUID/00000000089e081d requesting &amp;gt; max (2147483647), 34359738368
[22643.256053] LustreError: 366155:0:(tgt_grant.c:919:tgt_grant_alloc()) Skipped 6 previous similar messages
[26311.031755] sysrq: SysRq : Changing Loglevel
[26311.032550] sysrq: Loglevel set to 8
[26311.034388] sysrq: SysRq : Show State
[26311.034963]   task                        PC stack   pid father
[26311.035888] systemd         S    0     1      0 0x00000000
[26311.036726] Call Trace:
[26311.037144]  __schedule+0x2a6/0x700
[26311.037701]  schedule+0x38/0xa0
[26311.038200]  schedule_hrtimeout_range_clock+0xf7/0x110
[26311.038999]  ? ep_scan_ready_list.constprop.22+0x20c/0x230
[26311.039846]  ep_poll+0x3c8/0x410
[26311.040381]  ? wake_up_q+0x80/0x80
[26311.040903]  do_epoll_wait+0xb0/0xd0
[26311.041468]  __x64_sys_epoll_wait+0x1a/0x20
[26311.042113]  do_syscall_64+0x5b/0x1a0
[26311.042695]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[26311.043491] RIP: 0033:0x7ffbb0799257
[26311.044053] Code: Bad RIP value.
[26311.044562] RSP: 002b:00007fffda2f4c30 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[26311.045720] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007ffbb0799257
[26311.046806] RDX: 0000000000000057 RSI: 00007fffda2f4c70 RDI: 0000000000000004
[26311.047893] RBP: 00007fffda2f4c70 R08: 0000000000000000 R09: 7465677261742e79
[26311.048978] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000057
[26311.050066] R13: 00000000ffffffff R14: 00007fffda2f4c70 R15: 0000000000000001
[26311.051152] kthreadd        S    0     2      0 0x80004000
[26311.052001] Call Trace:
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
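
&lt;p&gt;The tgt_grant_alloc() values line up with a 32-bit cap; the arithmetic below comes from the message itself, and reading the max as INT_MAX is an assumption:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;requested = 34359738368 = 2^35 bytes = 32 GiB
max       =  2147483647 = 2^31 - 1   (INT_MAX)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;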

&lt;p&gt;We&#8217;ve seen this hang in the past at&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/c037a9f9-459a-41c2-9eea-ea54cb91eb3b&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/c037a9f9-459a-41c2-9eea-ea54cb91eb3b&lt;/a&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="62250">LU-14302</key>
            <summary>obdfilter-survey test 2b hangs in case=netdisk</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Wed, 6 Jan 2021 21:08:33 +0000</created>
                <updated>Thu, 21 Dec 2023 00:46:36 +0000</updated>
                                            <version>Lustre 2.14.0</version>
                    <version>Lustre 2.12.7</version>
                    <version>Lustre 2.15.0</version>
                    <version>Lustre 2.15.3</version>
                    <version>Lustre 2.15.4</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="289970" author="jamesanunez" created="Wed, 20 Jan 2021 19:55:55 +0000"  >&lt;p&gt;sanity-benchmark test_fsx has a similar issue with the following call trace on each node&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[17061.275242] sysrq: SysRq : Changing Loglevel
[17061.276192] sysrq: Loglevel set to 8
[17061.278000] sysrq: SysRq : Show State
[17061.278766]   task                        PC stack   pid father
[17061.279880] systemd         S    0     1      0 0x00000000
[17061.280938] Call Trace:
[17061.281501]  __schedule+0x2a6/0x700
[17061.282203]  schedule+0x38/0xa0
[17061.282834]  schedule_hrtimeout_range_clock+0xf7/0x110
[17061.283834]  ? ep_scan_ready_list.constprop.22+0x20c/0x230
[17061.284870]  ep_poll+0x3c8/0x410
[17061.285565]  ? wake_up_q+0x80/0x80
[17061.286245]  do_epoll_wait+0xb0/0xd0
[17061.286954]  __x64_sys_epoll_wait+0x1a/0x20
[17061.287781]  do_syscall_64+0x5b/0x1a0
[17061.288537]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[17061.289513] RIP: 0033:0x7f1833964257
[17061.290221] Code: Bad RIP value.
[17061.290865] RSP: 002b:00007fffdf992860 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[17061.292251] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f1833964257
[17061.293825] RDX: 0000000000000055 RSI: 00007fffdf9928a0 RDI: 0000000000000004
[17061.295585] RBP: 00007fffdf9928a0 R08: 0000000000000000 R09: 7465677261742e79
[17061.297085] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000055
[17061.298425] R13: 00000000ffffffff R14: 00007fffdf9928a0 R15: 0000000000000001
[17061.299748] kthreadd        S    0     2      0 0x80004000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/test_sets/55bc0e09-caa5-4d08-a4b5-7b53173af78a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/55bc0e09-caa5-4d08-a4b5-7b53173af78a&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="292022" author="jamesanunez" created="Mon, 15 Feb 2021 23:34:23 +0000"  >&lt;p&gt;Here are a small fraction of test hangs that are attributed to this ticket. I think the true stack trace/issue for each hang below is being masked by the entry_syscall_64 call trace.&lt;br/&gt;
sanity-pcc test 1a - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/1fcf362d-9c64-4b02-9e41-6a213615adb2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/1fcf362d-9c64-4b02-9e41-6a213615adb2&lt;/a&gt;&lt;br/&gt;
sanity test 413b - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/25bd46f6-32a6-4508-ace2-b83c79b50249&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/25bd46f6-32a6-4508-ace2-b83c79b50249&lt;/a&gt;&lt;br/&gt;
sanity-benchmark test fsx - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/fbf24af3-578f-44c0-9992-95f517bc5028&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/fbf24af3-578f-44c0-9992-95f517bc5028&lt;/a&gt;&lt;br/&gt;
sanity-quota test 61 - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/17cb0fe2-25d9-4ab2-8d52-2e98192bf50a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/17cb0fe2-25d9-4ab2-8d52-2e98192bf50a&lt;/a&gt;&lt;br/&gt;
parallel-scale-nfsv3 test racer_on_nfs - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/163068e8-97c2-4470-b009-856fb6f8b0c5&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/163068e8-97c2-4470-b009-856fb6f8b0c5&lt;/a&gt;&lt;br/&gt;
obdfilter-survey test 2b - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/f892168b-a967-4d6c-8c94-c1475c82420b&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/f892168b-a967-4d6c-8c94-c1475c82420b&lt;/a&gt;&lt;br/&gt;
sanity test 27F - &lt;a href=&quot;https://testing.whamcloud.com/test_sets/65c9dadf-a86b-4919-8af2-f7b6c195c4e1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/65c9dadf-a86b-4919-8af2-f7b6c195c4e1&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="306734" author="jamesanunez" created="Fri, 9 Jul 2021 17:32:27 +0000"  >&lt;p&gt;Another test hang for sanity-sec test 25 for 2.12.7 RC1 at &lt;a href=&quot;https://testing.whamcloud.com/test_sets/3abb9645-7bc7-4630-bc41-557da7b04375&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/3abb9645-7bc7-4630-bc41-557da7b04375&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On clients 1 and 2, we see the mount point is &apos;still busy&apos;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[17855.295093] Lustre: DEBUG MARKER: == sanity-sec test 25: test save and reload nodemap config =========================================== 16:59:52 (1625072392)
[17855.928251] Lustre: DEBUG MARKER: running=$(grep -c /mnt/lustre&apos; &apos; /proc/mounts);
[17855.928251] if [ $running -ne 0 ] ; then
[17855.928251] echo Stopping client $(hostname) /mnt/lustre opts:;
[17855.928251] lsof /mnt/lustre || need_kill=no;
[17855.928251] if [ x != x -a x$need_kill != xno ]; then
[17855.928251]     pids=$(lsof -t /mnt/lustre | sort -u);
[17855.928251]     if 
[21488.693672] sysrq: SysRq : Changing Loglevel
[21488.694779] sysrq: Loglevel set to 8
[21488.696709] sysrq: SysRq : Show State
[21488.697557]   task                        PC stack   pid father
[21488.698776] systemd         S    0     1      0 0x00000000
[21488.699937] Call Trace:
[21488.700608]  __schedule+0x292/0x880
[21488.700631]  schedule+0x32/0x80
[21488.702137]  schedule_hrtimeout_range_clock+0x106/0x110
[21488.702169]  ? ep_scan_ready_list.isra.12+0x1af/0x1d0
[21488.704339]  ep_poll+0x2b6/0x360
[21488.704368]  ? wake_up_q+0x70/0x70
[21488.704371]  SyS_epoll_wait+0xb2/0xe0
[21488.706704]  do_syscall_64+0x74/0x160
[21488.706722]  entry_SYSCALL_64_after_hwframe+0x59/0xbe
[21488.708632] RIP: 0033:0x7f5917ae78a3
[21488.708633] RSP: 002b:00007ffebbb08010 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[21488.708634] RAX: ffffffffffffffda RBX: 00005607a2f63de0 RCX: 00007f5917ae78a3
[21488.708635] RDX: 0000000000000031 RSI: 00007ffebbb08020 RDI: 0000000000000004
[21488.708636] RBP: 00007ffebbb08370 R08: 5566e87192e6acdb R09: 00005607a2d099c0
[21488.708636] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000001
[21488.708638] R13: 00007ffebbb08020 R14: ffffffffffffffff R15: 0000000000000000
[21488.718338] kthreadd        S    0     2      0 0x00000000
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="327036" author="sarah" created="Tue, 22 Feb 2022 22:59:16 +0000"  >&lt;p&gt;Hit similar error on master zfs, obdfilter-survey and replay-dual hung with following trace&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/42fe79b4-b112-43e3-b813-ac9ffbe552c7&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/42fe79b4-b112-43e3-b813-ac9ffbe552c7&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/9d7643af-3d54-4dbc-b445-c2bdce57d12d&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/9d7643af-3d54-4dbc-b445-c2bdce57d12d&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[17639.357284] Lustre: DEBUG MARKER: == obdfilter-survey test 2a: Stripe F/S over the Network ========================================================== 03:48:52 (1644378532)
[24905.814376] sysrq: SysRq : Changing Loglevel
[24905.815333] sysrq: Loglevel set to 8
[24905.819043] sysrq: SysRq : Show State
[24905.819814] task:systemd         state:S stack:    0 pid:    1 ppid:     0 flags:0x00000000
[24905.821371] Call Trace:
[24905.821936]  __schedule+0x2c4/0x700
[24905.822647]  schedule+0x37/0xa0
[24905.823246]  schedule_hrtimeout_range_clock+0xf7/0x110
[24905.824250]  ? ep_scan_ready_list.constprop.23+0x20c/0x230
[24905.825291]  ep_poll+0x403/0x460
[24905.825968]  ? wake_up_q+0x80/0x80
[24905.826636]  do_epoll_wait+0xb0/0xd0
[24905.827316]  __x64_sys_epoll_wait+0x1a/0x20
[24905.828125]  do_syscall_64+0x5b/0x1a0
[24905.828857]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[24905.829846] RIP: 0033:0x7f0e2cb5a0f7
[24905.830547] Code: Unable to access opcode bytes at RIP 0x7f0e2cb5a0cd.
[24905.831770] RSP: 002b:00007fffa13c46a0 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
[24905.833189] RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007f0e2cb5a0f7
[24905.834530] RDX: 0000000000000059 RSI: 00007fffa13c46e0 RDI: 0000000000000004
[24905.835860] RBP: 00007fffa13c46e0 R08: 0000000000000000 R09: 00007fffa13c4c70
[24905.837189] R10: 00000000ffffffff R11: 0000000000000293 R12: 0000000000000059
[24905.838516] R13: 00000000ffffffff R14: 00007fffa13c46e0 R15: 0000000000000001
[24905.839849] task:kthreadd        state:S stack:    0 pid:    2 ppid:     0 flags:0x80004000
[24905.841412] Call Trace:
[24905.841893]  __schedule+0x2c4/0x700
[24905.842565]  schedule+0x37/0xa0
[24905.843179]  kthreadd+0x2e8/0x300
[24905.843843]  ? kthread_create_on_cpu+0xa0/0xa0
[24905.844704]  ret_from_fork+0x35/0x40
[24905.845398] task:rcu_gp          state:I stack:    0 pid:    3 ppid:     2 flags:0x80004000
[24905.846963] Call Trace:
[24905.847465]  __schedule+0x2c4/0x700
[24905.848134]  schedule+0x37/0xa0
[24905.848752]  rescuer_thread+0x29e/0x340
[24905.849501]  ? process_one_work+0x360/0x360
[24905.850295]  kthread+0x116/0x130
[24905.850933]  ? kthread_flush_work_fn+0x10/0x10
[24905.851787]  ret_from_fork+0x35/0x40
[24905.852489] task:rcu_par_gp      state:I stack:    0 pid:    4 ppid:     2 flags:0x80004000
[24905.854047] Call Trace:
...
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Another one on an ldiskfs DNE config (parallel-scale):&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/28eb47b3-d0c0-4184-9dbb-561da28a53b9&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/28eb47b3-d0c0-4184-9dbb-561da28a53b9&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="354656" author="adilger" created="Wed, 30 Nov 2022 11:21:51 +0000"  >&lt;p&gt;Has not failed in 6 months.&lt;/p&gt;</comment>
                            <comment id="356377" author="yujian" created="Wed, 14 Dec 2022 08:20:32 +0000"  >&lt;p&gt;sanity test 24v on b2_15 branch:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/3ec921b2-742b-41af-b0c3-8914b574c31b&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/3ec921b2-742b-41af-b0c3-8914b574c31b&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="357044" author="sarah" created="Tue, 20 Dec 2022 20:14:23 +0000"  >&lt;p&gt;hit this again in 2.15.2-rc1&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://testing.whamcloud.com/test_sets/5ed91b10-10ab-4dd9-b960-6a72d84f11df&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/5ed91b10-10ab-4dd9-b960-6a72d84f11df&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;client&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
[Sun Dec 11 23:16:42 2022] Lustre: 19950:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1670800585/real 1670800585] req@00000000532f8e4e x1751941312768064/t0(0) o4-&amp;gt;lustre-OST0004_osc@10.240.39.106@tcp:6/4 lens 488/448 e 0 to 1 dl 1670800604 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:&apos;ptlrpcd_01_00.0&apos;
[Sun Dec 11 23:16:42 2022] Lustre: lustre-OST0004_osc: Connection to lustre-OST0004 (at 10.240.39.106@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Sun Dec 11 23:16:42 2022] Lustre: lustre-OST0004_osc: Connection restored to host_6_UUID (at 10.240.39.106@tcp)
[Sun Dec 11 23:24:14 2022] LustreError: 19947:0:(osc_request.c:1037:osc_init_grant()) lustre-OST0005_osc: granted 372736 but already consumed 1048576
[Sun Dec 11 23:24:27 2022] Lustre: 19950:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1670801059/real 1670801059] req@00000000c6ac8d21 x1751941313020928/t0(0) o4-&amp;gt;lustre-OST0003_osc@10.240.39.106@tcp:6/4 lens 488/448 e 0 to 1 dl 1670801069 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:&apos;ptlrpcd_01_00.0&apos;
[Sun Dec 11 23:24:27 2022] Lustre: 19948:0:(client.c:2295:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1670801059/real 1670801059] req@000000009b3144c2 x1751941313020800/t0(0) o4-&amp;gt;lustre-OST0003_osc@10.240.39.106@tcp:6/4 lens 488/448 e 0 to 1 dl 1670801069 ref 2 fl Rpc:XQr/0/ffffffff rc 0/-1 job:&apos;ptlrpcd_00_00.0&apos;
[Sun Dec 11 23:24:27 2022] Lustre: 19950:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[Sun Dec 11 23:24:27 2022] Lustre: 19948:0:(client.c:2295:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
[Sun Dec 11 23:24:27 2022] Lustre: lustre-OST0003_osc: Connection to lustre-OST0003 (at 10.240.39.106@tcp) was lost; in progress operations using this service will wait for recovery to complete
[Sun Dec 11 23:24:27 2022] Lustre: Skipped 1 previous similar message
[Sun Dec 11 23:24:27 2022] Lustre: lustre-OST0003_osc: Connection restored to host_6_UUID (at 10.240.39.106@tcp)
[Sun Dec 11 23:24:27 2022] Lustre: Skipped 1 previous similar message
[Sun Dec 11 23:55:08 2022] LustreError: 19947:0:(osc_request.c:1037:osc_init_grant()) lustre-OST0003_osc: granted 372736 but already consumed 1048576

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;OSS&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;
[Sun Dec 11 23:07:10 2022] Lustre: lustre-OST0000: Client lustre-OST0000_osc_UUID (at 10.240.39.104@tcp) reconnecting
[Sun Dec 11 23:08:25 2022] Lustre: lustre-OST0005: Client lustre-OST0005_osc_UUID (at 10.240.39.104@tcp) reconnecting
[Sun Dec 11 23:08:30 2022] LustreError: 353209:0:(ldlm_lib.c:3490:target_bulk_io()) @@@ bulk WRITE failed: rc = -107 req@00000000b67af265 x1751941312481472/t0(0) o4-&amp;gt;lustre-OST0005_osc_UUID@10.240.39.104@tcp:0/0 lens 488/448 e 0 to 0 dl 1670800121 ref 1 fl Interpret:/0/0 rc 0/0 job:&apos;ptlrpcd_00_01.0&apos;
[Sun Dec 11 23:08:30 2022] LustreError: 353209:0:(ldlm_lib.c:3490:target_bulk_io()) Skipped 6 previous similar messages
[Sun Dec 11 23:08:30 2022] Lustre: lustre-OST0005: Bulk IO write error with lustre-OST0005_osc_UUID (at 10.240.39.104@tcp), client will retry: rc = -107
[Sun Dec 11 23:08:30 2022] Lustre: Skipped 1 previous similar message
[Sun Dec 11 23:09:46 2022] Lustre: 353215:0:(service.c:2329:ptlrpc_server_handle_request()) @@@ Request took longer than estimated (20/1s); client may timeout req@0000000005918235 x1751941312522240/t8589941733(0) o4-&amp;gt;lustre-OST0004_osc_UUID@10.240.39.104@tcp:0/0 lens 488/448 e 0 to 0 dl 1670800187 ref 1 fl Complete:/0/0 rc 0/0 job:&apos;ptlrpcd_00_00.0&apos;
[Sun Dec 11 23:14:23 2022] Lustre: lustre-OST0004: Client lustre-OST0004_osc_UUID (at 10.240.39.104@tcp) reconnecting
[Sun Dec 11 23:14:25 2022] LustreError: 353232:0:(ldlm_lib.c:3490:target_bulk_io()) @@@ bulk WRITE failed: rc = -107 req@00000000e6dde848 x1751941312695872/t0(0) o4-&amp;gt;lustre-OST0003_osc_UUID@10.240.39.104@tcp:0/0 lens 488/448 e 0 to 0 dl 1670800483 ref 1 fl Interpret:/0/0 rc 0/0 job:&apos;ptlrpcd_01_01.0&apos;
[Sun Dec 11 23:14:25 2022] Lustre: lustre-OST0003: Bulk IO write error with lustre-OST0003_osc_UUID (at 10.240.39.104@tcp), client will retry: rc = -107
[Sun Dec 11 23:14:28 2022] LustreError: 353216:0:(ldlm_lib.c:3490:target_bulk_io()) @@@ bulk WRITE failed: rc = -107 req@0000000004060a7c x1751941312694464/t0(0) o4-&amp;gt;lustre-OST0004_osc_UUID@10.240.39.104@tcp:0/0 lens 488/448 e 0 to 0 dl 1670800480 ref 1 fl Interpret:/0/0 rc 0/0 job:&apos;ptlrpcd_00_00.0&apos;
[Sun Dec 11 23:14:28 2022] Lustre: lustre-OST0004: Bulk IO write error with lustre-OST0004_osc_UUID (at 10.240.39.104@tcp), client will retry: rc = -107
[Sun Dec 11 23:14:52 2022] Lustre: lustre-OST0006: Client lustre-OST0006_osc_UUID (at 10.240.39.104@tcp) reconnecting

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="69789">LU-15756</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i01irb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>