<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:07:26 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
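For illustration only (assuming the standard JIRA 'jira.issueviews:issue-xml' endpoint for this instance), such a request might look like:
    https://jira.whamcloud.com/si/jira.issueviews:issue-xml/LU-14169/LU-14169.xml?field=key&field=summary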
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-14169] parallel-scale-nfsv3 test compilebench crashes with OOM</title>
                <link>https://jira.whamcloud.com/browse/LU-14169</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We&#8217;ve seen parallel-scale-nfsv3 test_compilebench crash with OOM only once. In this case, &lt;a href=&quot;https://testing.whamcloud.com/test_sets/a96851ea-9fc2-4442-b901-b19d0d5c0fc1&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/a96851ea-9fc2-4442-b901-b19d0d5c0fc1&lt;/a&gt;, we were testing DNE with ZFS on RHEL 7.9 clients and servers. The last thing seen in the suite_log is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
OPTIONS:
cbench_DIR=/usr/bin
cbench_IDIRS=2
cbench_RUNS=2
trevis-46vm1.trevis.whamcloud.com
trevis-46vm2
free space = 15279104 KB
./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2         -r 2 --makej
using working directory /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720, 2 intial dirs 2 runs
native unpatched native-0 222MB in 196.06 seconds (1.13 MB/s)
native patched native-0 109MB in 56.43 seconds (1.94 MB/s)
native patched compiled native-0 691MB in 145.28 seconds (4.76 MB/s)
create dir kernel-0 222MB in 184.62 seconds (1.20 MB/s)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the MDS2/4 (vm5) console log, we see communication errors:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[126471.115681] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2         -r 2 --makej
[126471.388277] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[132040.380810] Lustre: 22680:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606009364/real 1606009364]  req@ffff8fdc15e2d680 x1683944906779264/t0(0) o13-&amp;gt;lustre-OST0002-osc-MDT0001@10.9.5.186@tcp:7/4 lens 224/368 e 0 to 1 dl 1606009371 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[132040.385562] Lustre: lustre-OST0002-osc-MDT0001: Connection to lustre-OST0002 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[132040.388239] Lustre: Skipped 4 previous similar messages
[132078.146048] Lustre: lustre-MDT0003: haven&apos;t heard from client lustre-MDT0003-lwp-OST0000_UUID (at 10.9.5.186@tcp) in 51 seconds. I think it&apos;s dead, and I am evicting it. exp ffff8fdc3f68f000, cur 1606009409 expire 1606009379 last 1606009358
[132078.149715] Lustre: Skipped 1 previous similar message

&amp;lt;ConMan&amp;gt; Console [trevis-46vm5] disconnected from &amp;lt;trevis-46:6004&amp;gt; at 11-22 01:44.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the MDS1/3 (vm4) console log, we see similar issues, but many more errors like the following:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[126470.851038] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
[126471.153265] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2         -r 2 --makej
[126471.438784] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[126514.413660] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[126514.418116] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 8 previous similar messages
[127114.385591] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[127114.390114] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 9 previous similar messages
[127774.353234] LustreError: 6643:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[127774.357978] LustreError: 6643:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 10 previous similar messages
[127931.015494] Lustre: 19775:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606004660/real 1606004660]  req@ffff9369daabba80 x1683953590123520/t0(0) o101-&amp;gt;lustre-OST0004-osc-ffff9369fa533800@10.9.5.186@tcp:28/4 lens 328/400 e 24 to 1 dl 1606005261 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[127931.020478] Lustre: lustre-OST0004-osc-ffff9369fa533800: Connection to lustre-OST0004 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[127931.023258] Lustre: Skipped 2 previous similar messages
[127931.029602] Lustre: lustre-OST0002-osc-ffff9369fa533800: Connection restored to 10.9.5.186@tcp (at 10.9.5.186@tcp)
[127931.031642] Lustre: Skipped 36 previous similar messages
[128263.190427] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606005261/real 1606005261]  req@ffff936a18bf4900 x1683953590431360/t0(0) o103-&amp;gt;lustre-OST0002-osc-ffff9369fa533800@10.9.5.186@tcp:17/18 lens 888/224 e 1 to 1 dl 1606005593 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[128263.195381] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 300 previous similar messages
[128263.197072] Lustre: lustre-OST0002-osc-ffff9369fa533800: Connection to lustre-OST0002 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[128263.199874] Lustre: Skipped 7 previous similar messages
&#8230;
[128733.982640] LustreError: 18293:0:(osc_cache.c:959:osc_extent_wait()) extent ffff9369fe27d000@{[0 -&amp;gt; 255/255], [3|0|+|rpc|wihuY|ffff936a01033dc0], [1703936|13|+|-|ffff9369c4e1cb40|256|ffff936a1a76d280]} lustre-OST0003-osc-ffff9369fa533800: wait ext to 0 timedout, recovery in progress?
[128733.987044] LustreError: 18293:0:(osc_cache.c:959:osc_extent_wait()) ### extent: ffff9369fe27d000 ns: lustre-OST0003-osc-ffff9369fa533800 lock: ffff9369c4e1cb40/0x457300162b236ab9 lrc: 3/0,0 mode: PW/PW res: [0x4c9f7:0x0:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;53247) flags: 0x29400000000 nid: local remote: 0x345f4682719796fe expref: -99 pid: 14029 timeout: 0 lvb_type: 1
[128774.136661] LustreError: 2606:0:(osc_cache.c:959:osc_extent_wait()) extent ffff9369cee80b00@{[0 -&amp;gt; 255/255], [3|0|+|rpc|wihuY|ffff936a1535f8c0], [1703936|1|+|-|ffff936a34d27680|256|ffff936a38fb1080]} lustre-OST0002-osc-ffff9369fa533800: wait ext to 0 timedout, recovery in progress?
[128774.140963] LustreError: 2606:0:(osc_cache.c:959:osc_extent_wait()) ### extent: ffff9369cee80b00 ns: lustre-OST0002-osc-ffff9369fa533800 lock: ffff936a34d27680/0x457300162b2371dc lrc: 3/0,0 mode: PW/PW res: [0x4cc1e:0x0:0x0].0x0 rrc: 2 type: EXT [0-&amp;gt;18446744073709551615] (req 0-&amp;gt;4095) flags: 0x29400000000 nid: local remote: 0x345f468271979886 expref: -99 pid: 14029 timeout: 0 lvb_type: 1
&#8230;
[131959.075762] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606008982/real 1606008982]  req@ffff936a19bc0000 x1683953590680128/t0(0) o103-&amp;gt;lustre-OST0000-osc-ffff9369fa533800@10.9.5.186@tcp:17/18 lens 496/224 e 12 to 1 dl 1606009289 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[131959.080744] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[131959.091626] Lustre: lustre-OST0000-osc-ffff9369fa533800: Connection restored to 10.9.5.186@tcp (at 10.9.5.186@tcp)
[131959.093410] Lustre: Skipped 15 previous similar messages
[132076.505428] Lustre: MGS: haven&apos;t heard from client 2e6a78ed-898c-e05c-81c2-d8a60a1bee04 (at 10.9.5.186@tcp) in 49 seconds. I think it&apos;s dead, and I am evicting it. exp ffff936a0d00ac00, cur 1606009407 expire 1606009377 last 1606009358
[132076.511277] Lustre: Skipped 1 previous similar message

&amp;lt;ConMan&amp;gt; Console [trevis-46vm4] disconnected from &amp;lt;trevis-46:6003&amp;gt; at 11-22 01:44.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On the OSS (vm3) console log, we see the OOM, hung tasks, and stack traces:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[126470.900307] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
[126471.259208] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2         -r 2 --makej
[126471.542072] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[126514.484905] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted: 262147 pending:0 waiting:0 req:1 usage: 0 qunit:0 qtune:0 edquot:0 default:no
[126514.488658] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 8 previous similar messages
[126936.722580] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[126936.724348] LNetError: Skipped 5 previous similar messages
[126936.725334] LNetError: 1426:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
[126936.727033] LNetError: 1426:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Skipped 5 previous similar messages
[126936.728812] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
[126936.731431] LNetError: Skipped 5 previous similar messages
[127114.455871] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted: 262147 pending:0 waiting:0 req:1 usage: 0 qunit:0 qtune:0 edquot:0 default:no
[127114.459890] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 9 previous similar messages
[127369.681908] LNet: Service thread pid 32312 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[127369.684957] Pid: 32312, comm: ll_ost_io00_015 3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Fri Nov 13 23:48:56 UTC 2020
[127369.686695] Call Trace:
[127369.687305]  [&amp;lt;ffffffff943976d8&amp;gt;] call_rwsem_down_read_failed+0x18/0x30
[127369.688534]  [&amp;lt;ffffffffc0d5819a&amp;gt;] lu_cache_shrink+0x6a/0x2d0 [obdclass]
[127369.690062]  [&amp;lt;ffffffff941d17be&amp;gt;] shrink_slab+0xae/0x340
[127369.691044]  [&amp;lt;ffffffff941d4b4a&amp;gt;] do_try_to_free_pages+0x3ca/0x520
[127369.692183]  [&amp;lt;ffffffff941d4d9c&amp;gt;] try_to_free_pages+0xfc/0x180
[127369.693254]  [&amp;lt;ffffffff9477c679&amp;gt;] __alloc_pages_slowpath+0x457/0x729
[127369.694433]  [&amp;lt;ffffffff941c8d76&amp;gt;] __alloc_pages_nodemask+0x436/0x450
[127369.695584]  [&amp;lt;ffffffff942189d8&amp;gt;] alloc_pages_current+0x98/0x110
[127369.696758]  [&amp;lt;ffffffff942270c3&amp;gt;] new_slab+0x393/0x4e0
[127369.697730]  [&amp;lt;ffffffff942275dc&amp;gt;] ___slab_alloc+0x3cc/0x520
[127369.698762]  [&amp;lt;ffffffff9477dcc1&amp;gt;] __slab_alloc+0x40/0x5c
[127369.699752]  [&amp;lt;ffffffff9422879b&amp;gt;] kmem_cache_alloc+0x19b/0x1f0
[127369.700844]  [&amp;lt;ffffffffc06443c9&amp;gt;] spl_kmem_cache_alloc+0x99/0x150 [spl]
[127369.702077]  [&amp;lt;ffffffffc0811697&amp;gt;] zio_data_buf_alloc+0x57/0x60 [zfs]
[127369.704042]  [&amp;lt;ffffffffc07454cc&amp;gt;] abd_alloc_linear+0xac/0xf0 [zfs]
[127369.705299]  [&amp;lt;ffffffffc074580c&amp;gt;] abd_alloc+0x2fc/0x460 [zfs]
[127369.706404]  [&amp;lt;ffffffffc0749dfe&amp;gt;] arc_hdr_alloc_pabd+0x6e/0x110 [zfs]
[127369.707597]  [&amp;lt;ffffffffc0749f5c&amp;gt;] arc_hdr_alloc+0xbc/0x120 [zfs]
[127369.708759]  [&amp;lt;ffffffffc074ebbd&amp;gt;] arc_alloc_buf+0x3d/0xb0 [zfs]
[127369.709914]  [&amp;lt;ffffffffc074ec53&amp;gt;] arc_loan_buf+0x23/0x30 [zfs]
[127369.710979]  [&amp;lt;ffffffffc075f02a&amp;gt;] dmu_request_arcbuf+0x1a/0x20 [zfs]
[127369.712178]  [&amp;lt;ffffffffc1295a61&amp;gt;] osd_bufs_get+0x3d1/0xf80 [osd_zfs]
[127369.713412]  [&amp;lt;ffffffffc1406ba6&amp;gt;] ofd_preprw_write.isra.30+0x476/0xea0 [ofd]
[127369.714748]  [&amp;lt;ffffffffc14079ef&amp;gt;] ofd_preprw+0x41f/0x1240 [ofd]
[127369.715903]  [&amp;lt;ffffffffc109ce99&amp;gt;] tgt_brw_write+0xcc9/0x1ad0 [ptlrpc]
[127369.717515]  [&amp;lt;ffffffffc109ef1a&amp;gt;] tgt_request_handle+0xada/0x1570 [ptlrpc]
[127369.718879]  [&amp;lt;ffffffffc104388b&amp;gt;] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[127369.720383]  [&amp;lt;ffffffffc10471f4&amp;gt;] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[127369.721582]  [&amp;lt;ffffffff940c5c21&amp;gt;] kthread+0xd1/0xe0
[127369.722565]  [&amp;lt;ffffffff94793df7&amp;gt;] ret_from_fork_nospec_end+0x0/0x39
[127369.723730]  [&amp;lt;ffffffffffffffff&amp;gt;] 0xffffffffffffffff
[127369.724861] LustreError: dumping log to /tmp/lustre-log.1606004700.32312
[127536.712930] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[127536.714600] LNetError: Skipped 5 previous similar messages
[127536.715594] LNetError: 1425:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
[127536.717408] LNetError: 1425:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Skipped 5 previous similar messages
[127536.719080] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
[127536.721754] LNetError: Skipped 5 previous similar messages
[127554.606912] INFO: task kthreadd:2 blocked for more than 120 seconds.
[127554.608205] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[127554.609530] kthreadd        D ffff8cb1bfd1acc0     0     2      0 0x00000000
[127554.610898] Call Trace:
[127554.611379]  [&amp;lt;ffffffff947871e9&amp;gt;] schedule+0x29/0x70
[127554.612254]  [&amp;lt;ffffffff94788bd5&amp;gt;] rwsem_down_read_failed+0x105/0x1c0
[127554.613484]  [&amp;lt;ffffffff943976d8&amp;gt;] call_rwsem_down_read_failed+0x18/0x30
[127554.614656]  [&amp;lt;ffffffff947864d0&amp;gt;] down_read+0x20/0x40
[127554.615631]  [&amp;lt;ffffffffc0d5819a&amp;gt;] lu_cache_shrink+0x6a/0x2d0 [obdclass]
[127554.616818]  [&amp;lt;ffffffff941d17be&amp;gt;] shrink_slab+0xae/0x340
[127554.617801]  [&amp;lt;ffffffff942428c1&amp;gt;] ? vmpressure+0x21/0x90
[127554.618769]  [&amp;lt;ffffffff941d4b4a&amp;gt;] do_try_to_free_pages+0x3ca/0x520
[127554.619883]  [&amp;lt;ffffffff941d4d9c&amp;gt;] try_to_free_pages+0xfc/0x180
[127554.620949]  [&amp;lt;ffffffff9477c679&amp;gt;] __alloc_pages_slowpath+0x457/0x729
[127554.622090]  [&amp;lt;ffffffff941c8d76&amp;gt;] __alloc_pages_nodemask+0x436/0x450
[127554.623236]  [&amp;lt;ffffffff940c5b50&amp;gt;] ? insert_kthread_work+0x40/0x40
[127554.624353]  [&amp;lt;ffffffff940987dd&amp;gt;] copy_process+0x1dd/0x1a80
[127554.625359]  [&amp;lt;ffffffff9409a231&amp;gt;] do_fork+0x91/0x330
[127554.626273]  [&amp;lt;ffffffff9409a4f6&amp;gt;] kernel_thread+0x26/0x30
[127554.627261]  [&amp;lt;ffffffff940c6601&amp;gt;] kthreadd+0x2e1/0x320
[127554.628212]  [&amp;lt;ffffffff940c6320&amp;gt;] ? kthread_create_on_cpu+0x60/0x60
[127554.629355]  [&amp;lt;ffffffff94793df7&amp;gt;] ret_from_fork_nospec_begin+0x21/0x21
[127554.630518]  [&amp;lt;ffffffff940c6320&amp;gt;] ? kthread_create_on_cpu+0x60/0x60
[127554.631649] INFO: task kswapd0:35 blocked for more than 120 seconds.
&#8230;
[127554.651560] INFO: task spl_dynamic_tas:639 blocked for more than 120 seconds.
[127554.652778] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[127554.654181] spl_dynamic_tas D ffff8cb1b80ac200     0   639      2 0x00000000
[127554.655493] Call Trace:
[127554.655959]  [&amp;lt;ffffffff947871e9&amp;gt;] schedule+0x29/0x70
[127554.656899]  [&amp;lt;ffffffff94784cd1&amp;gt;] schedule_timeout+0x221/0x2d0
[127554.657966]  [&amp;lt;ffffffff940d7202&amp;gt;] ? check_preempt_curr+0x92/0xa0
[127554.659047]  [&amp;lt;ffffffff940d7229&amp;gt;] ? ttwu_do_wakeup+0x19/0xe0
[127554.660074]  [&amp;lt;ffffffff9478759d&amp;gt;] wait_for_completion+0xfd/0x140
[127554.661162]  [&amp;lt;ffffffff940daaf0&amp;gt;] ? wake_up_state+0x20/0x20
[127554.662175]  [&amp;lt;ffffffffc0645ad0&amp;gt;] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.663344]  [&amp;lt;ffffffff940c5a7a&amp;gt;] kthread_create_on_node+0xaa/0x140
[127554.664466]  [&amp;lt;ffffffff94391d3b&amp;gt;] ? string.isra.7+0x3b/0xf0
[127554.665480]  [&amp;lt;ffffffffc0645ad0&amp;gt;] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.666732]  [&amp;lt;ffffffffc0645ad0&amp;gt;] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.667890]  [&amp;lt;ffffffffc064510c&amp;gt;] spl_kthread_create+0x9c/0xf0 [spl]
[127554.669043]  [&amp;lt;ffffffffc064692b&amp;gt;] taskq_thread_create+0x6b/0x110 [spl]
[127554.670216]  [&amp;lt;ffffffffc06469e2&amp;gt;] taskq_thread_spawn_task+0x12/0x40 [spl]
[127554.671431]  [&amp;lt;ffffffffc0645d7c&amp;gt;] taskq_thread+0x2ac/0x4f0 [spl]
[127554.672501]  [&amp;lt;ffffffff940daaf0&amp;gt;] ? wake_up_state+0x20/0x20
[127554.673508]  [&amp;lt;ffffffffc0645ad0&amp;gt;] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.674667]  [&amp;lt;ffffffff940c5c21&amp;gt;] kthread+0xd1/0xe0
[127554.675565]  [&amp;lt;ffffffff940c5b50&amp;gt;] ? insert_kthread_work+0x40/0x40
[127554.676668]  [&amp;lt;ffffffff94793df7&amp;gt;] ret_from_fork_nospec_begin+0x21/0x21
[127554.677834]  [&amp;lt;ffffffff940c5b50&amp;gt;] ? insert_kthread_work+0x40/0x40
[127554.678922] INFO: task dbuf_evict:645 blocked for more than 120 seconds.
[127554.680124] &quot;echo 0 &amp;gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.
[127554.681521] dbuf_evict      D ffff8cb1bfd1acc0     0   645      2 0x00000000
[127554.682806] Call Trace:
[127554.683277]  [&amp;lt;ffffffff94788109&amp;gt;] schedule_preempt_disabled+0x29/0x70
[127554.684492]  [&amp;lt;ffffffff94786077&amp;gt;] __mutex_lock_slowpath+0xc7/0x1d0
[127554.685602]  [&amp;lt;ffffffff9478544f&amp;gt;] mutex_lock+0x1f/0x2f
[127554.686565]  [&amp;lt;ffffffffc074ee36&amp;gt;] arc_buf_destroy+0x96/0x130 [zfs]
[127554.687699]  [&amp;lt;ffffffffc07563e1&amp;gt;] dbuf_destroy+0x31/0x3b0 [zfs]
[127554.688790]  [&amp;lt;ffffffffc07b8176&amp;gt;] ? spa_get_random+0x36/0x60 [zfs]
[127554.689957]  [&amp;lt;ffffffffc0756800&amp;gt;] dbuf_evict_one+0xa0/0xc0 [zfs]
[127554.691058]  [&amp;lt;ffffffffc07597ff&amp;gt;] dbuf_evict_thread+0x10f/0x1d0 [zfs]
[127554.692222]  [&amp;lt;ffffffffc07596f0&amp;gt;] ? dbuf_write_override_done+0x1e0/0x1e0 [zfs]
[127554.693510]  [&amp;lt;ffffffffc0645063&amp;gt;] thread_generic_wrapper+0x73/0x80 [spl]
[127554.694696]  [&amp;lt;ffffffffc0644ff0&amp;gt;] ? __thread_exit+0x20/0x20 [spl]
[127554.695786]  [&amp;lt;ffffffff940c5c21&amp;gt;] kthread+0xd1/0xe0
[127554.696691]  [&amp;lt;ffffffff940c5b50&amp;gt;] ? insert_kthread_work+0x40/0x40
[127554.697790]  [&amp;lt;ffffffff94793df7&amp;gt;] ret_from_fork_nospec_begin+0x21/0x21
[127554.698951]  [&amp;lt;ffffffff940c5b50&amp;gt;] ? insert_kthread_work+0x40/0x40
&#8230;
[132011.533944] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[132011.535685] LNetError: Skipped 7 previous similar messages
[132031.291351] pickup invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[132031.292841] pickup cpuset=/ mems_allowed=0
[132031.293561] CPU: 1 PID: 23709 Comm: pickup Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.2.1.el7_lustre.x86_64 #1
[132031.295542] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[132031.296523] Call Trace:
[132031.297056]  [&amp;lt;ffffffff947813c0&amp;gt;] dump_stack+0x19/0x1b
[132031.297945]  [&amp;lt;ffffffff9477bce0&amp;gt;] dump_header+0x90/0x229
[132031.298881]  [&amp;lt;ffffffff9478eb6f&amp;gt;] ? notifier_call_chain+0x4f/0x70
[132031.300010]  [&amp;lt;ffffffff940cbfe8&amp;gt;] ? __blocking_notifier_call_chain+0x58/0x70
[132031.301234]  [&amp;lt;ffffffff941c227e&amp;gt;] check_panic_on_oom+0x2e/0x60
[132031.302243]  [&amp;lt;ffffffff941c25f4&amp;gt;] out_of_memory+0x194/0x500
[132031.303196]  [&amp;lt;ffffffff9477c7fd&amp;gt;] __alloc_pages_slowpath+0x5db/0x729
[132031.304277]  [&amp;lt;ffffffff941c8d76&amp;gt;] __alloc_pages_nodemask+0x436/0x450
[132031.305371]  [&amp;lt;ffffffff9421c269&amp;gt;] alloc_pages_vma+0xa9/0x200
[132031.306343]  [&amp;lt;ffffffff94209805&amp;gt;] __read_swap_cache_async+0x115/0x190
[132031.307437]  [&amp;lt;ffffffff941bd45e&amp;gt;] ? __find_get_page+0x1e/0xa0
[132031.308426]  [&amp;lt;ffffffff942098a6&amp;gt;] read_swap_cache_async+0x26/0x60
[132031.309470]  [&amp;lt;ffffffff94209a8b&amp;gt;] swapin_readahead+0x1ab/0x210
[132031.310493]  [&amp;lt;ffffffff9438d222&amp;gt;] ? radix_tree_lookup_slot+0x22/0x50
[132031.311577]  [&amp;lt;ffffffff941bd45e&amp;gt;] ? __find_get_page+0x1e/0xa0
[132031.312560]  [&amp;lt;ffffffff941f1f1f&amp;gt;] do_swap_page+0x23f/0x7c0
[132031.313503]  [&amp;lt;ffffffff941bdf6b&amp;gt;] ? unlock_page+0x2b/0x30
[132031.314431]  [&amp;lt;ffffffff941ee4d9&amp;gt;] ? do_read_fault.isra.63+0x139/0x1b0
[132031.315527]  [&amp;lt;ffffffff941f5cb7&amp;gt;] handle_mm_fault+0xaa7/0xfb0
[132031.316532]  [&amp;lt;ffffffff9478e653&amp;gt;] __do_page_fault+0x213/0x500
[132031.317512]  [&amp;lt;ffffffff9478ea26&amp;gt;] trace_do_page_fault+0x56/0x150
[132031.318541]  [&amp;lt;ffffffff9478dfa2&amp;gt;] do_async_page_fault+0x22/0xf0
[132031.319681]  [&amp;lt;ffffffff9478a7a8&amp;gt;] async_page_fault+0x28/0x30
[132031.320657] Mem-Info:
[132031.321091] active_anon:3089 inactive_anon:3139 isolated_anon:0
[132031.321091]  active_file:343 inactive_file:2107 isolated_file:0
[132031.321091]  unevictable:0 dirty:10 writeback:0 unstable:0
[132031.321091]  slab_reclaimable:4834 slab_unreclaimable:137783
[132031.321091]  mapped:1028 shmem:4549 pagetables:1120 bounce:0
[132031.321091]  free:12862 free_pcp:30 free_cma:0
[132031.326525] Node 0 DMA free:6916kB min:416kB low:520kB high:624kB active_anon:68kB inactive_anon:100kB active_file:0kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:132kB slab_reclaimable:152kB slab_unreclaimable:1516kB kernel_stack:192kB pagetables:104kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:297 all_unreclaimable? yes
[132031.333546] lowmem_reserve[]: 0 1660 1660 1660
[132031.334501] Node 0 DMA32 free:44532kB min:44636kB low:55792kB high:66952kB active_anon:12288kB inactive_anon:12456kB active_file:1372kB inactive_file:8404kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080744kB managed:1703244kB mlocked:0kB dirty:40kB writeback:0kB mapped:4112kB shmem:18064kB slab_reclaimable:19184kB slab_unreclaimable:549616kB kernel_stack:12144kB pagetables:4376kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:51805 all_unreclaimable? yes
[132031.342090] lowmem_reserve[]: 0 0 0 0
[132031.342912] Node 0 DMA: 41*4kB (UEM) 26*8kB (UEM) 37*16kB (UEM) 20*32kB (UEM) 13*64kB (UEM) 7*128kB (UEM) 2*256kB (E) 2*512kB (UE) 2*1024kB (U) 0*2048kB 0*4096kB = 6916kB
[132031.346179] Node 0 DMA32: 675*4kB (UEM) 2708*8kB (UEM) 1138*16kB (UM) 58*32kB (UM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44556kB
[132031.349119] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[132031.350593] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[132031.352019] 7419 total pagecache pages
[132031.352692] 404 pages in swap cache
[132031.353308] Swap cache stats: add 29767, delete 29363, find 2416/3651
[132031.354391] Free swap  = 3558652kB
[132031.355004] Total swap = 3671036kB
[132031.355606] 524184 pages RAM
[132031.356125] 0 pages HighMem/MovableOnly
[132031.356798] 94396 pages reserved
[132031.357361] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[132031.358680] [  352]     0   352     9863      827      24       58             0 systemd-journal
[132031.360168] [  393]     0   393    47594        0      28      123             0 lvmetad
[132031.361531] [  400]     0   400    11421        2      25      173         -1000 systemd-udevd
[132031.362986] [  666]     0   666    13883        1      27      111         -1000 auditd
[132031.364325] [  689]   999   689   153058        0      61     1385             0 polkitd
[132031.365712] [  691]     0   691     5385       18      16       42             0 irqbalance
[132031.367113] [  693]    81   693    14594       88      34      120          -900 dbus-daemon
[132031.368531] [  695]    32   695    17314       14      37      126             0 rpcbind
[132031.369896] [  698]     0   698     6637        1      17      117             0 systemd-logind
[132031.371348] [  701]     0   701   155458       91      89      453             0 NetworkManager
[132031.372827] [  702]     0   702    51907      117      35       69             0 zed
[132031.374140] [  714]     0   714    48801        0      34      129             0 gssproxy
[132031.375517] [  721]    38   721    11826       38      28      145             0 ntpd
[132031.376860] [  747]     0   747    25735        7      54      508             0 dhclient
[132031.378240] [ 1004]     0  1004    28234        3      57      256         -1000 sshd
[132031.379551] [ 1005]     0  1005   143570      145      99     2728             0 tuned
[132031.380886] [ 1013]     0  1013    74581      703      73      830             0 rsyslogd
[132031.382262] [ 1015]     0  1015     6792        0      18       63             0 xinetd
[132031.383609] [ 1018]   997  1018    56473       46      22      131             0 munged
[132031.384971] [ 1019]    29  1019    10610        0      25      208             0 rpc.statd
[132031.386353] [ 1083]     0  1083   172961      551      80      518             0 automount
[132031.387744] [ 1086]     0  1086    31595       25      19      129             0 crond
[132031.389075] [ 1089]     0  1089     6477        0      18       52             0 atd
[132031.390383] [ 1099]     0  1099    27551        1      10       32             0 agetty
[132031.391736] [ 1100]     0  1100    27551        1      13       32             0 agetty
[132031.393078] [ 1438]     0  1438    22436       20      42      241             0 master
[132031.394416] [ 1449]    89  1449    22479       15      46      238             0 qmgr
[132031.395739] [23709]    89 23709    22462      103      44      194             0 pickup
[132031.397438] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[132031.397438] 
[132031.399072] CPU: 1 PID: 23709 Comm: pickup Kdump: loaded Tainted: P           OE  ------------   3.10.0-1160.2.1.el7_lustre.x86_64 #1
[132031.401036] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[132031.401994] Call Trace:
[132031.402435]  [&amp;lt;ffffffff947813c0&amp;gt;] dump_stack+0x19/0x1b
[132031.403303]  [&amp;lt;ffffffff9477a918&amp;gt;] panic+0xe8/0x21f
[132031.404121]  [&amp;lt;ffffffff941c22a5&amp;gt;] check_panic_on_oom+0x55/0x60
[132031.405099]  [&amp;lt;ffffffff941c25f4&amp;gt;] out_of_memory+0x194/0x500
[132031.406041]  [&amp;lt;ffffffff9477c7fd&amp;gt;] __alloc_pages_slowpath+0x5db/0x729
[132031.407118]  [&amp;lt;ffffffff941c8d76&amp;gt;] __alloc_pages_nodemask+0x436/0x450
[132031.408184]  [&amp;lt;ffffffff9421c269&amp;gt;] alloc_pages_vma+0xa9/0x200
[132031.409141]  [&amp;lt;ffffffff94209805&amp;gt;] __read_swap_cache_async+0x115/0x190
[132031.410223]  [&amp;lt;ffffffff941bd45e&amp;gt;] ? __find_get_page+0x1e/0xa0
[132031.411192]  [&amp;lt;ffffffff942098a6&amp;gt;] read_swap_cache_async+0x26/0x60
[132031.412219]  [&amp;lt;ffffffff94209a8b&amp;gt;] swapin_readahead+0x1ab/0x210
[132031.413202]  [&amp;lt;ffffffff9438d222&amp;gt;] ? radix_tree_lookup_slot+0x22/0x50
[132031.414267]  [&amp;lt;ffffffff941bd45e&amp;gt;] ? __find_get_page+0x1e/0xa0
[132031.415235]  [&amp;lt;ffffffff941f1f1f&amp;gt;] do_swap_page+0x23f/0x7c0
[132031.416172]  [&amp;lt;ffffffff941bdf6b&amp;gt;] ? unlock_page+0x2b/0x30
[132031.417082]  [&amp;lt;ffffffff941ee4d9&amp;gt;] ? do_read_fault.isra.63+0x139/0x1b0
[132031.418165]  [&amp;lt;ffffffff941f5cb7&amp;gt;] handle_mm_fault+0xaa7/0xfb0
[132031.419136]  [&amp;lt;ffffffff9478e653&amp;gt;] __do_page_fault+0x213/0x500
[132031.420101]  [&amp;lt;ffffffff9478ea26&amp;gt;] trace_do_page_fault+0x56/0x150
[132031.421117]  [&amp;lt;ffffffff9478dfa2&amp;gt;] do_async_page_fault+0x22/0xf0
[132031.422111]  [&amp;lt;ffffffff9478a7a8&amp;gt;] async_page_fault+0x28/0x30
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
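
&lt;p&gt;The final panic shows the node went down because system-wide panic_on_oom was enabled, not because the OOM killer failed to act. As a minimal sketch for checking this on a test node (standard RHEL 7 sysctl interface; these commands are illustrative and not taken from this run):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# Show whether the kernel panics on OOM (0 = run the OOM killer, 1/2 = panic)
cat /proc/sys/vm/panic_on_oom
sysctl vm.panic_on_oom

# Illustrative only: let the OOM killer act instead of panicking the node
sysctl -w vm.panic_on_oom=0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;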

&lt;p&gt;This crash may be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11929&quot; title=&quot;sanity-quota test 7d hangs with &#8216;(qmt_handler.c:421:qmt_dqacq0()) $$$ Release too much!&#8217; on the MDS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11929&quot;&gt;LU-11929&lt;/a&gt; and/or &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13068&quot; title=&quot;parallel-scale-nfsv4 test compilebench hangs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13068&quot;&gt;LU-13068&lt;/a&gt;.&lt;/p&gt;</description>
        <environment></environment>
        <key id="61831">LU-14169</key>
        <summary>parallel-scale-nfsv3 test compilebench crashes with OOM</summary>
        <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
        <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
        <statusCategory id="2" key="new" colorName="default"/>
        <resolution id="-1">Unresolved</resolution>
        <assignee username="wc-triage">WC Triage</assignee>
        <reporter username="jamesanunez">James Nunez</reporter>
        <labels>
        </labels>
        <created>Tue, 1 Dec 2020 23:12:40 +0000</created>
        <updated>Wed, 2 Dec 2020 00:53:57 +0000</updated>
        <version>Lustre 2.12.6</version>
        <due></due>
        <votes>0</votes>
        <watches>2</watches>
        <attachments>
        </attachments>
        <subtasks>
        </subtasks>
        <customfields>
            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                <customfieldname>Development</customfieldname>
                <customfieldvalues>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                <customfieldname>Rank</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>1|i01g67:</customfieldvalue>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                <customfieldname>Rank (Obsolete)</customfieldname>
                <customfieldvalues>
                    <customfieldvalue>9223372036854775807</customfieldvalue>
                </customfieldvalues>
            </customfield>
            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                <customfieldname>Severity</customfieldname>
                <customfieldvalues>
                    <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                </customfieldvalues>
            </customfield>
        </customfields>
    </item>
</channel>
</rss>