Details
- Bug
- Resolution: Unresolved
- Minor
- None
- Lustre 2.12.6
- None
- 3
- 9223372036854775807
Description
We’ve seen parallel-scale-nfsv3 test_compilebench crash with an OOM only once so far. In this instance, https://testing.whamcloud.com/test_sets/a96851ea-9fc2-4442-b901-b19d0d5c0fc1, we are testing DNE with ZFS on RHEL 7.9 clients and servers. The last thing seen in the suite_log is
== parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
OPTIONS: cbench_DIR=/usr/bin cbench_IDIRS=2 cbench_RUNS=2
trevis-46vm1.trevis.whamcloud.com trevis-46vm2
free space = 15279104 KB
./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
using working directory /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720, 2 intial dirs 2 runs
native unpatched native-0 222MB in 196.06 seconds (1.13 MB/s)
native patched native-0 109MB in 56.43 seconds (1.94 MB/s)
native patched compiled native-0 691MB in 145.28 seconds (4.76 MB/s)
create dir kernel-0 222MB in 184.62 seconds (1.20 MB/s)
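For reference, the test is simply driving compilebench against a directory on the shared mount. A minimal sketch of reproducing the same workload by hand, assuming compilebench is installed in /usr/bin (cbench_DIR above) and the same mount point exists on the client; the target directory name below is only illustrative:

# run the same compilebench workload the test runs: 2 initial dirs, 2 runs, plus the --makej phase
cd /usr/bin
./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.manual -i 2 -r 2 --makej

In this suite the clients reach /mnt/lustre through the NFSv3 re-export of the Lustre filesystem, so the writes still funnel through the Lustre client acting as NFS server before landing on the OSS.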
In the MDS2/4 (vm5) console log, we see communication errors:
[126471.115681] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2 -r 2 --makej
[126471.388277] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[132040.380810] Lustre: 22680:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606009364/real 1606009364] req@ffff8fdc15e2d680 x1683944906779264/t0(0) o13->lustre-OST0002-osc-MDT0001@10.9.5.186@tcp:7/4 lens 224/368 e 0 to 1 dl 1606009371 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[132040.385562] Lustre: lustre-OST0002-osc-MDT0001: Connection to lustre-OST0002 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[132040.388239] Lustre: Skipped 4 previous similar messages
[132078.146048] Lustre: lustre-MDT0003: haven't heard from client lustre-MDT0003-lwp-OST0000_UUID (at 10.9.5.186@tcp) in 51 seconds. I think it's dead, and I am evicting it. exp ffff8fdc3f68f000, cur 1606009409 expire 1606009379 last 1606009358
[132078.149715] Lustre: Skipped 1 previous similar message
<ConMan> Console [trevis-46vm5] disconnected from <trevis-46:6004> at 11-22 01:44.
In the MDS1/3 (vm4) console log, we see similar issues, along with many more errors like the following (a basic connectivity sanity check is sketched after this excerpt):
[126470.851038] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
[126471.153265] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2 -r 2 --makej
[126471.438784] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[126514.413660] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[126514.418116] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 8 previous similar messages
[127114.385591] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[127114.390114] LustreError: 2619:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 9 previous similar messages
[127774.353234] LustreError: 6643:0:(qmt_handler.c:514:qmt_dqacq0()) $$$ Release too much! uuid:lustre-MDT0000-lwp-OST0000_UUID release:262147 granted:0, total:0 qmt:lustre-QMT0000 pool:0-dt id:500 enforced:1 hard:16204339 soft:15432704 granted:0 time:0 qunit: 262144 edquot:0 may_rel:0 revoke:0 default:no
[127774.357978] LustreError: 6643:0:(qmt_handler.c:514:qmt_dqacq0()) Skipped 10 previous similar messages
[127931.015494] Lustre: 19775:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606004660/real 1606004660] req@ffff9369daabba80 x1683953590123520/t0(0) o101->lustre-OST0004-osc-ffff9369fa533800@10.9.5.186@tcp:28/4 lens 328/400 e 24 to 1 dl 1606005261 ref 1 fl Rpc:X/0/ffffffff rc 0/-1
[127931.020478] Lustre: lustre-OST0004-osc-ffff9369fa533800: Connection to lustre-OST0004 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[127931.023258] Lustre: Skipped 2 previous similar messages
[127931.029602] Lustre: lustre-OST0002-osc-ffff9369fa533800: Connection restored to 10.9.5.186@tcp (at 10.9.5.186@tcp)
[127931.031642] Lustre: Skipped 36 previous similar messages
[128263.190427] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606005261/real 1606005261] req@ffff936a18bf4900 x1683953590431360/t0(0) o103->lustre-OST0002-osc-ffff9369fa533800@10.9.5.186@tcp:17/18 lens 888/224 e 1 to 1 dl 1606005593 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[128263.195381] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 300 previous similar messages
[128263.197072] Lustre: lustre-OST0002-osc-ffff9369fa533800: Connection to lustre-OST0002 (at 10.9.5.186@tcp) was lost; in progress operations using this service will wait for recovery to complete
[128263.199874] Lustre: Skipped 7 previous similar messages
…
[128733.982640] LustreError: 18293:0:(osc_cache.c:959:osc_extent_wait()) extent ffff9369fe27d000@{[0 -> 255/255], [3|0|+|rpc|wihuY|ffff936a01033dc0], [1703936|13|+|-|ffff9369c4e1cb40|256|ffff936a1a76d280]} lustre-OST0003-osc-ffff9369fa533800: wait ext to 0 timedout, recovery in progress?
[128733.987044] LustreError: 18293:0:(osc_cache.c:959:osc_extent_wait()) ### extent: ffff9369fe27d000 ns: lustre-OST0003-osc-ffff9369fa533800 lock: ffff9369c4e1cb40/0x457300162b236ab9 lrc: 3/0,0 mode: PW/PW res: [0x4c9f7:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->53247) flags: 0x29400000000 nid: local remote: 0x345f4682719796fe expref: -99 pid: 14029 timeout: 0 lvb_type: 1
[128774.136661] LustreError: 2606:0:(osc_cache.c:959:osc_extent_wait()) extent ffff9369cee80b00@{[0 -> 255/255], [3|0|+|rpc|wihuY|ffff936a1535f8c0], [1703936|1|+|-|ffff936a34d27680|256|ffff936a38fb1080]} lustre-OST0002-osc-ffff9369fa533800: wait ext to 0 timedout, recovery in progress?
[128774.140963] LustreError: 2606:0:(osc_cache.c:959:osc_extent_wait()) ### extent: ffff9369cee80b00 ns: lustre-OST0002-osc-ffff9369fa533800 lock: ffff936a34d27680/0x457300162b2371dc lrc: 3/0,0 mode: PW/PW res: [0x4cc1e:0x0:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x29400000000 nid: local remote: 0x345f468271979886 expref: -99 pid: 14029 timeout: 0 lvb_type: 1
…
[131959.075762] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1606008982/real 1606008982] req@ffff936a19bc0000 x1683953590680128/t0(0) o103->lustre-OST0000-osc-ffff9369fa533800@10.9.5.186@tcp:17/18 lens 496/224 e 12 to 1 dl 1606009289 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[131959.080744] Lustre: 19776:0:(client.c:2146:ptlrpc_expire_one_request()) Skipped 15 previous similar messages
[131959.091626] Lustre: lustre-OST0000-osc-ffff9369fa533800: Connection restored to 10.9.5.186@tcp (at 10.9.5.186@tcp)
[131959.093410] Lustre: Skipped 15 previous similar messages
[132076.505428] Lustre: MGS: haven't heard from client 2e6a78ed-898c-e05c-81c2-d8a60a1bee04 (at 10.9.5.186@tcp) in 49 seconds. I think it's dead, and I am evicting it. exp ffff936a0d00ac00, cur 1606009407 expire 1606009377 last 1606009358
[132076.511277] Lustre: Skipped 1 previous similar message
<ConMan> Console [trevis-46vm4] disconnected from <trevis-46:6003> at 11-22 01:44.
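Side note on triage: when these "Request sent has timed out" and eviction messages appear, basic reachability from the MDS to the OSS NID reported in the log can be sanity-checked with standard commands. This is only a sketch using the NID quoted above:

# check LNet reachability of the OSS NID from the MDS node
lctl ping 10.9.5.186@tcp
# list local Lustre devices and their setup/recovery state
lctl dl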
In the OSS (vm3) console log, we see the OOM, hung tasks, and stack traces:
[126470.900307] Lustre: DEBUG MARKER: == parallel-scale-nfsv3 test compilebench: compilebench ============================================== 00:10:01 (1606003801)
[126471.259208] Lustre: DEBUG MARKER: /usr/sbin/lctl mark .\/compilebench -D \/mnt\/lustre\/d0.parallel-scale-nfs\/d0.compilebench.29720 -i 2 -r 2 --makej
[126471.542072] Lustre: DEBUG MARKER: ./compilebench -D /mnt/lustre/d0.parallel-scale-nfs/d0.compilebench.29720 -i 2 -r 2 --makej
[126514.484905] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted: 262147 pending:0 waiting:0 req:1 usage: 0 qunit:0 qtune:0 edquot:0 default:no
[126514.488658] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 8 previous similar messages
[126936.722580] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[126936.724348] LNetError: Skipped 5 previous similar messages
[126936.725334] LNetError: 1426:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
[126936.727033] LNetError: 1426:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Skipped 5 previous similar messages
[126936.728812] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
[126936.731431] LNetError: Skipped 5 previous similar messages
[127114.455871] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:lustre-OST0000 qtype:grp id:500 enforced:1 granted: 262147 pending:0 waiting:0 req:1 usage: 0 qunit:0 qtune:0 edquot:0 default:no
[127114.459890] LustreError: 1434:0:(qsd_handler.c:340:qsd_req_completion()) Skipped 9 previous similar messages
[127369.681908] LNet: Service thread pid 32312 was inactive for 40.14s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
[127369.684957] Pid: 32312, comm: ll_ost_io00_015 3.10.0-1160.2.1.el7_lustre.x86_64 #1 SMP Fri Nov 13 23:48:56 UTC 2020
[127369.686695] Call Trace:
[127369.687305] [<ffffffff943976d8>] call_rwsem_down_read_failed+0x18/0x30
[127369.688534] [<ffffffffc0d5819a>] lu_cache_shrink+0x6a/0x2d0 [obdclass]
[127369.690062] [<ffffffff941d17be>] shrink_slab+0xae/0x340
[127369.691044] [<ffffffff941d4b4a>] do_try_to_free_pages+0x3ca/0x520
[127369.692183] [<ffffffff941d4d9c>] try_to_free_pages+0xfc/0x180
[127369.693254] [<ffffffff9477c679>] __alloc_pages_slowpath+0x457/0x729
[127369.694433] [<ffffffff941c8d76>] __alloc_pages_nodemask+0x436/0x450
[127369.695584] [<ffffffff942189d8>] alloc_pages_current+0x98/0x110
[127369.696758] [<ffffffff942270c3>] new_slab+0x393/0x4e0
[127369.697730] [<ffffffff942275dc>] ___slab_alloc+0x3cc/0x520
[127369.698762] [<ffffffff9477dcc1>] __slab_alloc+0x40/0x5c
[127369.699752] [<ffffffff9422879b>] kmem_cache_alloc+0x19b/0x1f0
[127369.700844] [<ffffffffc06443c9>] spl_kmem_cache_alloc+0x99/0x150 [spl]
[127369.702077] [<ffffffffc0811697>] zio_data_buf_alloc+0x57/0x60 [zfs]
[127369.704042] [<ffffffffc07454cc>] abd_alloc_linear+0xac/0xf0 [zfs]
[127369.705299] [<ffffffffc074580c>] abd_alloc+0x2fc/0x460 [zfs]
[127369.706404] [<ffffffffc0749dfe>] arc_hdr_alloc_pabd+0x6e/0x110 [zfs]
[127369.707597] [<ffffffffc0749f5c>] arc_hdr_alloc+0xbc/0x120 [zfs]
[127369.708759] [<ffffffffc074ebbd>] arc_alloc_buf+0x3d/0xb0 [zfs]
[127369.709914] [<ffffffffc074ec53>] arc_loan_buf+0x23/0x30 [zfs]
[127369.710979] [<ffffffffc075f02a>] dmu_request_arcbuf+0x1a/0x20 [zfs]
[127369.712178] [<ffffffffc1295a61>] osd_bufs_get+0x3d1/0xf80 [osd_zfs]
[127369.713412] [<ffffffffc1406ba6>] ofd_preprw_write.isra.30+0x476/0xea0 [ofd]
[127369.714748] [<ffffffffc14079ef>] ofd_preprw+0x41f/0x1240 [ofd]
[127369.715903] [<ffffffffc109ce99>] tgt_brw_write+0xcc9/0x1ad0 [ptlrpc]
[127369.717515] [<ffffffffc109ef1a>] tgt_request_handle+0xada/0x1570 [ptlrpc]
[127369.718879] [<ffffffffc104388b>] ptlrpc_server_handle_request+0x24b/0xab0 [ptlrpc]
[127369.720383] [<ffffffffc10471f4>] ptlrpc_main+0xb34/0x1470 [ptlrpc]
[127369.721582] [<ffffffff940c5c21>] kthread+0xd1/0xe0
[127369.722565] [<ffffffff94793df7>] ret_from_fork_nospec_end+0x0/0x39
[127369.723730] [<ffffffffffffffff>] 0xffffffffffffffff
[127369.724861] LustreError: dumping log to /tmp/lustre-log.1606004700.32312
[127536.712930] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[127536.714600] LNetError: Skipped 5 previous similar messages
[127536.715594] LNetError: 1425:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Error -104 reading HELLO from 127.0.0.1
[127536.717408] LNetError: 1425:0:(socklnd_cb.c:1817:ksocknal_recv_hello()) Skipped 5 previous similar messages
[127536.719080] LNetError: 11b-b: Connection to 0.0.0.0@tcp at host 0.0.0.0 on port 7988 was reset: is it running a compatible version of Lustre and is 0.0.0.0@tcp one of its NIDs?
[127536.721754] LNetError: Skipped 5 previous similar messages
[127554.606912] INFO: task kthreadd:2 blocked for more than 120 seconds.
[127554.608205] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[127554.609530] kthreadd D ffff8cb1bfd1acc0 0 2 0 0x00000000
[127554.610898] Call Trace:
[127554.611379] [<ffffffff947871e9>] schedule+0x29/0x70
[127554.612254] [<ffffffff94788bd5>] rwsem_down_read_failed+0x105/0x1c0
[127554.613484] [<ffffffff943976d8>] call_rwsem_down_read_failed+0x18/0x30
[127554.614656] [<ffffffff947864d0>] down_read+0x20/0x40
[127554.615631] [<ffffffffc0d5819a>] lu_cache_shrink+0x6a/0x2d0 [obdclass]
[127554.616818] [<ffffffff941d17be>] shrink_slab+0xae/0x340
[127554.617801] [<ffffffff942428c1>] ? vmpressure+0x21/0x90
[127554.618769] [<ffffffff941d4b4a>] do_try_to_free_pages+0x3ca/0x520
[127554.619883] [<ffffffff941d4d9c>] try_to_free_pages+0xfc/0x180
[127554.620949] [<ffffffff9477c679>] __alloc_pages_slowpath+0x457/0x729
[127554.622090] [<ffffffff941c8d76>] __alloc_pages_nodemask+0x436/0x450
[127554.623236] [<ffffffff940c5b50>] ? insert_kthread_work+0x40/0x40
[127554.624353] [<ffffffff940987dd>] copy_process+0x1dd/0x1a80
[127554.625359] [<ffffffff9409a231>] do_fork+0x91/0x330
[127554.626273] [<ffffffff9409a4f6>] kernel_thread+0x26/0x30
[127554.627261] [<ffffffff940c6601>] kthreadd+0x2e1/0x320
[127554.628212] [<ffffffff940c6320>] ? kthread_create_on_cpu+0x60/0x60
[127554.629355] [<ffffffff94793df7>] ret_from_fork_nospec_begin+0x21/0x21
[127554.630518] [<ffffffff940c6320>] ? kthread_create_on_cpu+0x60/0x60
[127554.631649] INFO: task kswapd0:35 blocked for more than 120 seconds.
…
[127554.651560] INFO: task spl_dynamic_tas:639 blocked for more than 120 seconds.
[127554.652778] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[127554.654181] spl_dynamic_tas D ffff8cb1b80ac200 0 639 2 0x00000000
[127554.655493] Call Trace:
[127554.655959] [<ffffffff947871e9>] schedule+0x29/0x70
[127554.656899] [<ffffffff94784cd1>] schedule_timeout+0x221/0x2d0
[127554.657966] [<ffffffff940d7202>] ? check_preempt_curr+0x92/0xa0
[127554.659047] [<ffffffff940d7229>] ? ttwu_do_wakeup+0x19/0xe0
[127554.660074] [<ffffffff9478759d>] wait_for_completion+0xfd/0x140
[127554.661162] [<ffffffff940daaf0>] ? wake_up_state+0x20/0x20
[127554.662175] [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.663344] [<ffffffff940c5a7a>] kthread_create_on_node+0xaa/0x140
[127554.664466] [<ffffffff94391d3b>] ? string.isra.7+0x3b/0xf0
[127554.665480] [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.666732] [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.667890] [<ffffffffc064510c>] spl_kthread_create+0x9c/0xf0 [spl]
[127554.669043] [<ffffffffc064692b>] taskq_thread_create+0x6b/0x110 [spl]
[127554.670216] [<ffffffffc06469e2>] taskq_thread_spawn_task+0x12/0x40 [spl]
[127554.671431] [<ffffffffc0645d7c>] taskq_thread+0x2ac/0x4f0 [spl]
[127554.672501] [<ffffffff940daaf0>] ? wake_up_state+0x20/0x20
[127554.673508] [<ffffffffc0645ad0>] ? taskq_thread_spawn+0x60/0x60 [spl]
[127554.674667] [<ffffffff940c5c21>] kthread+0xd1/0xe0
[127554.675565] [<ffffffff940c5b50>] ? insert_kthread_work+0x40/0x40
[127554.676668] [<ffffffff94793df7>] ret_from_fork_nospec_begin+0x21/0x21
[127554.677834] [<ffffffff940c5b50>] ? insert_kthread_work+0x40/0x40
[127554.678922] INFO: task dbuf_evict:645 blocked for more than 120 seconds.
[127554.680124] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[127554.681521] dbuf_evict D ffff8cb1bfd1acc0 0 645 2 0x00000000
[127554.682806] Call Trace:
[127554.683277] [<ffffffff94788109>] schedule_preempt_disabled+0x29/0x70
[127554.684492] [<ffffffff94786077>] __mutex_lock_slowpath+0xc7/0x1d0
[127554.685602] [<ffffffff9478544f>] mutex_lock+0x1f/0x2f
[127554.686565] [<ffffffffc074ee36>] arc_buf_destroy+0x96/0x130 [zfs]
[127554.687699] [<ffffffffc07563e1>] dbuf_destroy+0x31/0x3b0 [zfs]
[127554.688790] [<ffffffffc07b8176>] ? spa_get_random+0x36/0x60 [zfs]
[127554.689957] [<ffffffffc0756800>] dbuf_evict_one+0xa0/0xc0 [zfs]
[127554.691058] [<ffffffffc07597ff>] dbuf_evict_thread+0x10f/0x1d0 [zfs]
[127554.692222] [<ffffffffc07596f0>] ? dbuf_write_override_done+0x1e0/0x1e0 [zfs]
[127554.693510] [<ffffffffc0645063>] thread_generic_wrapper+0x73/0x80 [spl]
[127554.694696] [<ffffffffc0644ff0>] ? __thread_exit+0x20/0x20 [spl]
[127554.695786] [<ffffffff940c5c21>] kthread+0xd1/0xe0
[127554.696691] [<ffffffff940c5b50>] ? insert_kthread_work+0x40/0x40
[127554.697790] [<ffffffff94793df7>] ret_from_fork_nospec_begin+0x21/0x21
[127554.698951] [<ffffffff940c5b50>] ? insert_kthread_work+0x40/0x40
…
[132011.533944] LNetError: 120-3: Refusing connection from 127.0.0.1 for 0.0.0.0@tcp: No matching NI
[132011.535685] LNetError: Skipped 7 previous similar messages
[132031.291351] pickup invoked oom-killer: gfp_mask=0x200da, order=0, oom_score_adj=0
[132031.292841] pickup cpuset=/ mems_allowed=0
[132031.293561] CPU: 1 PID: 23709 Comm: pickup Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.2.1.el7_lustre.x86_64 #1
[132031.295542] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[132031.296523] Call Trace:
[132031.297056] [<ffffffff947813c0>] dump_stack+0x19/0x1b
[132031.297945] [<ffffffff9477bce0>] dump_header+0x90/0x229
[132031.298881] [<ffffffff9478eb6f>] ? notifier_call_chain+0x4f/0x70
[132031.300010] [<ffffffff940cbfe8>] ? __blocking_notifier_call_chain+0x58/0x70
[132031.301234] [<ffffffff941c227e>] check_panic_on_oom+0x2e/0x60
[132031.302243] [<ffffffff941c25f4>] out_of_memory+0x194/0x500
[132031.303196] [<ffffffff9477c7fd>] __alloc_pages_slowpath+0x5db/0x729
[132031.304277] [<ffffffff941c8d76>] __alloc_pages_nodemask+0x436/0x450
[132031.305371] [<ffffffff9421c269>] alloc_pages_vma+0xa9/0x200
[132031.306343] [<ffffffff94209805>] __read_swap_cache_async+0x115/0x190
[132031.307437] [<ffffffff941bd45e>] ? __find_get_page+0x1e/0xa0
[132031.308426] [<ffffffff942098a6>] read_swap_cache_async+0x26/0x60
[132031.309470] [<ffffffff94209a8b>] swapin_readahead+0x1ab/0x210
[132031.310493] [<ffffffff9438d222>] ? radix_tree_lookup_slot+0x22/0x50
[132031.311577] [<ffffffff941bd45e>] ? __find_get_page+0x1e/0xa0
[132031.312560] [<ffffffff941f1f1f>] do_swap_page+0x23f/0x7c0
[132031.313503] [<ffffffff941bdf6b>] ? unlock_page+0x2b/0x30
[132031.314431] [<ffffffff941ee4d9>] ? do_read_fault.isra.63+0x139/0x1b0
[132031.315527] [<ffffffff941f5cb7>] handle_mm_fault+0xaa7/0xfb0
[132031.316532] [<ffffffff9478e653>] __do_page_fault+0x213/0x500
[132031.317512] [<ffffffff9478ea26>] trace_do_page_fault+0x56/0x150
[132031.318541] [<ffffffff9478dfa2>] do_async_page_fault+0x22/0xf0
[132031.319681] [<ffffffff9478a7a8>] async_page_fault+0x28/0x30
[132031.320657] Mem-Info:
[132031.321091] active_anon:3089 inactive_anon:3139 isolated_anon:0
[132031.321091] active_file:343 inactive_file:2107 isolated_file:0
[132031.321091] unevictable:0 dirty:10 writeback:0 unstable:0
[132031.321091] slab_reclaimable:4834 slab_unreclaimable:137783
[132031.321091] mapped:1028 shmem:4549 pagetables:1120 bounce:0
[132031.321091] free:12862 free_pcp:30 free_cma:0
[132031.326525] Node 0 DMA free:6916kB min:416kB low:520kB high:624kB active_anon:68kB inactive_anon:100kB active_file:0kB inactive_file:24kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:132kB slab_reclaimable:152kB slab_unreclaimable:1516kB kernel_stack:192kB pagetables:104kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:297 all_unreclaimable? yes
[132031.333546] lowmem_reserve[]: 0 1660 1660 1660
[132031.334501] Node 0 DMA32 free:44532kB min:44636kB low:55792kB high:66952kB active_anon:12288kB inactive_anon:12456kB active_file:1372kB inactive_file:8404kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2080744kB managed:1703244kB mlocked:0kB dirty:40kB writeback:0kB mapped:4112kB shmem:18064kB slab_reclaimable:19184kB slab_unreclaimable:549616kB kernel_stack:12144kB pagetables:4376kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB writeback_tmp:0kB pages_scanned:51805 all_unreclaimable? yes
[132031.342090] lowmem_reserve[]: 0 0 0 0
[132031.342912] Node 0 DMA: 41*4kB (UEM) 26*8kB (UEM) 37*16kB (UEM) 20*32kB (UEM) 13*64kB (UEM) 7*128kB (UEM) 2*256kB (E) 2*512kB (UE) 2*1024kB (U) 0*2048kB 0*4096kB = 6916kB
[132031.346179] Node 0 DMA32: 675*4kB (UEM) 2708*8kB (UEM) 1138*16kB (UM) 58*32kB (UM) 2*64kB (UM) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 44556kB
[132031.349119] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[132031.350593] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[132031.352019] 7419 total pagecache pages
[132031.352692] 404 pages in swap cache
[132031.353308] Swap cache stats: add 29767, delete 29363, find 2416/3651
[132031.354391] Free swap = 3558652kB
[132031.355004] Total swap = 3671036kB
[132031.355606] 524184 pages RAM
[132031.356125] 0 pages HighMem/MovableOnly
[132031.356798] 94396 pages reserved
[132031.357361] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[132031.358680] [ 352] 0 352 9863 827 24 58 0 systemd-journal
[132031.360168] [ 393] 0 393 47594 0 28 123 0 lvmetad
[132031.361531] [ 400] 0 400 11421 2 25 173 -1000 systemd-udevd
[132031.362986] [ 666] 0 666 13883 1 27 111 -1000 auditd
[132031.364325] [ 689] 999 689 153058 0 61 1385 0 polkitd
[132031.365712] [ 691] 0 691 5385 18 16 42 0 irqbalance
[132031.367113] [ 693] 81 693 14594 88 34 120 -900 dbus-daemon
[132031.368531] [ 695] 32 695 17314 14 37 126 0 rpcbind
[132031.369896] [ 698] 0 698 6637 1 17 117 0 systemd-logind
[132031.371348] [ 701] 0 701 155458 91 89 453 0 NetworkManager
[132031.372827] [ 702] 0 702 51907 117 35 69 0 zed
[132031.374140] [ 714] 0 714 48801 0 34 129 0 gssproxy
[132031.375517] [ 721] 38 721 11826 38 28 145 0 ntpd
[132031.376860] [ 747] 0 747 25735 7 54 508 0 dhclient
[132031.378240] [ 1004] 0 1004 28234 3 57 256 -1000 sshd
[132031.379551] [ 1005] 0 1005 143570 145 99 2728 0 tuned
[132031.380886] [ 1013] 0 1013 74581 703 73 830 0 rsyslogd
[132031.382262] [ 1015] 0 1015 6792 0 18 63 0 xinetd
[132031.383609] [ 1018] 997 1018 56473 46 22 131 0 munged
[132031.384971] [ 1019] 29 1019 10610 0 25 208 0 rpc.statd
[132031.386353] [ 1083] 0 1083 172961 551 80 518 0 automount
[132031.387744] [ 1086] 0 1086 31595 25 19 129 0 crond
[132031.389075] [ 1089] 0 1089 6477 0 18 52 0 atd
[132031.390383] [ 1099] 0 1099 27551 1 10 32 0 agetty
[132031.391736] [ 1100] 0 1100 27551 1 13 32 0 agetty
[132031.393078] [ 1438] 0 1438 22436 20 42 241 0 master
[132031.394416] [ 1449] 89 1449 22479 15 46 238 0 qmgr
[132031.395739] [23709] 89 23709 22462 103 44 194 0 pickup
[132031.397438] Kernel panic - not syncing: Out of memory: system-wide panic_on_oom is enabled
[132031.397438]
[132031.399072] CPU: 1 PID: 23709 Comm: pickup Kdump: loaded Tainted: P OE ------------ 3.10.0-1160.2.1.el7_lustre.x86_64 #1
[132031.401036] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[132031.401994] Call Trace:
[132031.402435] [<ffffffff947813c0>] dump_stack+0x19/0x1b
[132031.403303] [<ffffffff9477a918>] panic+0xe8/0x21f
[132031.404121] [<ffffffff941c22a5>] check_panic_on_oom+0x55/0x60
[132031.405099] [<ffffffff941c25f4>] out_of_memory+0x194/0x500
[132031.406041] [<ffffffff9477c7fd>] __alloc_pages_slowpath+0x5db/0x729
[132031.407118] [<ffffffff941c8d76>] __alloc_pages_nodemask+0x436/0x450
[132031.408184] [<ffffffff9421c269>] alloc_pages_vma+0xa9/0x200
[132031.409141] [<ffffffff94209805>] __read_swap_cache_async+0x115/0x190
[132031.410223] [<ffffffff941bd45e>] ? __find_get_page+0x1e/0xa0
[132031.411192] [<ffffffff942098a6>] read_swap_cache_async+0x26/0x60
[132031.412219] [<ffffffff94209a8b>] swapin_readahead+0x1ab/0x210
[132031.413202] [<ffffffff9438d222>] ? radix_tree_lookup_slot+0x22/0x50
[132031.414267] [<ffffffff941bd45e>] ? __find_get_page+0x1e/0xa0
[132031.415235] [<ffffffff941f1f1f>] do_swap_page+0x23f/0x7c0
[132031.416172] [<ffffffff941bdf6b>] ? unlock_page+0x2b/0x30
[132031.417082] [<ffffffff941ee4d9>] ? do_read_fault.isra.63+0x139/0x1b0
[132031.418165] [<ffffffff941f5cb7>] handle_mm_fault+0xaa7/0xfb0
[132031.419136] [<ffffffff9478e653>] __do_page_fault+0x213/0x500
[132031.420101] [<ffffffff9478ea26>] trace_do_page_fault+0x56/0x150
[132031.421117] [<ffffffff9478dfa2>] do_async_page_fault+0x22/0xf0
[132031.422111] [<ffffffff9478a7a8>] async_page_fault+0x28/0x30
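For context on why a single OOM takes the node down: the log above shows "system-wide panic_on_oom is enabled", and the failing allocations are coming through the ZFS ARC on an OSS VM with only about 1.7 GB of managed memory (DMA32 managed:1703244kB). A sketch of the knobs one might inspect on the OSS; these are standard procfs/sysfs paths, nothing Lustre-specific, and lowering zfs_arc_max is only a possible mitigation to try on these small VMs, not a confirmed fix:

# confirm whether an OOM will panic the node (non-zero here, per the panic message above)
cat /proc/sys/vm/panic_on_oom
# current ARC size and ceiling reported by ZFS
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats
# module parameter that caps the ARC (0 means the built-in default)
cat /sys/module/zfs/parameters/zfs_arc_max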
Attachments
Issue Links