Details
Type: Bug
Resolution: Duplicate
Priority: Critical
None
None
lola
build: 2.8.50-6-gf9ca359 ; commit f9ca359284357d145819beb08b316e932f7a3060
3
Description
The error occurred during soak testing of build '20160215' (see https://wiki.hpdd.intel.com/display/Releases/Soak+Testing+on+Lola#SoakTestingonLola-20150215). DNE is enabled.
The MDTs were formatted with ldiskfs and the OSTs with ZFS. The MDS nodes are configured for active-active HA failover.
Please note that build '20160215' is a vanilla build of the master branch.
This issue might be addressed by the changes included in build '20160210', as we did not observe it during a two-day test session with that build.
- 2016-02-15 15:37:51,169:fsmgmt.fsmgmt:INFO triggering fault mds_failover (for lola-11)
- 2016-02-15 15:44:57,839:fsmgmt.fsmgmt:INFO mds_failover just completed (for lola-11)
- After that, slab memory consumption increased continuously until all resources were exhausted at 2016-02-15 22:38.
- Most pages were allocated by size-1048576 slabs. The top of the slab usage list, sorted by allocation, reads:
#Date     Time      SlabName      ObjInUse    ObjInUseB  ObjAll      ObjAllB  SlabInUse   SlabInUseB  SlabAll     SlabAllB  SlabChg  SlabPct
20160215  22:46:20  size-1048576     29147  30562844672   29147  30562844672      29147  30562844672    29147  30562844672        0        0
20160215  22:46:20  size-262144       1793    470024192    1793    470024192       1793    470024192     1793    470024192        0        0
20160215  22:46:20  ptlrpc_cache    399364    306711552  399380    306723840      79873    327159808    79876    327172096   180224        0
20160215  22:46:20  size-1024       229179    234679296  229188    234688512      57295    234680320    57297    234688512   -24576        0
20160215  22:46:20  size-512        256540    131348480  258232    132214784      32278    132210688    32279    132214784    86016        0
20160215  22:46:20  size-192        460848     88482816  460880     88488960      23043     94384128    23044     94388224    28672        0
20160215  22:46:20  size-8192         5776     47316992    5776     47316992       5776     47316992     5776     47316992    -8192        0
20160215  22:46:20  size-128        265120     33935360  266250     34080000       8875     36352000     8875     36352000        0        0
20160215  22:46:20  size-65536         361     23658496     361     23658496        361     23658496      361     23658496        0        0
20160215  22:46:20  kmem_cache         289      9506944     289      9506944        289     18939904      289     18939904        0        0
(see attached file slab-usage-by-allocation-descending.dat)
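The byte columns in this snapshot can be cross-checked against the object counts: ObjInUseB divided by ObjInUse gives the object size, e.g. 29147 × 1048576 = 30562844672 bytes for size-1048576. The Python sketch below is a hypothetical helper, not part of the soak-test tooling; it assumes the whitespace-separated column layout of slab-usage-by-allocation-descending.dat as it appears in the report and ranks caches by SlabAllB. The names `parse_rows`, `rank_by_slab_bytes` and `SlabRow` are illustrative.

```python
"""Hypothetical helper: rank slab caches from a slab-usage .dat snapshot.

Assumes the whitespace-separated layout used in the report
(Date Time SlabName ObjInUse ObjInUseB ... SlabChg SlabPct).
This is an illustrative sketch, not the soak-test framework's parser.
"""
from __future__ import annotations

from dataclasses import dataclass

COLUMNS = ("Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB "
           "SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct").split()


@dataclass
class SlabRow:
    name: str
    obj_in_use: int        # ObjInUse: objects currently allocated
    obj_in_use_bytes: int  # ObjInUseB: bytes held by those objects
    slab_all_bytes: int    # SlabAllB: bytes of all slabs in this cache

    @property
    def obj_size(self) -> int:
        # Per-object size implied by the two "in use" columns.
        return self.obj_in_use_bytes // self.obj_in_use if self.obj_in_use else 0


def parse_rows(text: str) -> list[SlabRow]:
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "#Date ..." header
        fields = line.split()
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {line!r}")
        rec = dict(zip(COLUMNS, fields))
        rows.append(SlabRow(rec["SlabName"], int(rec["ObjInUse"]),
                            int(rec["ObjInUseB"]), int(rec["SlabAllB"])))
    return rows


def rank_by_slab_bytes(rows: list[SlabRow]) -> list[tuple[str, float, float]]:
    """Return (name, GiB, percent) sorted by SlabAllB, largest first.

    The percentage is relative to the parsed rows only, not to total memory.
    """
    total = sum(r.slab_all_bytes for r in rows)
    ranked = sorted(rows, key=lambda r: r.slab_all_bytes, reverse=True)
    return [(r.name, r.slab_all_bytes / 2**30, 100.0 * r.slab_all_bytes / total)
            for r in ranked]


# First three rows of the snapshot, copied from the report.
SAMPLE = """\
#Date Time SlabName ObjInUse ObjInUseB ObjAll ObjAllB SlabInUse SlabInUseB SlabAll SlabAllB SlabChg SlabPct
20160215 22:46:20 size-1048576 29147 30562844672 29147 30562844672 29147 30562844672 29147 30562844672 0 0
20160215 22:46:20 size-262144 1793 470024192 1793 470024192 1793 470024192 1793 470024192 0 0
20160215 22:46:20 ptlrpc_cache 399364 306711552 399380 306723840 79873 327159808 79876 327172096 180224 0
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    sizes = {r.name: r.obj_size for r in rows}
    for name, gib, pct in rank_by_slab_bytes(rows):
        print(f"{name:<14} obj={sizes[name]:>8} B  {gib:8.2f} GiB  {pct:5.1f}%")
```

On the three sample rows, size-1048576 ranks first with 1048576-byte objects and about 28.46 GiB, roughly 97% of the bytes in those rows. Fed the full attached file, the percentages would still be relative only to the rows that file contains.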
Attached are the messages and console files of lola-11, and the sorted slab usage at the time the oom-killer was started.
Issue Links
- is related to: LU-7836 MDSes crashed with oom-killer (Resolved)