Details
-
Bug
-
Resolution: Won't Fix
-
Blocker
-
None
-
None
-
lola:
build: tip of master + #31 of change 16383
-
3
-
9223372036854775807
Description
The error below occurred during soak testing of change 16838, patch set #31 (no Wiki entry for the build exists yet), on cluster lola. DNE is enabled and the MDSes are configured in an active-active HA failover configuration.
Primary resources of MDT lola-11 were failed back on Dec 3 at 20:18.
Slab allocation increased continuously, up to ~31 GB at the time of the crash.
MDS node lola-11 crashed with oom-killer on Dec 4 at 00:21 (local time); see also LU-7432.
ptlrpc_cache seems to be the biggest consumer.
Attached are lola-11's messages, console log, vmcore-dmesg file, and collectl (version V4.0.2-1) files for the time interval specified above. Also attached are files containing extracted counters for memory, slab totals, and per-slab allocation.
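For anyone wanting to cross-check which slab cache dominates on a live node, the following is a minimal sketch that ranks caches from `/proc/slabinfo` text. It is not the collectl-based extraction used for the attachments; the helper names `parse_slabinfo` and `top_consumers` are hypothetical, the sample figures are invented for illustration only, and per-cache usage is approximated as `num_objs * objsize`.

```python
def parse_slabinfo(text):
    """Return {cache_name: approx_bytes} from /proc/slabinfo (version 2.x) text.

    Usage is approximated as num_objs * objsize; slab page overhead is ignored.
    """
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner, and the column header.
        if not line.strip() or line.startswith("slabinfo") or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue
        # Columns: name, active_objs, num_objs, objsize, ...
        name = fields[0]
        num_objs = int(fields[2])
        objsize = int(fields[3])
        usage[name] = num_objs * objsize
    return usage


def top_consumers(usage, n=5):
    """Return the n largest caches as (name, bytes) pairs, biggest first."""
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Illustrative sample only: these numbers are made up, not taken from lola-11.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ptlrpc_cache 1000 1000 1024 4 1 : tunables 54 27 8 : slabdata 250 250 0
dentry 500 600 192 20 1 : tunables 120 60 8 : slabdata 30 30 0
"""

if __name__ == "__main__":
    for name, size in top_consumers(parse_slabinfo(SAMPLE)):
        print(f"{name:20s} {size / 1024:10.1f} KiB")
```

On a real node the same functions can be fed the contents of `/proc/slabinfo`, which typically requires root to read.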