Details
Type: Bug
Resolution: Fixed
Priority: Critical
Affects Versions: Lustre 2.5.0, Lustre 2.4.3
Environment:
Clients: Endeavour 2.4.3, ldan 2.4.1, Pleiades compute nodes 2.1.5 or 2.4.1
Servers: 2.1.5, 2.4.1, 2.4.3
Severity: 2
Rank: 14374
Description
We have been seeing our SLES11SP2 and SLES11SP3 clients accumulate stuck anonymous memory that cannot be cleared without a reboot. We have three test cases that reliably reproduce the problem. We have been able to reproduce the problem on different clients on all of our Lustre file systems, but we have not been able to reproduce it when using NFS, ext3, CXFS, or tmpfs.
We have been working with SGI on tracking down this problem. Unfortunately, they have been unable to reproduce it on their systems. On our systems, they have simplified the test case to mmapping a file along with an equally sized anonymous region, and reading the contents of the mmapped file into the anonymous region. This test case can be provided to see if you can reproduce the problem; a minimal sketch of it follows.
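For reference, here is a minimal C sketch of what we understand the simplified test case to do; the file argument and error handling are ours, not SGI's actual test program. It maps a file on Lustre, maps an equally sized anonymous region, and copies the file contents into the anonymous region.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file-on-lustre>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    size_t len = (size_t)st.st_size;

    /* File-backed mapping of the test file on Lustre. */
    void *src = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (src == MAP_FAILED) { perror("mmap file"); return 1; }

    /* Equally sized anonymous mapping. */
    void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (dst == MAP_FAILED) { perror("mmap anon"); return 1; }

    /* Read the mmapped file contents into the anonymous region. */
    memcpy(dst, src, len);

    munmap(dst, len);
    munmap(src, len);
    close(fd);
    return 0;
}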
To determine whether the problem is occurring, reboot the system to ensure that memory is clean, then check /proc/meminfo for the amount of Active(anon) memory in use. Run the test case; while it runs, the amount of anonymous memory will increase. At the end of the test case, the amount should drop back to pre-test levels.
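The check itself is just reading /proc/meminfo (the same as "grep 'Active(anon)' /proc/meminfo"); a small C equivalent, in case it is useful for scripting the before/after comparison:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Print the Active(anon) line from /proc/meminfo. */
    FILE *f = fopen("/proc/meminfo", "r");
    char line[256];

    if (!f) {
        perror("fopen /proc/meminfo");
        return 1;
    }
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "Active(anon):", 13) == 0)
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}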
To confirm that the anonymous memory is stuck, we have been using memhog to attempt to allocate memory. If the node has 32 GB of memory, with 2 GB of anonymous memory still in use, we attempt to allocate 31 GB. If memhog completes and you are then left with only 1 GB of anonymous memory, you have not reproduced the problem; if memhog is killed, you have.
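A rough C stand-in for what the memhog step does, assuming the size in GiB is passed on the command line (e.g. 31); this is our sketch, not the actual memhog tool:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    /* Size to allocate, in GiB, from the command line. */
    size_t gib = (argc > 1) ? strtoul(argv[1], NULL, 10) : 1;
    size_t len = gib << 30;

    char *buf = malloc(len);
    if (!buf) { perror("malloc"); return 1; }

    /* Touch every byte so the allocation is actually backed by pages.
     * If the stuck anonymous memory cannot be reclaimed, the OOM
     * killer ends this process before the memset completes. */
    memset(buf, 0xa5, len);

    puts("allocation completed");
    free(buf);
    return 0;
}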
SGI would like guidance on how to gather debug information to track down this problem.
Below is an example from a system with this stuck memory. The system has 4 TB of memory, with 1.5 TB stuck in Active(anon) that cannot be released. There are 126 nodes in that system, and applications would request a number of nodes for their testing. After the memory leak, those nodes would not have enough memory for other jobs. The application would fail, and resubmission of the job would then fail to start because the requested nodes did not have enough memory.
MemTotal: 4036524872 kB
MemFree: 2399583516 kB
Buffers: 243504 kB
Cached: 5204560 kB
SwapCached: 678520 kB
Active: 1544908812 kB
Inactive: 56619188 kB
Active(anon): 1543105636 kB
Inactive(anon): 53018772 kB
Active(file): 1803176 kB
Inactive(file): 3600416 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 10239996 kB
SwapFree: 0 kB
Dirty: 554504 kB
Writeback: 26128 kB
AnonPages: 1595359296 kB
Mapped: 143708 kB
Shmem: 98772 kB
Slab: 11485844 kB
SReclaimable: 161660 kB
SUnreclaim: 11324184 kB
KernelStack: 87016 kB
PageTables: 6747560 kB
NFS_Unstable: 31856 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2027403680 kB
Committed_AS: 1262704572 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 18824052 kB
VmallocChunk: 25954600480 kB
HardwareCorrupted: 0 kB
AnonHugePages: 1264787456 kB
HugePages_Total: 1073
HugePages_Free: 468
HugePages_Rsvd: 468
HugePages_Surp: 1073
Hugepagesize: 2048 kB
DirectMap4k: 335872 kB
DirectMap2M: 134963200 kB
DirectMap1G: 3958374400 kB