Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.5.0
Environment:
Config: Single-node client+MDS+OSS with 1 MDT, 3 OSTs
Node: x86_64 w/ dual-core CPU, 2GB RAM
Kernel: 2.6.32-279.5.1.el6_lustre.g7f15218.x86_64
Lustre build: 72afa19c19d5ac
Severity: 3
Rank: 10870
Description
I'm trying to determine if there is a "memory leak" in the current Lustre code that can affect long-running clients or servers. While this memory may be cleaned up when the filesystem is unmounted, it does not appear to be cleaned up under steady-state usage.
I started "rundbench 10 -t 3600" and am watching the memory usage in several forms (slabtop, vmstat, "lfs df", "lfs df -i"). It does indeed appear that there are a number of statistics that show what looks to be a memory leak. These statistics are gathered at about the same time, but not exactly at the same time. The general trend is fairly clear, however:
The "lfs df -i" output shows only around 1000 in-use files during the whole run:
UUID                   Inodes   IUsed   IFree IUse% Mounted on
testfs-MDT0000_UUID    524288    1024  523264    0% /mnt/testfs[MDT:0]
testfs-OST0000_UUID    131072     571  130501    0% /mnt/testfs[OST:0]
testfs-OST0001_UUID    131072     562  130510    0% /mnt/testfs[OST:1]
testfs-OST0002_UUID    131072     576  130496    0% /mnt/testfs[OST:2]
filesystem summary:    524288    1024  523264    0% /mnt/testfs
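(For reference, something like the following loop can collect these statistics together during the run. This is only a sketch of how the sampling could be scripted; the 60-second interval and the /tmp/leak-watch.log path are arbitrary choices, not part of the original run.)

# sample filesystem, lock and memory statistics once a minute
while true; do
    date
    lfs df -i /mnt/testfs
    vmstat 1 2 | tail -1        # second report = current values, not since-boot averages
    lctl get_param ldlm.namespaces.*.resource_count
    sleep 60
done >> /tmp/leak-watch.log 2>&1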
The LDLM resource_count shows the number of lock resources, slightly less than 50k on the MDT, but a lot more than the number of objects actually in the filesystem:
# lctl get_param ldlm.namespaces.*.resource_count
ldlm.namespaces.filter-testfs-OST0000_UUID.resource_count=238
ldlm.namespaces.filter-testfs-OST0001_UUID.resource_count=226
ldlm.namespaces.filter-testfs-OST0002_UUID.resource_count=237
ldlm.namespaces.mdt-testfs-MDT0000_UUID.resource_count=49161
ldlm.namespaces.testfs-MDT0000-mdc-ffff8800a66c1c00.resource_count=49160
ldlm.namespaces.testfs-OST0000-osc-ffff8800a66c1c00.resource_count=237
ldlm.namespaces.testfs-OST0001-osc-ffff8800a66c1c00.resource_count=226
ldlm.namespaces.testfs-OST0002-osc-ffff8800a66c1c00.resource_count=236
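(The resource counts can also be compared against the per-namespace lock counts and LRU sizes; lock_count and lru_size are standard ldlm namespace parameters, shown here only to illustrate the comparison.)

# compare the number of lock resources with the number of locks and the LRU size
lctl get_param ldlm.namespaces.*.resource_count \
               ldlm.namespaces.*.lock_count \
               ldlm.namespaces.*.lru_size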
Total memory used (as shown by "vmstat") also shows a steady increase over time: originally 914116kB of free memory, down to 202036kB after about 3000s of the run (about 700MB of memory used), and eventually down to 86724kB at the end of the run (830MB used). While that would be normal for a workload that is accessing a large number of files that are kept in cache, the total amount of used space in the filesystem stays at about 240MB during the entire run.
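(Whether the lost memory is actually sitting in unreclaimable slab can be checked from /proc/meminfo alongside the vmstat numbers; a minimal check would be:)

# Slab/SUnreclaim growing while MemFree shrinks points at kernel-side allocations
grep -E 'MemFree|^Slab|SReclaimable|SUnreclaim' /proc/meminfo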
The "slabtop" output (edited to remove uninteresting slabs) shows over 150k and steadily growing number of allocated structures for CLIO, far more than could actually be in use at any given time. All of the CLIO slabs are 100% used, so it isn't just a matter of alloc/free causing partially-used slabs.
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
242660 242660 100%    0.19K  12133       20     48532K size-192
217260 217260 100%    0.19K  10863       20     43452K dentry
203463 178864  87%    0.10K   5499       37     21996K buffer_head
182000 181972  99%    0.03K   1625      112      6500K size-32
181530 181530 100%    0.12K   6051       30     24204K size-128
156918 156918 100%    1.25K  52306        3    209224K lustre_inode_cache
156840 156840 100%    0.12K   5228       30     20912K lov_oinfo
156825 156825 100%    0.22K   9225       17     36900K lov_object_kmem
156825 156825 100%    0.22K   9225       17     36900K lovsub_object_kmem
156816 156816 100%    0.24K   9801       16     39204K ccc_object_kmem
156814 156814 100%    0.27K  11201       14     44804K osc_object_kmem
123832 121832  98%    0.50K  15479        8     61916K size-512
 98210  92250  93%    0.50K  14030        7     56120K ldlm_locks
 97460  91009  93%    0.38K   9746       10     38984K ldlm_resources
 76320  76320 100%    0.08K   1590       48      6360K mdd_obj
 76262  76262 100%    0.11K   2243       34      8972K lod_obj
 76245  76245 100%    0.28K   5865       13     23460K mdt_obj
  2865   2764  96%    1.03K    955        3      3820K ldiskfs_inode_cache
  1746   1546  88%    0.21K     97       18       388K cl_lock_kmem
  1396   1396 100%    1.00K    349        4      1396K ptlrpc_cache
  1345   1008  74%    0.78K    269        5      1076K shmem_inode_cache
  1298    847  65%    0.06K     22       59        88K lovsub_lock_kmem
  1224    898  73%    0.16K     51       24       204K ofd_obj
  1008    794  78%    0.18K     48       21       192K osc_lock_kmem
  1008    783  77%    0.03K      9      112        36K lov_lock_link_kmem
   925    782  84%    0.10K     25       37       100K lov_lock_kmem
   920    785  85%    0.04K     10       92        40K ccc_lock_kmem
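(The same counts can be logged non-interactively from /proc/slabinfo, which is more convenient for graphing the growth over time; a sketch:)

# fields 2 and 3 of each line are <active_objs> and <num_objs> for that cache
grep -E 'lustre_inode_cache|_object_kmem|lov_oinfo|ldlm_(locks|resources)' /proc/slabinfo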
The ldiskfs_inode_cache shows a reasonable number of objects in use, one for each MDT and OST inode actually in use. It might be that this is a leak of unlinked inodes/dentries on the client?
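(One way to test that hypothesis is to ask the VM to reclaim dentries and inodes and then see whether the CLIO slabs shrink; if they do not, the objects are being pinned by something rather than merely cached. A sketch of that check:)

# drop the dentry/inode caches, then re-check the CLIO slab counts
sync
echo 2 > /proc/sys/vm/drop_caches
grep -E 'lustre_inode_cache|_object_kmem' /proc/slabinfo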
Now, after 3600s of running, the dbench has finished and deleted all of the files:
 Operation      Count    AvgLat    MaxLat
 ----------------------------------------
 NTCreateX    1229310     5.896  1056.405
 Close         903051     2.960  1499.813
 Rename         52083     8.024   827.129
 Unlink        248209     3.694   789.403
 Deltree           20   119.498   421.063
 Mkdir             10     0.050     0.155
 Qpathinfo    1114775     2.129   953.086
 Qfileinfo     195028     0.114    25.925
 Qfsinfo       204279     0.574    32.902
 Sfileinfo     100238    27.316  1442.888
 Find          430819     6.750  1369.539
 WriteX        611079     0.833   857.679
 ReadX        1927390     0.107  1171.947
 LockX           4004     0.005     1.899
 UnlockX         4004     0.003     3.345
 Flush          86164   183.254  2577.019

Throughput 10.6947 MB/sec 10 clients 10 procs max_latency=2577.028 ms
The slabs still show a large number of allocations, even though no files exist in the filesystem anymore:
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
289880 133498  46%    0.19K  14494       20     57976K size-192
278768 274718  98%    0.03K   2489      112      9956K size-32
274410 259726  94%    0.12K   9147       30     36588K size-128
253590 250634  98%    0.12K   8453       30     33812K lov_oinfo
253555 250634  98%    0.22K  14915       17     59660K lovsub_object_kmem
253552 250634  98%    0.24K  15847       16     63388K ccc_object_kmem
253540 250634  98%    0.27K  18110       14     72440K osc_object_kmem
253538 250634  98%    0.22K  14914       17     59656K lov_object_kmem
252330 250638  99%    1.25K  84110        3    336440K lustre_inode_cache
203463 179392  88%    0.10K   5499       37     21996K buffer_head
128894 128446  99%    0.11K   3791       34     15164K lod_obj
128880 128446  99%    0.08K   2685       48     10740K mdd_obj
128869 128446  99%    0.28K   9913       13     39652K mdt_obj
 84574  79368  93%    0.50K  12082        7     48328K ldlm_locks
 82660  79314  95%    0.38K   8266       10     33064K ldlm_resources
 71780  50308  70%    0.19K   3589       20     14356K dentry
There are also still about 40k MDT locks, though all of the OST locks are gone (which is expected, since the files have been unlinked).
# lctl get_param ldlm.namespaces.*.resource_count
ldlm.namespaces.filter-testfs-OST0000_UUID.resource_count=0
ldlm.namespaces.filter-testfs-OST0001_UUID.resource_count=0
ldlm.namespaces.filter-testfs-OST0002_UUID.resource_count=0
ldlm.namespaces.mdt-testfs-MDT0000_UUID.resource_count=39654
ldlm.namespaces.testfs-MDT0000-mdc-ffff8800a66c1c00.resource_count=39654
ldlm.namespaces.testfs-OST0000-osc-ffff8800a66c1c00.resource_count=0
ldlm.namespaces.testfs-OST0001-osc-ffff8800a66c1c00.resource_count=0
ldlm.namespaces.testfs-OST0002-osc-ffff8800a66c1c00.resource_count=0
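(Since the remaining MDT locks are the obvious candidate for pinning these objects, a follow-up test is to cancel the client's unused DLM locks and see whether the slab counts drop; lru_size=clear is the standard way to flush a namespace's lock LRU. A sketch:)

# cancel all unused locks in the client's MDC lock LRU, then re-check the slabs
lctl set_param ldlm.namespaces.*-mdc-*.lru_size=clear
grep -E 'lustre_inode_cache|_object_kmem|ldlm_(locks|resources)' /proc/slabinfo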
Attachments
Issue Links
- duplicates:
  LU-3771 stuck 56G of SUnreclaim memory (Resolved)
- is duplicated by:
  LU-4754 MDS large amount of slab usage (Resolved)
- is related to:
  LU-4033 Failure on test suite parallel-scale-nfsv4 test_iorssf: MDS oom (Resolved)
  LU-4740 MDS - buffer cache not freed (Resolved)
  LU-3997 Excessive slab usage causes large mem & core count clients to hang (Resolved)
  LU-4429 clients leaking open handles/bad lock matching in ll_md_blocking_ast (Resolved)
  LU-4754 MDS large amount of slab usage (Resolved)
  LU-4002 HSM restore vs unlink deadlock (Resolved)
  LU-4357 page allocation failure. mode:0x40 caused by missing __GFP_WAIT flag (Resolved)
  LU-2487 2.2 Client deadlock between ll_md_blocking_ast, sys_close, and sys_open (Resolved)