Details
- Bug
- Resolution: Not a Bug
- Minor
- None
- None
- None
- 3
- 6768
Description
At our center, we are running a Lustre 2.1.2 file system with Lustre 2.1.2 clients on all of the compute nodes of our Penguin cluster. Recently, a user has been performing WRF runs in which he uses a special feature of WRF to offload all of the I/O onto a single node. This improves his I/O performance dramatically, but it leaves that node with ~1 GB of memory stuck in "Inactive" after each run. Our epilogue script checks that available free memory is above a specified percentage, and every job this user runs results in the node being set offline because of this 1 GB of Inactive memory.
Here is an example of the memory statistics from one of these nodes, before and after the epilogue runs drop_caches:
- Before:
- MemTotal: 15.681 GB
- MemFree: 6.495 GB
- Cached: 6.206 GB
- Active: 1.395 GB
- Inactive: 6.247 GB
- Dirty: 0.000 GB
- Mapped: 0.003 GB
- Slab: 1.391 GB
- After:
- MemTotal: 15.681 GB
- MemFree: 14.003 GB
- Cached: 0.007 GB
- Active: 0.134 GB
- Inactive: 1.309 GB
- Dirty: 0.000 GB
- Mapped: 0.003 GB
- Slab: 0.082 GB
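For reference, the before/after numbers above are gathered by the epilogue with something along these lines (a simplified sketch; the real script checks additional fields and uses a site-specific report format):
# snapshot the fields of interest, drop caches, wait, snapshot again
grep -E 'MemTotal|MemFree|Cached|Active|Inactive|Dirty|Mapped|Slab' /proc/meminfo > /tmp/meminfo.before
sync
echo 3 > /proc/sys/vm/drop_caches      # free page cache, dentries and inodes
sleep 30                               # give the kernel a moment to settle
grep -E 'MemTotal|MemFree|Cached|Active|Inactive|Dirty|Mapped|Slab' /proc/meminfo > /tmp/meminfo.after
diff -u /tmp/meminfo.before /tmp/meminfo.after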
While looking for possible solutions to this problem, I came across a recent HPDD-Discuss thread titled "Possible file page leak in Lustre 2.1.2" that described a problem very similar to ours. It was suggested there that the issue had already been identified and resolved in http://jira.whamcloud.com/browse/LU-1576.
That ticket indicates that the fix was included in Lustre 2.1.3, so we tested this by installing the Lustre 2.1.3 client packages on some of our compute nodes and letting the WRF job run on them. However, even after the upgrade to Lustre 2.1.3, we still saw the inactive memory at the end of the job. Do we need to upgrade the Lustre installation on our OSSes and MDS to 2.1.3 to fix this problem, or do you have any other suggestions?
Any help that you could provide us with would be appreciated!
Attachments
Activity
So, just to reconfirm: when you run this application several times on the same client, each run adds another ~1 GB of inactive memory, so eventually the node will die with an OOM?
If you unmount the Lustre filesystem on this client after the run and then mount it again, instead of rebooting, is the memory reclaimed?
I would not put too much weight on the leaks you see reported; those readings are useless unless taken after the unmount, since every bit of memory that is allocated but not yet freed (because it is still in use) will show up as leaked.
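Something along these lines on the affected client would answer that (a rough sketch; the mount point and MGS NID below are placeholders for your actual ones):
grep -E 'MemFree|Cached|Active|Inactive|Slab' /proc/meminfo   # after the run
umount /mnt/lustre                                            # fails with EBUSY if files are still open
grep -E 'MemFree|Cached|Active|Inactive|Slab' /proc/meminfo   # after unmount
mount -t lustre mgsnode@tcp0:/fsname /mnt/lustre              # remount instead of rebooting
grep -E 'MemFree|Cached|Active|Inactive|Slab' /proc/meminfo   # after remount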
Thank you, I am engaging further engineering resources now.
NOTE: The debug log is too large to attach to this case. Here is a link instead.
https://www.dropbox.com/s/vwuklioioytcl7e/lustre_debug
Thanks
The customer ran their WRF job with Lustre debugging set to gather malloc information, and it does appear that we have found a leak in Lustre. Here are the steps we followed:
1) sudo lctl set_param debug=+malloc
2) sudo lctl set_param debug_mb=512
3) * let the WRF job run *
Epilogue sets the node offline (1.62 GB of memory set to inactive)
4) sudo lctl dk /tmp/lustre_debug
5) perl leak_finder.pl /tmp/lustre_debug 2>&1 | grep "Leak"
From that last command, here is what we found:
- Leak: 1080 bytes allocated at ffff8101d5eae140 (super25.c:ll_alloc_inode:56, debug file line 1745506)
- Leak: 104 bytes allocated at ffff8101cecdbdc0 (dcache.c:ll_set_dd:192, debug file line 1745508)
- Leak: 1080 bytes allocated at ffff810214eb3ac0 (super25.c:ll_alloc_inode:56, debug file line 1745551)
- Leak: 104 bytes allocated at ffff8101d7523840 (dcache.c:ll_set_dd:192, debug file line 1745553)
The Lustre documentation states that this is a cyclic log, so I would imagine that if a small leak shows up here, small amounts of memory could have been lost repeatedly throughout the run, adding up to our overall large leak.
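For reference, the capture sequence above amounts to the following (a sketch; debug_mb may need to be larger than 512 so the cyclic buffer does not wrap during a long run):
#!/bin/bash
# Sketch of the debug-log capture used above; run as root on the client.
lctl set_param debug=+malloc        # also log memory allocations/frees
lctl set_param debug_mb=512         # enlarge the cyclic debug buffer (may need to be bigger)
# ... run the WRF job here ...
lctl dk /tmp/lustre_debug           # dump the kernel debug log to a file
perl leak_finder.pl /tmp/lustre_debug 2>&1 | grep "Leak" > /tmp/lustre_leaks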
We will attach the lustre_debug log to this case for you to analyze as well. It does look as though we may be closing in on the problem now.
Additionally, I am going to attach the /proc/slabinfo for the end of the
WRF run as you had previously requested, along with the /proc/meminfo
before, during, and after the WRF run.
cat /proc/slabinfo
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
ll_qunit_cache 0 0 112 34 1 : tunables 120 60 8
: slabdata 0 0 0
lmv_objects 0 0 96 40 1 : tunables 120 60 8
: slabdata 0 0 0
ccc_req_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
ccc_session_kmem 105 132 176 22 1 : tunables 120 60 8
: slabdata 6 6 0
ccc_thread_kmem 112 121 336 11 1 : tunables 54 27 8
: slabdata 11 11 0
ccc_object_kmem 0 0 256 15 1 : tunables 120 60 8
: slabdata 0 0 0
ccc_lock_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
vvp_session_kmem 105 148 104 37 1 : tunables 120 60 8
: slabdata 4 4 0
vvp_thread_kmem 112 126 440 9 1 : tunables 54 27 8
: slabdata 14 14 0
vvp_page_kmem 0 0 80 48 1 : tunables 120 60 8
: slabdata 0 0 0
ll_rmtperm_hash_cache 0 0 256 15 1 : tunables 120 60
8 : slabdata 0 0 0
ll_remote_perm_cache 0 0 40 92 1 : tunables 120 60
8 : slabdata 0 0 0
ll_file_data 0 0 192 20 1 : tunables 120 60 8
: slabdata 0 0 0
lustre_inode_cache 3 21 1088 7 2 : tunables 24 12 8
: slabdata 3 3 0
lov_oinfo 0 0 320 12 1 : tunables 54 27 8
: slabdata 0 0 0
lov_lock_link_kmem 0 0 32 112 1 : tunables 120 60 8
: slabdata 0 0 0
lovsub_req_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
lovsub_object_kmem 0 0 240 16 1 : tunables 120 60 8
: slabdata 0 0 0
lovsub_lock_kmem 0 0 64 59 1 : tunables 120 60 8
: slabdata 0 0 0
lovsub_page_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
lov_req_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
lov_session_kmem 105 120 384 10 1 : tunables 54 27 8
: slabdata 12 12 0
lov_thread_kmem 112 121 336 11 1 : tunables 54 27 8
: slabdata 11 11 0
lov_object_kmem 0 0 200 19 1 : tunables 120 60 8
: slabdata 0 0 0
lov_lock_kmem 0 0 104 37 1 : tunables 120 60 8
: slabdata 0 0 0
lov_page_kmem 0 0 48 77 1 : tunables 120 60 8
: slabdata 0 0 0
osc_req_kmem 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
osc_session_kmem 105 130 296 13 1 : tunables 54 27 8
: slabdata 10 10 0
osc_thread_kmem 112 126 216 18 1 : tunables 120 60 8
: slabdata 7 7 0
osc_object_kmem 0 0 136 28 1 : tunables 120 60 8
: slabdata 0 0 0
osc_lock_kmem 0 0 184 21 1 : tunables 120 60 8
: slabdata 0 0 0
osc_page_kmem 0 0 264 15 1 : tunables 54 27 8
: slabdata 0 0 0
llcd_cache 0 0 3952 1 1 : tunables 24 12 8
: slabdata 0 0 0
interval_node 22 90 128 30 1 : tunables 120 60 8
: slabdata 3 3 0
ldlm_locks 43 63 576 7 1 : tunables 54 27 8
: slabdata 9 9 0
ldlm_resources 41 72 320 12 1 : tunables 54 27 8
: slabdata 6 6 0
cl_page_kmem 0 0 184 21 1 : tunables 120 60 8
: slabdata 0 0 0
cl_lock_kmem 0 0 216 18 1 : tunables 120 60 8
: slabdata 0 0 0
cl_env_kmem 105 132 176 22 1 : tunables 120 60 8
: slabdata 6 6 0
capa_cache 0 0 184 21 1 : tunables 120 60 8
: slabdata 0 0 0
ll_import_cache 0 0 1424 5 2 : tunables 24 12 8
: slabdata 0 0 0
ll_obdo_cache 0 0 208 19 1 : tunables 120 60 8
: slabdata 0 0 0
ll_obd_dev_cache 17 17 7048 1 2 : tunables 8 4 0
: slabdata 17 17 0
SDP 0 0 1792 2 1 : tunables 24 12 8
: slabdata 0 0 0
fib6_nodes 7 118 64 59 1 : tunables 120 60 8
: slabdata 2 2 0
ip6_dst_cache 7 36 320 12 1 : tunables 54 27 8
: slabdata 3 3 0
ndisc_cache 1 15 256 15 1 : tunables 120 60 8
: slabdata 1 1 0
RAWv6 11 12 960 4 1 : tunables 54 27 8
: slabdata 3 3 0
UDPv6 0 0 896 4 1 : tunables 54 27 8
: slabdata 0 0 0
tw_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8
: slabdata 0 0 0
request_sock_TCPv6 0 0 192 20 1 : tunables 120 60 8
: slabdata 0 0 0
TCPv6 0 0 1728 4 2 : tunables 24 12 8
: slabdata 0 0 0
nfs_direct_cache 0 0 136 28 1 : tunables 120 60 8
: slabdata 0 0 0
nfs_write_data 36 36 832 9 2 : tunables 54 27 8
: slabdata 4 4 0
nfs_read_data 32 36 832 9 2 : tunables 54 27 8
: slabdata 4 4 0
nfs_inode_cache 123 195 1032 3 1 : tunables 24 12 8
: slabdata 65 65 0
nfs_page 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
rpc_buffers 8 8 2048 2 1 : tunables 24 12 8
: slabdata 4 4 0
rpc_tasks 20 20 384 10 1 : tunables 54 27 8
: slabdata 2 2 0
rpc_inode_cache 30 30 768 5 1 : tunables 54 27 8
: slabdata 6 6 0
scsi_cmd_cache 5 10 384 10 1 : tunables 54 27 8
: slabdata 1 1 2
sgpool-128 32 32 4096 1 1 : tunables 24 12 8
: slabdata 32 32 0
sgpool-64 32 32 2048 2 1 : tunables 24 12 8
: slabdata 16 16 0
sgpool-32 32 32 1024 4 1 : tunables 54 27 8
: slabdata 8 8 0
sgpool-16 32 32 512 8 1 : tunables 54 27 8
: slabdata 4 4 0
sgpool-8 32 60 256 15 1 : tunables 120 60 8
: slabdata 3 4 0
scsi_io_context 0 0 112 34 1 : tunables 120 60 8
: slabdata 0 0 0
ib_mad 2048 2296 448 8 1 : tunables 54 27 8
: slabdata 287 287 0
ip_fib_alias 14 59 64 59 1 : tunables 120 60 8
: slabdata 1 1 0
ip_fib_hash 14 59 64 59 1 : tunables 120 60 8
: slabdata 1 1 0
UNIX 9 33 704 11 2 : tunables 54 27 8
: slabdata 3 3 0
flow_cache 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
msi_cache 9 59 64 59 1 : tunables 120 60 8
: slabdata 1 1 0
cfq_ioc_pool 13 60 128 30 1 : tunables 120 60 8
: slabdata 2 2 0
cfq_pool 11 54 216 18 1 : tunables 120 60 8
: slabdata 3 3 0
crq_pool 4 96 80 48 1 : tunables 120 60 8
: slabdata 1 2 0
deadline_drq 0 0 80 48 1 : tunables 120 60 8
: slabdata 0 0 0
as_arq 0 0 96 40 1 : tunables 120 60 8
: slabdata 0 0 0
mqueue_inode_cache 1 4 896 4 1 : tunables 54 27 8
: slabdata 1 1 0
isofs_inode_cache 0 0 608 6 1 : tunables 54 27 8
: slabdata 0 0 0
hugetlbfs_inode_cache 1 7 576 7 1 : tunables 54 27
8 : slabdata 1 1 0
ext2_inode_cache 91 145 720 5 1 : tunables 54 27 8
: slabdata 29 29 0
ext2_xattr 0 0 88 44 1 : tunables 120 60 8
: slabdata 0 0 0
dnotify_cache 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
dquot 0 0 256 15 1 : tunables 120 60 8
: slabdata 0 0 0
eventpoll_pwq 5 106 72 53 1 : tunables 120 60 8
: slabdata 2 2 0
eventpoll_epi 5 40 192 20 1 : tunables 120 60 8
: slabdata 2 2 0
inotify_event_cache 0 0 40 92 1 : tunables 120 60
8 : slabdata 0 0 0
inotify_watch_cache 0 0 72 53 1 : tunables 120 60
8 : slabdata 0 0 0
kioctx 0 0 320 12 1 : tunables 54 27 8
: slabdata 0 0 0
kiocb 0 0 256 15 1 : tunables 120 60 8
: slabdata 0 0 0
fasync_cache 0 0 24 144 1 : tunables 120 60 8
: slabdata 0 0 0
shmem_inode_cache 1360 1370 768 5 1 : tunables 54 27 8
: slabdata 274 274 0
posix_timers_cache 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
uid_cache 2 30 128 30 1 : tunables 120 60 8
: slabdata 1 1 0
ip_mrt_cache 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
tcp_bind_bucket 28 448 32 112 1 : tunables 120 60 8
: slabdata 4 4 0
inet_peer_cache 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
secpath_cache 0 0 64 59 1 : tunables 120 60 8
: slabdata 0 0 0
xfrm_dst_cache 0 0 384 10 1 : tunables 54 27 8
: slabdata 0 0 0
ip_dst_cache 107 180 384 10 1 : tunables 54 27 8
: slabdata 18 18 0
arp_cache 53 75 256 15 1 : tunables 120 60 8
: slabdata 5 5 0
RAW 9 10 768 5 1 : tunables 54 27 8
: slabdata 2 2 0
UDP 10 15 768 5 1 : tunables 54 27 8
: slabdata 3 3 0
tw_sock_TCP 20 40 192 20 1 : tunables 120 60 8
: slabdata 1 2 0
request_sock_TCP 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
TCP 32 35 1600 5 2 : tunables 24 12 8
: slabdata 7 7 0
blkdev_ioc 13 118 64 59 1 : tunables 120 60 8
: slabdata 2 2 0
blkdev_queue 17 20 1576 5 2 : tunables 24 12 8
: slabdata 4 4 0
blkdev_requests 7 14 272 14 1 : tunables 54 27 8
: slabdata 1 1 2
biovec-256 7 7 4096 1 1 : tunables 24 12 8
: slabdata 7 7 0
biovec-128 7 8 2048 2 1 : tunables 24 12 8
: slabdata 4 4 0
biovec-64 7 8 1024 4 1 : tunables 54 27 8
: slabdata 2 2 0
biovec-16 7 30 256 15 1 : tunables 120 60 8
: slabdata 2 2 0
biovec-4 7 118 64 59 1 : tunables 120 60 8
: slabdata 2 2 0
biovec-1 7 404 16 202 1 : tunables 120 60 8
: slabdata 2 2 0
bio 262 300 128 30 1 : tunables 120 60 8
: slabdata 10 10 2
utrace_engine_cache 0 0 64 59 1 : tunables 120 60
8 : slabdata 0 0 0
utrace_cache 0 0 64 59 1 : tunables 120 60 8
: slabdata 0 0 0
sock_inode_cache 90 108 640 6 1 : tunables 54 27 8
: slabdata 18 18 0
skbuff_fclone_cache 14 14 512 7 1 : tunables 54 27
8 : slabdata 2 2 0
skbuff_head_cache 2847 3060 256 15 1 : tunables 120 60 8
: slabdata 204 204 0
file_lock_cache 1 22 176 22 1 : tunables 120 60 8
: slabdata 1 1 0
Acpi-Operand 1848 2360 64 59 1 : tunables 120 60 8
: slabdata 40 40 0
Acpi-ParseExt 0 0 64 59 1 : tunables 120 60 8
: slabdata 0 0 0
Acpi-Parse 0 0 40 92 1 : tunables 120 60 8
: slabdata 0 0 0
Acpi-State 0 0 80 48 1 : tunables 120 60 8
: slabdata 0 0 0
Acpi-Namespace 839 896 32 112 1 : tunables 120 60 8
: slabdata 8 8 0
delayacct_cache 379 531 64 59 1 : tunables 120 60 8
: slabdata 9 9 0
taskstats_cache 19 53 72 53 1 : tunables 120 60 8
: slabdata 1 1 0
proc_inode_cache 146 180 592 6 1 : tunables 54 27 8
: slabdata 30 30 0
sigqueue 53 96 160 24 1 : tunables 120 60 8
: slabdata 4 4 0
radix_tree_node 9320 15316 536 7 1 : tunables 54 27 8
: slabdata 2188 2188 0
bdev_cache 6 12 832 4 1 : tunables 54 27 8
: slabdata 3 3 0
sysfs_dir_cache 5366 5412 88 44 1 : tunables 120 60 8
: slabdata 123 123 0
mnt_cache 42 60 256 15 1 : tunables 120 60 8
: slabdata 4 4 0
inode_cache 1231 1274 560 7 1 : tunables 54 27 8
: slabdata 182 182 0
dentry_cache 3139 4140 216 18 1 : tunables 120 60 8
: slabdata 230 230 0
filp 200 570 256 15 1 : tunables 120 60 8
: slabdata 38 38 0
names_cache 9 9 4096 1 1 : tunables 24 12 8
: slabdata 9 9 0
avc_node 30 106 72 53 1 : tunables 120 60 8
: slabdata 2 2 0
selinux_inode_security 3124 4032 80 48 1 : tunables 120 60
8 : slabdata 84 84 0
key_jar 4 20 192 20 1 : tunables 120 60 8
: slabdata 1 1 0
idr_layer_cache 199 238 528 7 1 : tunables 54 27 8
: slabdata 34 34 0
buffer_head 148 320 96 40 1 : tunables 120 60 8
: slabdata 8 8 0
mm_struct 24 32 896 4 1 : tunables 54 27 8
: slabdata 8 8 0
vm_area_struct 428 1430 176 22 1 : tunables 120 60 8
: slabdata 65 65 1
fs_cache 50 177 64 59 1 : tunables 120 60 8
: slabdata 3 3 0
files_cache 35 60 768 5 1 : tunables 54 27 8
: slabdata 12 12 0
signal_cache 367 378 832 9 2 : tunables 54 27 8
: slabdata 42 42 0
sighand_cache 357 360 2112 3 2 : tunables 24 12 8
: slabdata 120 120 0
task_struct 368 370 1920 2 1 : tunables 24 12 8
: slabdata 185 185 0
anon_vma 294 1008 24 144 1 : tunables 120 60 8
: slabdata 7 7 0
pid 393 531 64 59 1 : tunables 120 60 8
: slabdata 9 9 0
shared_policy_node 0 0 48 77 1 : tunables 120 60 8
: slabdata 0 0 0
numa_policy 72 432 24 144 1 : tunables 120 60 8
: slabdata 3 3 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0
: slabdata 0 0 0
size-131072 2 2 131072 1 32 : tunables 8 4 0
: slabdata 2 2 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0
: slabdata 0 0 0
size-65536 6 6 65536 1 16 : tunables 8 4 0
: slabdata 6 6 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0
: slabdata 0 0 0
size-32768 7 7 32768 1 8 : tunables 8 4 0
: slabdata 7 7 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0
: slabdata 0 0 0
size-16384 2070 2070 16384 1 4 : tunables 8 4 0
: slabdata 2070 2070 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0
: slabdata 0 0 0
size-8192 2026 2026 8192 1 2 : tunables 8 4 0
: slabdata 2026 2026 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 8
: slabdata 0 0 0
size-4096 911 911 4096 1 1 : tunables 24 12 8
: slabdata 911 911 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 8
: slabdata 0 0 0
size-2048 1080 1120 2048 2 1 : tunables 24 12 8
: slabdata 560 560 83
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 8
: slabdata 0 0 0
size-1024 1429 1756 1024 4 1 : tunables 54 27 8
: slabdata 439 439 83
size-512(DMA) 0 0 512 8 1 : tunables 54 27 8
: slabdata 0 0 0
size-512 1607 2024 512 8 1 : tunables 54 27 8
: slabdata 253 253 2
size-256(DMA) 0 0 256 15 1 : tunables 120 60 8
: slabdata 0 0 0
size-256 3144 3495 256 15 1 : tunables 120 60 8
: slabdata 233 233 0
size-128(DMA) 0 0 128 30 1 : tunables 120 60 8
: slabdata 0 0 0
size-64(DMA) 0 0 64 59 1 : tunables 120 60 8
: slabdata 0 0 0
size-64 8683 22243 64 59 1 : tunables 120 60 8
: slabdata 377 377 0
size-32(DMA) 0 0 32 112 1 : tunables 120 60 8
: slabdata 0 0 0
size-128 3423 7410 128 30 1 : tunables 120 60 8
: slabdata 247 247 1
size-32 54883 59024 32 112 1 : tunables 120 60 8
: slabdata 527 527 0
kmem_cache 182 182 2688 1 1 : tunables 24 12 8
: slabdata 182 182 0
cat /proc/meminfo <before job begins>
MemTotal: 16442916 kB
MemFree: 15650204 kB
Buffers: 200 kB
Cached: 303428 kB
SwapCached: 0 kB
Active: 331180 kB
Inactive: 206008 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16442916 kB
LowFree: 15650204 kB
SwapTotal: 4225084 kB
SwapFree: 4225084 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 235168 kB
Mapped: 8452 kB
Slab: 77148 kB
PageTables: 2740 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 12446540 kB
Committed_AS: 857620 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 80412 kB
VmallocChunk: 34359657895 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
cat /proc/meminfo <during the WRF run>
MemTotal: 16442916 kB
MemFree: 360168 kB
Buffers: 160 kB
Cached: 5678292 kB
SwapCached: 2230028 kB
Active: 8199640 kB
Inactive: 6118380 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16442916 kB
LowFree: 360168 kB
SwapTotal: 4225084 kB
SwapFree: 1557472 kB
Dirty: 9940 kB
Writeback: 7072 kB
AnonPages: 6545728 kB
Mapped: 9612 kB
Slab: 1349160 kB
PageTables: 19704 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 12446540 kB
Committed_AS: 9478760 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 80412 kB
VmallocChunk: 34359657895 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
cat /proc/meminfo <after the WRF run>
MemTotal: 16442916 kB
MemFree: 14180204 kB
Buffers: 208 kB
Cached: 14928 kB
SwapCached: 1788636 kB
Active: 122928 kB
Inactive: 1682064 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 16442916 kB
LowFree: 14180204 kB
SwapTotal: 4225084 kB
SwapFree: 2213112 kB
Dirty: 68 kB
Writeback: 0 kB
AnonPages: 26628 kB
Mapped: 2668 kB
Slab: 86572 kB
PageTables: 672 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 12446540 kB
Committed_AS: 279580 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 80412 kB
VmallocChunk: 34359657895 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
Hugepagesize: 2048 kB
Yes, we had tested installing 2.1.3 on a couple of our client systems to
see if that would fix the problem, but we were still seeing the issue on
those nodes with the Lustre 2.1.3 client installed. Thanks for clarifying
that; this code does not appear to perform a great deal of readdirs, so it
is probably not the same memory leak.
Correct, dropping cache does not free the 1 GB of memory. Our epilogue
script attempts to drop cache twice, and after the second time it compares
the amount of free memory before determining if it can return the compute
node to service.
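The check itself boils down to something like this (a simplified sketch; the real epilogue uses a site-specific threshold and scheduler command):
#!/bin/bash
# Simplified sketch of our epilogue memory check (details differ in the real script).
THRESHOLD_PCT=85                        # example threshold, not our actual value
for pass in 1 2; do                     # drop caches twice, as described above
    sync
    echo 3 > /proc/sys/vm/drop_caches
    sleep 15
done
total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
free_kb=$(awk '/^MemFree:/ {print $2}' /proc/meminfo)
pct=$(( free_kb * 100 / total_kb ))
if [ "$pct" -lt "$THRESHOLD_PCT" ]; then
    echo "only ${pct}% of memory free after dropping caches; taking node offline"
    # e.g. pbsnodes -o "$(hostname)"    # scheduler-specific offlining command
fi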
We are going to run the WRF job with Lustre at a higher logging level and
using the leak_finder.pl script provided by WhamCloud. We will send
whatever we find along to you.
You indicated that you had installed 2.1.3, which contains the fix for LU-1576; that was our main indication. The LU-1576 fix mostly deals with readdir pages, so unless your workload performs a lot of readdirs, you likely have a different problem.
Are you saying that dropping cache does not free the 1GB of memory?
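One quick way to check how readdir-heavy the workload is would be to clear the client llite stats before a run and look at them afterwards (a sketch; the exact stat names can vary between Lustre versions):
# on the client, before the job
lctl set_param llite.*.stats=clear
# ... run the WRF job ...
# after the job: any significant readdir activity will show up here
lctl get_param llite.*.stats | grep -i readdir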
In regards to the question of waiting for a few minutes, the answer is no.
Even if we wait for hours, the inactive memory is never given back to the
system; we are forced to reboot these nodes to restore their full memory.
However, as you can see from the output in my last message, we start off
with > 6 GB of inactive memory at the beginning of the epilogue and ~1 GB
of inactive memory after the epilogue has waited approximately 30 seconds.
No matter how long we wait, that last 1 GB of memory is never returned to
the system.
We had planned to set up a run of WRF to test the memory usage on our test
cluster, but this has gotten delayed as all of us were busy during the
week. We will have to wait until next week to get you some data on memory
usage.
Having talked with someone much more familiar with WRF and its dependencies
than I am, it sounds like running the WRF software the way it is being run
here may be a fairly big hassle. In other words, getting it running for you
locally may be fairly difficult. We will have to see whether going down that
road is necessary once we have given you some more data.
In the meantime, I'm curious as to how WhamCloud has determined that our
problem does not match up with http://jira.whamcloud.com/browse/LU-1576.
The symptoms are identical, and it was suggested in the HPDD discussion
list that this was an occurrence in Lustre 2.1.2 for some irregular I/O
patterns. What do they see as different between our problem and the one
described by LLNL on the list? For my future reference, I would be
interested to know how they determined that so I could use their methods
for better diagnosing Lustre problems in the future.
I'll have more to share with you next week.
Thanks
Closing this old ticket.
Just because memory is not "Free" doesn't mean that it is "leaked". The kernel will cache pages even if they are unused, until all of the free memory is consumed, and then old data will be freed.
The main concern would be if the node actually runs out of memory and applications start failing (OOM killer, or -ENOMEM=-12 memory allocation errors).
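If there is any doubt on a given node, checking for actual allocation failures is more telling than looking at the "Inactive" number (a sketch):
# look for OOM-killer activity or failed allocations in the kernel log
dmesg | grep -iE 'out of memory|oom-killer|page allocation failure'
# compare committed memory against what the kernel is merely caching
grep -E 'MemFree|Cached|Committed_AS|CommitLimit' /proc/meminfo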