Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- Affects Version: Lustre 2.1.1
- None
- 3
- 4566
Description
We periodically run lpurge on lustre clients to keep filesystem capacity usage under control. lpurge recurses through the filesystem generating a list of files that have not been accessed within some time threshold and optionally removes them.
https://github.com/chaos/lustre-tools-llnl/blob/master/src/lpurge.c
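The scan-and-purge idea can be approximated with GNU find. This is an illustrative sketch only, not the real lpurge (which is the C program linked above and walks the tree itself); the demo directory, file names, and 60-day threshold are invented here:

```shell
# Rough approximation of lpurge's scan phase: list regular files that have
# not been accessed within a threshold (here 60 days) under a demo tree.
FS=/tmp/lpurge-demo            # stand-in for the Lustre mount point
mkdir -p "$FS"
touch -a -d '90 days ago' "$FS/old.dat"   # simulate a long-unaccessed file
touch "$FS/new.dat"                        # simulate a recently accessed file
find "$FS" -type f -atime +60 -print       # old.dat is a purge candidate
# The optional removal step would then be something like:
#   find "$FS" -type f -atime +60 -delete
```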
We have found that nodes running lpurge over a large number of files eventually become unusably slow. In some cases the node is evicted and lpurge terminates, but the slowness persists: there is noticeable keyboard lag, and delays in starting and running processes.
Here are some memory statistics from a slow node. In this example we see about 10G in the lustre_inode_cache slab and 30G in Inactive(file). Dropping caches clears out the slabs and the node becomes responsive again; however, Inactive(file) remains unchanged.
The backtraces below show processes stuck in the kernel shrinker, but the lustre-related slabs don't shrink unless we drop caches manually.
# free
             total       used       free     shared    buffers     cached
Mem:      49416632   46140416    3276216          0     143212     749056
-/+ buffers/cache:   45248148    4168484
Swap:      4000232          0    4000232
# slabtop -o -s c | head
 Active / Total Objects (% used)    : 21568317 / 21691269 (99.4%)
 Active / Total Slabs (% used)      : 1878088 / 1878091 (100.0%)
 Active / Total Caches (% used)     : 134 / 231 (58.0%)
 Active / Total Size (% used)       : 11945321.57K / 11964171.77K (99.8%)
 Minimum / Average / Maximum Object : 0.02K / 0.55K / 4096.00K
    OBJS  ACTIVE  USE OBJ SIZE   SLABS OBJ/SLAB CACHE SIZE NAME
 9035425 9035342  99%    1.06K 1290775        7  10326200K lustre_inode_cache
 5804960 5804471  99%    0.19K  290248       20   1160992K dentry
 5667330 5659606  99%    0.12K  188911       30    755644K size-128
   33005   33005 100%    8.00K   33005        1    264040K size-8192
  141100  140027  99%    0.78K   28220        5    112880K ext3_inode_cache
  406687  400332  98%    0.06K    6893       59     27572K size-64
  232619  156570  67%    0.10K    6287       37     25148K buffer_head
   22296   21202  95%    1.00K    5574        4     22296K size-1024
    9336    9301  99%    2.00K    4668        2     18672K size-2048
   28217   21235  75%    0.55K    4031        7     16124K radix_tree_node
   74500   74356  99%    0.19K    3725       20     14900K size-192
     230     230 100%   32.12K     230        1     14720K kmem_cache
   20128   19625  97%    0.50K    2516        8     10064K size-512
    1161    1161 100%    6.65K    1161        1      9288K ll_obd_dev_cache
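The ~10G lustre_inode_cache figure can be cross-checked from the SLABS column. Assuming each slab in this cache spans 8K, i.e. two 4K pages (inferred from CACHE SIZE / SLABS in the row above, which holds 7 objects of 1.06K each):

```shell
# Cross-check of the lustre_inode_cache footprint from the slabtop row:
# 1290775 slabs at an apparent 8K per slab.
slabs=1290775
kb_per_slab=8
echo "lustre_inode_cache: $((slabs * kb_per_slab / 1024)) MB"
```

This gives roughly 10084 MB, matching both the 10326200K CACHE SIZE column and the "about 10G" figure above.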
/proc/meminfo before and after 'echo 3 > /proc/sys/vm/drop_caches'
Before drop_caches                        After drop_caches
MemTotal:        49416632 kB              MemTotal:        49416632 kB
MemFree:          3195016 kB              MemFree:         16576276 kB
Buffers:           143724 kB              Buffers:              416 kB
Cached:            836660 kB              Cached:             12572 kB
SwapCached:             0 kB              SwapCached:             0 kB
Active:            473304 kB              Active:             30836 kB
Inactive:        31535004 kB              Inactive:        31010004 kB
Active(anon):       22280 kB              Active(anon):       22356 kB
Inactive(anon):      1304 kB              Inactive(anon):      1304 kB
Active(file):      451024 kB              Active(file):        8480 kB
Inactive(file):  31533700 kB              Inactive(file):  31008700 kB
Unevictable:            0 kB              Unevictable:            0 kB
Mlocked:                0 kB              Mlocked:                0 kB
SwapTotal:        4000232 kB              SwapTotal:        4000232 kB
SwapFree:         4000232 kB              SwapFree:         4000232 kB
Dirty:                  4 kB              Dirty:                  0 kB
Writeback:              0 kB              Writeback:              0 kB
AnonPages:          23468 kB              AnonPages:          23472 kB
Mapped:             11988 kB              Mapped:             11992 kB
Shmem:                192 kB              Shmem:                192 kB
Slab:            12823932 kB              Slab:              409052 kB
SReclaimable:     1327712 kB              SReclaimable:       12436 kB
SUnreclaim:      11496220 kB              SUnreclaim:        396616 kB
KernelStack:         2768 kB              KernelStack:         2768 kB
PageTables:          3256 kB              PageTables:          3256 kB
NFS_Unstable:           0 kB              NFS_Unstable:           0 kB
Bounce:                 0 kB              Bounce:                 0 kB
WritebackTmp:           0 kB              WritebackTmp:           0 kB
CommitLimit:     28708548 kB              CommitLimit:     28708548 kB
Committed_AS:      135712 kB              Committed_AS:      135708 kB
VmallocTotal:  34359738367 kB             VmallocTotal:  34359738367 kB
VmallocUsed:      1180768 kB              VmallocUsed:      1180768 kB
VmallocChunk:  34332553664 kB             VmallocChunk:  34332553664 kB
HardwareCorrupted:      0 kB              HardwareCorrupted:      0 kB
AnonHugePages:          0 kB              AnonHugePages:          0 kB
HugePages_Total:        0                 HugePages_Total:        0
HugePages_Free:         0                 HugePages_Free:         0
HugePages_Rsvd:         0                 HugePages_Rsvd:         0
HugePages_Surp:         0                 HugePages_Surp:         0
Hugepagesize:        2048 kB              Hugepagesize:        2048 kB
DirectMap4k:         5312 kB              DirectMap4k:         5312 kB
DirectMap2M:      2082816 kB              DirectMap2M:      2082816 kB
DirectMap1G:     48234496 kB              DirectMap1G:     48234496 kB
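A quick arithmetic check of the dump above shows why dropping caches restores responsiveness while leaving Inactive(file) essentially unchanged (all values in kB, taken from the before/after columns):

```shell
# drop_caches frees almost all slab but barely touches Inactive(file).
slab_freed=$((12823932 - 409052))          # Slab: before - after
inactive_freed=$((31533700 - 31008700))    # Inactive(file): before - after
echo "Slab freed:           $((slab_freed / 1024)) MB"
echo "Inactive(file) freed: $((inactive_freed / 1024)) MB"
```

Roughly 12 GB of slab is released, versus only about half a GB of Inactive(file).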
Finally, sysrq-l backtraces from example slow processes show them in shrink_inactive_list:
Process in.mrlogind:
  isolate_pages_global
  shrink_inactive_list
  shrink_zone
  zone_reclaim
  get_page_from_freelist
  __alloc_pages_nodemask
  kmem_getpages
  cache_grow
  cache_alloc_refill
  kmem_cache_alloc
  __alloc_skb
  sk_stream_alloc_skb
  tcp_sendmsg
  sock_aio_write
  do_sync_write
  vfs_write
  sys_write
  system_call_fastpath

Process opcontrol:
  __isolate_lru_page
  isolate_pages_global
  shrink_inactive_list
  shrink_zone
  zone_reclaim
  isolate_pages_global
  get_page_from_freelist
  __alloc_pages_nodemask
  alloc_pages_current
  __pte_alloc
  copy_pte_range
  kmem_getpages
  cache_grow
  cache_alloc_refill
  kmem_cache_alloc
  dup_mm
  copy_process
  do_fork
  alloc_fd
  fd_install
  sys_clone
  stub_clone
  system_call_fastpath
LLNL-bugzilla-ID: 1661