Details

Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
Description
Compared to NFS, it seems to take Lustre a long time to free clean pages from the cache. Speeding this up may help client performance on a number of fronts, since it's such a common operation. Giving up memory more quickly may also improve client stability under low-memory conditions.
For example, after reading 50 GB of data from NFS to /dev/null, it takes 3.7 seconds to drop caches.
# grove617 /root > dd if=/var/dumps/bigfile of=/dev/null bs=1M > /dev/null 2>&1
# grove617 /root > grep ^Cached /proc/meminfo ; time echo 3 > /proc/sys/vm/drop_caches ; grep ^Cached /proc/meminfo
Cached:        49492808 kB

real    0m3.707s
user    0m0.000s
sys     0m3.696s
Cached:           59436 kB
A similar test from Lustre takes about 7 times as long (26.9 s versus 3.7 s).
# grove617 /root > for x in `seq 129 154` ; do dd if=/p/lstest/bass6/fio/read.$x.0 of=/dev/null bs=1M >/dev/null 2>&1 ; done
# grove617 /root > grep ^Cached /proc/meminfo ; time echo 3 > /proc/sys/vm/drop_caches ; grep ^Cached /proc/meminfo
Cached:        47703020 kB

real    0m26.961s
user    0m0.000s
sys     0m26.870s
Cached:           59768 kB
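For convenience, the two runs above can be wrapped in a small script so the comparison is easy to repeat. This is only a sketch of the manual steps already shown (the script name is made up), and it has to run as root so drop_caches is writable:

#!/bin/bash
# drop_cache_time.sh (hypothetical name): read the given files to fill the
# page cache, then time how long it takes to drop the clean pages.
for f in "$@"; do
    dd if="$f" of=/dev/null bs=1M > /dev/null 2>&1
done
grep ^Cached /proc/meminfo
time echo 3 > /proc/sys/vm/drop_caches
grep ^Cached /proc/meminfo

Invoked, for example, as ./drop_cache_time.sh /p/lstest/bass6/fio/read.{129..154}.0. Normalizing the numbers above by the amount of cache freed, the drop costs roughly 0.3 µs of system time per 4 KiB page for NFS versus roughly 2.3 µs per page for Lustre.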
Oprofile data for the Lustre test:
vma               samples  cum. samples  %        cum. %   linenr info                app name     symbol name
ffffffff8115ff60  60729     60729        10.3025  10.3025  slab.c:3836                vmlinux      kmem_cache_free
0000000000001810  37739     98468         6.4023  16.7048  lvfs_lib.c:111             lvfs.ko      lprocfs_counter_sub
ffffffff81272930  25490    123958         4.3243  21.0290  radix-tree.c:1237          vmlinux      radix_tree_delete
0000000000015cc0  21352    145310         3.6223  24.6513  osc_page.c:767             osc.ko       osc_lru_del
0000000000000000  21289    166599         3.6116  28.2629  (no location information)  lustre       /lustre
0000000000051030  18837    185436         3.1956  31.4586  cl_page.c:719              obdclass.ko  cl_vmpage_page
ffffffff8127d650  18123    203559         3.0745  34.5331  list_debug.c:45            vmlinux      list_del
ffffffff811602a0  16864    220423         2.8609  37.3940  slab.c:3516                vmlinux      free_block
0000000000051d00  16518    236941         2.8022  40.1962  cl_page.c:284              obdclass.ko  cl_page_free
0000000000051200  15149    252090         2.5700  42.7662  cl_page.c:1124             obdclass.ko  cl_page_delete0
000000000004eb90  14321    266411         2.4295  45.1957  cl_page.c:1236             obdclass.ko  cl_page_export
0000000000052150  13936    280347         2.3642  47.5599  cl_page.c:651              obdclass.ko  cl_page_put
ffffffff8126c5e0  13737    294084         2.3304  49.8903  dec_and_lock.c:21          vmlinux      _atomic_dec_and_lock
ffffffff8127d6f0  11844    305928         2.0093  51.8996  list_debug.c:22            vmlinux      __list_add
and for NFS:
vma               samples  cum. samples  %        cum. %   linenr info        app name  symbol name
ffffffff8116b0f0   9239      9239         9.6682   9.6682  memcontrol.c:2362  vmlinux   __mem_cgroup_uncharge_common
ffffffff81129490   7402     16641         7.7458  17.4140  vmscan.c:427       vmlinux   __remove_mapping
ffffffff811243d0   7250     23891         7.5868  25.0008  page_alloc.c:1152  vmlinux   free_hot_cold_page
ffffffff811234f0   6747     30638         7.0604  32.0612  page_alloc.c:592   vmlinux   free_pcppages_bulk
ffffffff81127af0   5401     36039         5.6519  37.7131  swap.c:438         vmlinux   release_pages
ffffffff81272930   5302     41341         5.5483  43.2614  radix-tree.c:1237  vmlinux   radix_tree_delete
ffffffff811330d0   4319     45660         4.5196  47.7810  vmstat.c:282       vmlinux   __dec_zone_state
ffffffff811284b0   4049     49709         4.2371  52.0181  truncate.c:173     vmlinux   invalidate_inode_page
ffffffff81110a20   3666     53375         3.8363  55.8544  filemap.c:794      vmlinux   find_get_pages
ffffffff8110fdb0   3601     56976         3.7683  59.6226  filemap.c:527      vmlinux   page_waitqueue
ffffffff8127d650   3367     60343         3.5234  63.1461  list_debug.c:45    vmlinux   list_del
ffffffff81110e10   3007     63350         3.1467  66.2927  filemap.c:579      vmlinux   unlock_page
ffffffff81167cd0   2976     66326         3.1142  69.4070  memcontrol.c:731   vmlinux   mem_cgroup_del_lru_list
ffffffff81120e00   2883     69209         3.0169  72.4239  page_alloc.c:5350  vmlinux   get_pageblock_flags_group
ffffffff81128720   2686     71895         2.8108  75.2347  truncate.c:330     vmlinux   invalidate_mapping_pages
ffffffff81091010   2193     74088         2.2949  77.5295  wait.c:251         vmlinux   __wake_up_bit
ffffffff81167440   2175     76263         2.2760  79.8056  bit_spinlock.h:11  vmlinux   bit_spin_lock
ffffffff81271c40   2100     78363         2.1975  82.0031  radix-tree.c:820   vmlinux   __lookup
...
ffffffff8115ff60    182     89326         0.1905  93.4754  slab.c:3836        vmlinux   kmem_cache_free
It's interesting that kmem_cache_free accounts for 10% of CPU cycles in the Lustre case, but only 0.2% for NFS. It also seems we're spending a lot of time updating lprocfs counters (lprocfs_counter_sub alone is over 6% of samples).
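For anyone wanting to re-collect these profiles, the column layout above looks like opreport's per-symbol output. A minimal collection sketch follows; the exact flags and the vmlinux/debuginfo paths are assumptions and need adjusting for the target kernel:

opcontrol --init
opcontrol --vmlinux=/usr/lib/debug/lib/modules/$(uname -r)/vmlinux
opcontrol --reset
opcontrol --start
# ... re-run the read + drop_caches test from above ...
opcontrol --stop
opcontrol --dump
opreport --symbols --accumulated --show-address --debug-info --image-path=/lib/modules/$(uname -r) | head -20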