Details
- Bug
- Resolution: Duplicate
- Major
- None
- Lustre 2.6.0
- server: Lustre 2.4.2, client: 2.5.59
- 3
- 14377
Description
While doing large sequential IOs with file-per-process access mode, I observe a significant performance difference between the following two Lustre client tunings:
1) max_cached_mb set to 50% of client memory (the default setting):
a 16-task IOR benchmark gives 3300 MiB/s write and 3000 MiB/s read.
2) max_cached_mb set to 98% of client memory:
the same 16-task IOR benchmark gives 4300 MiB/s write and 4700 MiB/s read.
The client node is a 2-socket / 16-core Intel Sandy Bridge E5-2650 with 64 GB of memory and one InfiniBand FDR adapter.
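For context, max_cached_mb is expressed in MiB and is normally adjusted with lctl set_param llite.*.max_cached_mb. The small calculation below is an illustration added here (not part of the original report); it only makes the two cache budgets explicit for this 64 GB node, assuming 64 GB == 64 * 1024 MiB.

/* Illustrative arithmetic only: approximate max_cached_mb budgets for a
 * 64 GB Lustre client at the two ratios compared above. */
#include <stdio.h>

int main(void)
{
    const long total_mb = 64L * 1024;            /* client RAM in MiB (assumed) */
    const double ratios[] = { 0.50, 0.98 };      /* default vs. tuned setting */

    for (int i = 0; i < 2; i++)
        printf("max_cached_mb at %2.0f%%: ~%ld MiB\n",
               ratios[i] * 100.0, (long)(total_mb * ratios[i]));
    return 0;
}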
In case 1), the client page cache is regulated by the Lustre osc LRU mechanism, which calls ll_invalidatepage() from the ptlrpcd threads.
In case 2), the client page cache is regulated by the kernel itself, because memory is full, which calls ll_releasepage().
Here is the profiling report during the write and read phases of IOR in both cases:
case 1) write
==========
 3.88%  IOR          [kernel.kallsyms]  [k] copy_user_generic_string
 1.96%  ptlrpcd_1    [kernel.kallsyms]  [k] _spin_lock
 1.93%  ptlrpcd_5    [kernel.kallsyms]  [k] _spin_lock
 1.91%  ptlrpcd_0    [kernel.kallsyms]  [k] _spin_lock
 1.89%  ptlrpcd_6    [kernel.kallsyms]  [k] _spin_lock
 1.88%  ptlrpcd_14   [kernel.kallsyms]  [k] _spin_lock
 1.86%  ptlrpcd_13   [kernel.kallsyms]  [k] _spin_lock
 1.86%  ptlrpcd_8    [kernel.kallsyms]  [k] _spin_lock
 1.84%  ptlrpcd_2    [kernel.kallsyms]  [k] _spin_lock
 1.84%  ptlrpcd_7    [kernel.kallsyms]  [k] _spin_lock
 1.84%  ptlrpcd_10   [kernel.kallsyms]  [k] _spin_lock
 1.83%  ptlrpcd_15   [kernel.kallsyms]  [k] _spin_lock
 1.83%  ptlrpcd_4    [kernel.kallsyms]  [k] _spin_lock
 1.82%  ptlrpcd_3    [kernel.kallsyms]  [k] _spin_lock
 1.79%  ptlrpcd_11   [kernel.kallsyms]  [k] _spin_lock
 1.78%  ptlrpcd_12   [kernel.kallsyms]  [k] _spin_lock
 1.74%  ptlrpcd_9    [kernel.kallsyms]  [k] _spin_lock
 1.55%  IOR          mca_ghc_sm.so      [.] mca_ghc_sm_barrier_D
 1.51%  IOR          [kernel.kallsyms]  [k] _spin_lock
 1.00%  IOR          mca_ghc_sm.so      [.] mca_ghc_sm_barrier_U

case 1) read
==========
 3.26%  init         [kernel.kallsyms]  [k] poll_idle
 3.07%  ptlrpcd_13   [kernel.kallsyms]  [k] _spin_lock
 3.05%  ptlrpcd_0    [kernel.kallsyms]  [k] _spin_lock
 3.00%  ptlrpcd_9    [kernel.kallsyms]  [k] _spin_lock
 3.00%  ptlrpcd_12   [kernel.kallsyms]  [k] _spin_lock
 2.97%  ptlrpcd_2    [kernel.kallsyms]  [k] _spin_lock
 2.96%  ptlrpcd_14   [kernel.kallsyms]  [k] _spin_lock
 2.94%  ptlrpcd_5    [kernel.kallsyms]  [k] _spin_lock
 2.93%  ptlrpcd_8    [kernel.kallsyms]  [k] _spin_lock
 2.93%  ptlrpcd_10   [kernel.kallsyms]  [k] _spin_lock
 2.92%  ptlrpcd_1    [kernel.kallsyms]  [k] _spin_lock
 2.91%  ptlrpcd_15   [kernel.kallsyms]  [k] _spin_lock
 2.91%  ptlrpcd_6    [kernel.kallsyms]  [k] _spin_lock
 2.91%  ptlrpcd_3    [kernel.kallsyms]  [k] _spin_lock
 2.89%  ptlrpcd_4    [kernel.kallsyms]  [k] _spin_lock
 2.87%  ptlrpcd_11   [kernel.kallsyms]  [k] _spin_lock
 2.75%  ptlrpcd_7    [kernel.kallsyms]  [k] _spin_lock
 2.57%  IOR          [kernel.kallsyms]  [k] copy_user_generic_string
 1.24%  IOR          [kernel.kallsyms]  [k] _spin_lock
 1.07%  IOR          [osc]              [k] osc_page_init

case 2) write
==========
 5.68%  IOR  [kernel.kallsyms]  [k] copy_user_generic_string
 5.51%  IOR  [kernel.kallsyms]  [k] _spin_lock
 3.22%  IOR  [kernel.kallsyms]  [k] _spin_lock_irqsave
 1.56%  IOR  [kernel.kallsyms]  [k] memset
 1.33%  IOR  [kernel.kallsyms]  [k] _spin_lock_irq
 1.32%  IOR  [obdclass]         [k] cl_page_alloc
 1.16%  IOR  [obdclass]         [k] cl_object_top
 1.15%  IOR  [kernel.kallsyms]  [k] _spin_trylock
 1.09%  IOR  [osc]              [k] osc_queue_async_io
 1.08%  IOR  [obdclass]         [k] cl_page_delete0
 1.06%  IOR  [kernel.kallsyms]  [k] mark_page_accessed
 0.99%  IOR  [kernel.kallsyms]  [k] radix_tree_delete
 0.98%  IOR  [libcfs]           [k] cfs_hash_rw_unlock
 0.90%  IOR  [kernel.kallsyms]  [k] __list_add
 0.88%  IOR  [kernel.kallsyms]  [k] radix_tree_insert
 0.85%  IOR  [kernel.kallsyms]  [k] list_del
 0.84%  IOR  [kernel.kallsyms]  [k] get_page_from_freelist
 0.82%  IOR  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
 0.80%  IOR  [osc]              [k] __osc_lru_del
 0.76%  IOR  [osc]              [k] osc_enter_cache_try.clone.0

case 2) read
==========
 5.86%  IOR  [kernel.kallsyms]  [k] _spin_lock
 5.37%  IOR  [kernel.kallsyms]  [k] copy_user_generic_string
 3.50%  IOR  [kernel.kallsyms]  [k] _spin_lock_irqsave
 3.14%  IOR  mca_ghc_sm.so      [.] mca_ghc_sm_reduce_U
 3.12%  IOR  mca_ghc_sm.so      [.] mca_ghc_sm_bcast_D
 2.45%  IOR  [osc]              [k] osc_page_init
 1.95%  IOR  [kernel.kallsyms]  [k] memset
 1.37%  IOR  [obdclass]         [k] cl_page_get_trust
 1.30%  IOR  [lustre]           [k] vvp_io_read_page
 1.27%  IOR  [obdclass]         [k] lprocfs_counter_add
 1.16%  IOR  [obdclass]         [k] cl_page_delete0
 1.16%  IOR  [obdclass]         [k] cl_page_invoid
 1.15%  IOR  [kernel.kallsyms]  [k] radix_tree_delete
 1.10%  IOR  [kernel.kallsyms]  [k] __list_add
 1.06%  IOR  [lustre]           [k] ll_readahead
 1.05%  IOR  [kernel.kallsyms]  [k] put_page
 1.00%  IOR  [kernel.kallsyms]  [k] _spin_lock_irq
 0.96%  IOR  [kernel.kallsyms]  [k] list_del
 0.96%  IOR  [kernel.kallsyms]  [k] get_page_from_freelist
 0.94%  IOR  [obdclass]         [k] lu_context_key_get
The call stack of ptlrpcd threads in case 1) looks like this.
 1.96%  ptlrpcd_1  [kernel.kallsyms]  [k] _spin_lock
        |
        --- _spin_lock
            |
            |--45.80%-- cl_env_put
            |          ll_invalidatepage
            |          vvp_page_discard
            |          cl_page_invoid
            |          cl_page_discard
            |          discard_pagevec
            |          osc_lru_shrink
            |          lru_queue_work
            |          work_interpreter
            |          ptlrpc_check_set
            |          ptlrpcd_check
            |          ptlrpcd
            |          kthread
            |          child_rip
            |
            |--42.29%-- cl_env_get
            |          ll_invalidatepage
            |          vvp_page_discard
            |          cl_page_invoid
            |          cl_page_discard
            |          discard_pagevec
            |          osc_lru_shrink
            |          lru_queue_work
            |          work_interpreter
            |          ptlrpc_check_set
            |          ptlrpcd_check
            |          ptlrpcd
            |          kthread
            |          child_rip
            |
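Roughly 88% of the _spin_lock samples in each ptlrpcd thread sit under cl_env_get()/cl_env_put() in the LRU discard path, i.e. the 16 threads serialize on a shared lock while taking and returning env objects for every page they discard. The sketch below is not Lustre code; it is a minimal userspace C illustration (hypothetical env_get()/env_put() helpers, one pthread spinlock around a global free list) of why that pattern makes lock acquisition dominate a profile in exactly this way.

/* Minimal userspace sketch (NOT Lustre code): NTHREADS workers repeatedly
 * take and return an "env" object from a single free list protected by one
 * global spinlock, mimicking many ptlrpcd threads funneling through
 * cl_env_get()/cl_env_put(). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NTHREADS   16         /* matches the 16 ptlrpcd threads in the trace */
#define ITERATIONS 1000000L

struct env {
    struct env *next;
    char payload[128];        /* stand-in for per-request state */
};

static pthread_spinlock_t cache_lock;   /* the single, global lock */
static struct env *free_list;           /* global cache of env objects */

static struct env *env_get(void)
{
    struct env *e;

    pthread_spin_lock(&cache_lock);     /* every thread funnels through here */
    e = free_list;
    if (e)
        free_list = e->next;
    pthread_spin_unlock(&cache_lock);

    return e ? e : calloc(1, sizeof(*e));
}

static void env_put(struct env *e)
{
    pthread_spin_lock(&cache_lock);
    e->next = free_list;
    free_list = e;
    pthread_spin_unlock(&cache_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        struct env *e = env_get();
        e->payload[0]++;                /* trivial "work" with the env */
        env_put(e);
    }
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    pthread_spin_init(&cache_lock, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);

    puts("done");
    return 0;
}

Built with gcc -O2 -pthread and recorded with perf record -g, a sketch like this would be expected to show most samples in the spinlock, analogous to the _spin_lock samples above; per-thread or per-CPU env caches would avoid that serialization.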