Details
Type: Bug
Resolution: Duplicate
Priority: Major
Labels: None
Affects Version: Lustre 2.6.0
Environment: server: Lustre 2.4.2, client: 2.5.59
Severity: 3
Rank: 14377
Description
While doing large sequential I/O in file-per-process mode, I observe a significant performance difference between the following two
Lustre client tunings:
1) max_cached_mb set to 50% of client memory (the default setting):
my 16-task IOR benchmark gives 3300 MiB/s write and 3000 MiB/s read.
2) max_cached_mb set to 98% of client memory:
the 16-task IOR benchmark gives 4300 MiB/s write and 4700 MiB/s read.
The client node is a 2-socket / 16-core Intel Sandy Bridge E5-2650 with 64GB of memory and one InfiniBand FDR adapter.
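For reference, the two cases only differ by the llite max_cached_mb tunable on the client; the commands below sketch the setup (the IOR transfer/block sizes shown are representative values, not the exact command line used):

  # case 1) default: max_cached_mb is ~50% of RAM
  lctl get_param llite.*.max_cached_mb

  # case 2) raise the llite cache limit to ~98% of RAM (value in MB)
  lctl set_param llite.*.max_cached_mb=64225

  # 16-task file-per-process IOR run (sizes here are only an example)
  mpirun -np 16 IOR -a POSIX -F -w -r -t 1m -b 16g -o /mnt/lustre/ior.dat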
In case 1) the client page cache is regulated by the Lustre osc LRU mechanism, which calls ll_invalidatepage() from the ptlrpcd threads.
In case 2) the client page cache is regulated by the kernel itself, because memory is full, and pages are reclaimed through ll_releasepage().
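While IOR is running, the difference between the two reclaim paths can be observed from the client with the standard tunables (the exact output format varies between Lustre versions):

  lctl get_param llite.*.max_cached_mb      # llite cache limit and current usage
  lctl get_param osc.*.osc_cached_mb        # per-OSC LRU usage
  grep -E '^MemFree|^Cached' /proc/meminfo  # node-wide memory pressure (case 2)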
Here are the profiling reports for the write and read phases of IOR in both cases:
case 1) write
==========
3.88% IOR [kernel.kallsyms] [k] copy_user_generic_string
1.96% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
1.93% ptlrpcd_5 [kernel.kallsyms] [k] _spin_lock
1.91% ptlrpcd_0 [kernel.kallsyms] [k] _spin_lock
1.89% ptlrpcd_6 [kernel.kallsyms] [k] _spin_lock
1.88% ptlrpcd_14 [kernel.kallsyms] [k] _spin_lock
1.86% ptlrpcd_13 [kernel.kallsyms] [k] _spin_lock
1.86% ptlrpcd_8 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_2 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_7 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_10 [kernel.kallsyms] [k] _spin_lock
1.83% ptlrpcd_15 [kernel.kallsyms] [k] _spin_lock
1.83% ptlrpcd_4 [kernel.kallsyms] [k] _spin_lock
1.82% ptlrpcd_3 [kernel.kallsyms] [k] _spin_lock
1.79% ptlrpcd_11 [kernel.kallsyms] [k] _spin_lock
1.78% ptlrpcd_12 [kernel.kallsyms] [k] _spin_lock
1.74% ptlrpcd_9 [kernel.kallsyms] [k] _spin_lock
1.55% IOR mca_ghc_sm.so [.] mca_ghc_sm_barrier_D
1.51% IOR [kernel.kallsyms] [k] _spin_lock
1.00% IOR mca_ghc_sm.so [.] mca_ghc_sm_barrier_U
case 1) read
==========
3.26% init [kernel.kallsyms] [k] poll_idle
3.07% ptlrpcd_13 [kernel.kallsyms] [k] _spin_lock
3.05% ptlrpcd_0 [kernel.kallsyms] [k] _spin_lock
3.00% ptlrpcd_9 [kernel.kallsyms] [k] _spin_lock
3.00% ptlrpcd_12 [kernel.kallsyms] [k] _spin_lock
2.97% ptlrpcd_2 [kernel.kallsyms] [k] _spin_lock
2.96% ptlrpcd_14 [kernel.kallsyms] [k] _spin_lock
2.94% ptlrpcd_5 [kernel.kallsyms] [k] _spin_lock
2.93% ptlrpcd_8 [kernel.kallsyms] [k] _spin_lock
2.93% ptlrpcd_10 [kernel.kallsyms] [k] _spin_lock
2.92% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_15 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_6 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_3 [kernel.kallsyms] [k] _spin_lock
2.89% ptlrpcd_4 [kernel.kallsyms] [k] _spin_lock
2.87% ptlrpcd_11 [kernel.kallsyms] [k] _spin_lock
2.75% ptlrpcd_7 [kernel.kallsyms] [k] _spin_lock
2.57% IOR [kernel.kallsyms] [k] copy_user_generic_string
1.24% IOR [kernel.kallsyms] [k] _spin_lock
1.07% IOR [osc] [k] osc_page_init
case 2) write
==========
5.68% IOR [kernel.kallsyms] [k] copy_user_generic_string
5.51% IOR [kernel.kallsyms] [k] _spin_lock
3.22% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
1.56% IOR [kernel.kallsyms] [k] memset
1.33% IOR [kernel.kallsyms] [k] _spin_lock_irq
1.32% IOR [obdclass] [k] cl_page_alloc
1.16% IOR [obdclass] [k] cl_object_top
1.15% IOR [kernel.kallsyms] [k] _spin_trylock
1.09% IOR [osc] [k] osc_queue_async_io
1.08% IOR [obdclass] [k] cl_page_delete0
1.06% IOR [kernel.kallsyms] [k] mark_page_accessed
0.99% IOR [kernel.kallsyms] [k] radix_tree_delete
0.98% IOR [libcfs] [k] cfs_hash_rw_unlock
0.90% IOR [kernel.kallsyms] [k] __list_add
0.88% IOR [kernel.kallsyms] [k] radix_tree_insert
0.85% IOR [kernel.kallsyms] [k] list_del
0.84% IOR [kernel.kallsyms] [k] get_page_from_freelist
0.82% IOR [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.80% IOR [osc] [k] __osc_lru_del
0.76% IOR [osc] [k] osc_enter_cache_try.clone.0
case 2) read
==========
5.86% IOR [kernel.kallsyms] [k] _spin_lock
5.37% IOR [kernel.kallsyms] [k] copy_user_generic_string
3.50% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
3.14% IOR mca_ghc_sm.so [.] mca_ghc_sm_reduce_U
3.12% IOR mca_ghc_sm.so [.] mca_ghc_sm_bcast_D
2.45% IOR [osc] [k] osc_page_init
1.95% IOR [kernel.kallsyms] [k] memset
1.37% IOR [obdclass] [k] cl_page_get_trust
1.30% IOR [lustre] [k] vvp_io_read_page
1.27% IOR [obdclass] [k] lprocfs_counter_add
1.16% IOR [obdclass] [k] cl_page_delete0
1.16% IOR [obdclass] [k] cl_page_invoid
1.15% IOR [kernel.kallsyms] [k] radix_tree_delete
1.10% IOR [kernel.kallsyms] [k] __list_add
1.06% IOR [lustre] [k] ll_readahead
1.05% IOR [kernel.kallsyms] [k] put_page
1.00% IOR [kernel.kallsyms] [k] _spin_lock_irq
0.96% IOR [kernel.kallsyms] [k] list_del
0.96% IOR [kernel.kallsyms] [k] get_page_from_freelist
0.94% IOR [obdclass] [k] lu_context_key_get
The call stack of the ptlrpcd threads in case 1) looks like this:
1.96% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
|
--- _spin_lock
|
|--45.80%-- cl_env_put
| ll_invalidatepage
| vvp_page_discard
| cl_page_invoid
| cl_page_discard
| discard_pagevec
| osc_lru_shrink
| lru_queue_work
| work_interpreter
| ptlrpc_check_set
| ptlrpcd_check
| ptlrpcd
| kthread
| child_rip
|
|--42.29%-- cl_env_get
| ll_invalidatepage
| vvp_page_discard
| cl_page_invoid
| cl_page_discard
| discard_pagevec
| osc_lru_shrink
| lru_queue_work
| work_interpreter
| ptlrpc_check_set
| ptlrpcd_check
| ptlrpcd
| kthread
| child_rip
|
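The flat profiles and the per-thread call graphs above were collected with perf on the client while IOR was running; a typical invocation (the exact options used are not preserved in this report) is:

  perf record -a -g -- sleep 60        # system-wide sampling with call graphs
  perf report --sort comm,dso,symbol   # flat profile, as shown above
  perf report -g                       # call graph view, as shown for ptlrpcd_1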