[LU-5182] High contention in ll_invalidatepage() code path Created: 12/Jun/14 Updated: 13/Jun/14 Resolved: 13/Jun/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Gregoire Pichon | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | performance | ||
| Environment: | server: Lustre 2.4.2 |
| Severity: | 3 |
| Epic: | performance |
| Rank (Obsolete): | 14377 |
| Description |
|
While doing large sequential I/Os in file-per-process access mode, I observe a significant performance difference between the following two Lustre client tunings:

1) max_cached_mb set to 50% of client memory (the default setting)
2) max_cached_mb set to 98% of client memory

The client node has 2 sockets / 16 cores (Intel Sandy Bridge E5-2650), 64GB of memory and one InfiniBand FDR adapter.

In case 1) the client page cache is regulated by the Lustre osc LRU mechanism, which calls ll_invalidatepage() from the ptlrpcd threads. In case 2) the client page cache is regulated by the kernel, because memory is full, and pages are reclaimed through ll_releasepage().

Here are the profiling reports during the write and read phases of IOR in both cases.

case 1) write
==========
3.88% IOR [kernel.kallsyms] [k] copy_user_generic_string
1.96% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
1.93% ptlrpcd_5 [kernel.kallsyms] [k] _spin_lock
1.91% ptlrpcd_0 [kernel.kallsyms] [k] _spin_lock
1.89% ptlrpcd_6 [kernel.kallsyms] [k] _spin_lock
1.88% ptlrpcd_14 [kernel.kallsyms] [k] _spin_lock
1.86% ptlrpcd_13 [kernel.kallsyms] [k] _spin_lock
1.86% ptlrpcd_8 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_2 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_7 [kernel.kallsyms] [k] _spin_lock
1.84% ptlrpcd_10 [kernel.kallsyms] [k] _spin_lock
1.83% ptlrpcd_15 [kernel.kallsyms] [k] _spin_lock
1.83% ptlrpcd_4 [kernel.kallsyms] [k] _spin_lock
1.82% ptlrpcd_3 [kernel.kallsyms] [k] _spin_lock
1.79% ptlrpcd_11 [kernel.kallsyms] [k] _spin_lock
1.78% ptlrpcd_12 [kernel.kallsyms] [k] _spin_lock
1.74% ptlrpcd_9 [kernel.kallsyms] [k] _spin_lock
1.55% IOR mca_ghc_sm.so [.] mca_ghc_sm_barrier_D
1.51% IOR [kernel.kallsyms] [k] _spin_lock
1.00% IOR mca_ghc_sm.so [.] mca_ghc_sm_barrier_U
case 1) read
==========
3.26% init [kernel.kallsyms] [k] poll_idle
3.07% ptlrpcd_13 [kernel.kallsyms] [k] _spin_lock
3.05% ptlrpcd_0 [kernel.kallsyms] [k] _spin_lock
3.00% ptlrpcd_9 [kernel.kallsyms] [k] _spin_lock
3.00% ptlrpcd_12 [kernel.kallsyms] [k] _spin_lock
2.97% ptlrpcd_2 [kernel.kallsyms] [k] _spin_lock
2.96% ptlrpcd_14 [kernel.kallsyms] [k] _spin_lock
2.94% ptlrpcd_5 [kernel.kallsyms] [k] _spin_lock
2.93% ptlrpcd_8 [kernel.kallsyms] [k] _spin_lock
2.93% ptlrpcd_10 [kernel.kallsyms] [k] _spin_lock
2.92% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_15 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_6 [kernel.kallsyms] [k] _spin_lock
2.91% ptlrpcd_3 [kernel.kallsyms] [k] _spin_lock
2.89% ptlrpcd_4 [kernel.kallsyms] [k] _spin_lock
2.87% ptlrpcd_11 [kernel.kallsyms] [k] _spin_lock
2.75% ptlrpcd_7 [kernel.kallsyms] [k] _spin_lock
2.57% IOR [kernel.kallsyms] [k] copy_user_generic_string
1.24% IOR [kernel.kallsyms] [k] _spin_lock
1.07% IOR [osc] [k] osc_page_init
case 2) write
==========
5.68% IOR [kernel.kallsyms] [k] copy_user_generic_string
5.51% IOR [kernel.kallsyms] [k] _spin_lock
3.22% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
1.56% IOR [kernel.kallsyms] [k] memset
1.33% IOR [kernel.kallsyms] [k] _spin_lock_irq
1.32% IOR [obdclass] [k] cl_page_alloc
1.16% IOR [obdclass] [k] cl_object_top
1.15% IOR [kernel.kallsyms] [k] _spin_trylock
1.09% IOR [osc] [k] osc_queue_async_io
1.08% IOR [obdclass] [k] cl_page_delete0
1.06% IOR [kernel.kallsyms] [k] mark_page_accessed
0.99% IOR [kernel.kallsyms] [k] radix_tree_delete
0.98% IOR [libcfs] [k] cfs_hash_rw_unlock
0.90% IOR [kernel.kallsyms] [k] __list_add
0.88% IOR [kernel.kallsyms] [k] radix_tree_insert
0.85% IOR [kernel.kallsyms] [k] list_del
0.84% IOR [kernel.kallsyms] [k] get_page_from_freelist
0.82% IOR [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.80% IOR [osc] [k] __osc_lru_del
0.76% IOR [osc] [k] osc_enter_cache_try.clone.0
case 2) read
==========
5.86% IOR [kernel.kallsyms] [k] _spin_lock
5.37% IOR [kernel.kallsyms] [k] copy_user_generic_string
3.50% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
3.14% IOR mca_ghc_sm.so [.] mca_ghc_sm_reduce_U
3.12% IOR mca_ghc_sm.so [.] mca_ghc_sm_bcast_D
2.45% IOR [osc] [k] osc_page_init
1.95% IOR [kernel.kallsyms] [k] memset
1.37% IOR [obdclass] [k] cl_page_get_trust
1.30% IOR [lustre] [k] vvp_io_read_page
1.27% IOR [obdclass] [k] lprocfs_counter_add
1.16% IOR [obdclass] [k] cl_page_delete0
1.16% IOR [obdclass] [k] cl_page_invoid
1.15% IOR [kernel.kallsyms] [k] radix_tree_delete
1.10% IOR [kernel.kallsyms] [k] __list_add
1.06% IOR [lustre] [k] ll_readahead
1.05% IOR [kernel.kallsyms] [k] put_page
1.00% IOR [kernel.kallsyms] [k] _spin_lock_irq
0.96% IOR [kernel.kallsyms] [k] list_del
0.96% IOR [kernel.kallsyms] [k] get_page_from_freelist
0.94% IOR [obdclass] [k] lu_context_key_get
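For reference, ll_releasepage() and ll_invalidatepage() are the two page-teardown hooks of the llite layer, and the difference between the two cases is which of them ends up doing the work. The fragment below is only a minimal, hypothetical sketch of how such hooks are typically wired into the client's address_space_operations (the field subset and the name ll_aops_sketch are illustrative, not copied from the Lustre tree): in case 1) ll_invalidatepage() is driven from the osc LRU shrink path inside the ptlrpcd threads (see the call stack below), while in case 2) the kernel's memory reclaim calls ll_releasepage().

/*
 * Minimal illustrative sketch, not the verbatim llite source: the two
 * reclaim entry points compared above, seen as address_space hooks.
 */
static const struct address_space_operations ll_aops_sketch = {
        .releasepage    = ll_releasepage,       /* case 2: kernel memory reclaim */
        .invalidatepage = ll_invalidatepage,    /* case 1: also reached from the
                                                 * osc LRU shrinker in ptlrpcd   */
        /* .readpage, .writepage, ... elided */
};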
The call stack of the ptlrpcd threads in case 1) looks like this.

1.96% ptlrpcd_1 [kernel.kallsyms] [k] _spin_lock
|
--- _spin_lock
|
|--45.80%-- cl_env_put
| ll_invalidatepage
| vvp_page_discard
| cl_page_invoid
| cl_page_discard
| discard_pagevec
| osc_lru_shrink
| lru_queue_work
| work_interpreter
| ptlrpc_check_set
| ptlrpcd_check
| ptlrpcd
| kthread
| child_rip
|
|--42.29%-- cl_env_get
| ll_invalidatepage
| vvp_page_discard
| cl_page_invoid
| cl_page_discard
| discard_pagevec
| osc_lru_shrink
| lru_queue_work
| work_interpreter
| ptlrpc_check_set
| ptlrpcd_check
| ptlrpcd
| kthread
| child_rip
|
|
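The shape of the hot path the profile points at is sketched below. This is a simplified, hypothetical rendering (the names env_cache_entry, env_cache_get() and env_cache_put() are invented for illustration, not the actual Lustre code): every page discarded through ll_invalidatepage() takes a cl_env_get()/cl_env_put() pair, and both sides go through a single global spinlock protecting a shared cache of environments, so the 16 ptlrpcd threads running osc_lru_shrink() in parallel serialize on that one lock.

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical, simplified sketch of a globally locked env cache -- not the
 * verbatim Lustre source.  One global list of idle environments, one lock,
 * and both the get and the put side must take it. */
struct env_cache_entry {
        struct list_head ce_linkage;
        /* ... cached lu_env state would live here ... */
};

static LIST_HEAD(env_cache);                    /* shared by every thread      */
static DEFINE_SPINLOCK(env_cache_guard);        /* the _spin_lock seen in perf */

static struct env_cache_entry *env_cache_get(void)
{
        struct env_cache_entry *cle = NULL;

        spin_lock(&env_cache_guard);            /* all ptlrpcd threads meet here */
        if (!list_empty(&env_cache)) {
                cle = list_first_entry(&env_cache,
                                       struct env_cache_entry, ce_linkage);
                list_del_init(&cle->ce_linkage);
        }
        spin_unlock(&env_cache_guard);

        if (cle == NULL)                        /* cache miss: allocate a new one */
                cle = kzalloc(sizeof(*cle), GFP_NOFS);
        return cle;
}

static void env_cache_put(struct env_cache_entry *cle)
{
        spin_lock(&env_cache_guard);            /* ... and meet here again on put */
        list_add(&cle->ce_linkage, &env_cache);
        spin_unlock(&env_cache_guard);
}

With two such lock acquisitions per discarded page across 16 ptlrpcd threads, the lock and its cache line bounce continuously, which is consistent with every ptlrpcd thread spending a similar 1.8-3% of samples in _spin_lock above. Keeping the environments per CPU or per thread instead of in one shared list is the kind of change that removes this serialization.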
| Comments |
| Comment by Jinshan Xiong (Inactive) [ 12/Jun/14 ] |
|
Can you apply patch http://review.whamcloud.com/10458 and see if it can help? |
| Comment by Gregoire Pichon [ 13/Jun/14 ] |
|
Yes, it helps! I have added the two following patches to version 2.5.59 (the second patch requires the first one).

With the max_cached_mb default value (50% of the client memory size) the 16-task IOR benchmark gives 4700 MiB/s write and 5300 MiB/s read. The profiling report no longer shows contention in ll_invalidatepage().

write
=====
7.42% IOR [kernel.kallsyms] [k] copy_user_generic_string
2.11% IOR [kernel.kallsyms] [k] _spin_lock
1.59% IOR [osc] [k] osc_enter_cache_try.clone.0
1.46% IOR [kernel.kallsyms] [k] memset
1.44% IOR [obdclass] [k] cl_object_top
1.42% IOR [osc] [k] osc_queue_async_io
1.29% IOR [kernel.kallsyms] [k] mark_page_accessed
1.28% IOR [obdclass] [k] cl_page_alloc
1.21% IOR [kernel.kallsyms] [k] radix_tree_insert
1.17% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
1.15% IOR [kernel.kallsyms] [k] _spin_lock_irq
0.96% IOR [obdclass] [k] lu_context_key_get
0.88% IOR [osc] [k] osc_page_init
0.86% IOR [kernel.kallsyms] [k] iov_iter_fault_in_readable
0.84% IOR [kernel.kallsyms] [k] __mem_cgroup_commit_charge
0.82% IOR [kernel.kallsyms] [k] radix_tree_delete
0.80% IOR [kernel.kallsyms] [k] __list_add
0.76% IOR [obdclass] [k] cl_page_own0
0.75% IOR [lov] [k] lov_page_init_raid0
0.71% IOR [lustre] [k] ll_write_end
read
====
8.04% IOR [kernel.kallsyms] [k] copy_user_generic_string
3.59% IOR [kernel.kallsyms] [k] _spin_lock_irqsave
3.22% IOR [osc] [k] osc_page_init
2.42% IOR [kernel.kallsyms] [k] _spin_lock
1.75% IOR [kernel.kallsyms] [k] put_page
1.74% IOR [kernel.kallsyms] [k] memset
1.63% IOR [obdclass] [k] cl_page_invoid
1.55% IOR [obdclass] [k] cl_page_get_trust
1.38% IOR [lustre] [k] vvp_io_read_page
1.35% IOR [lustre] [k] ll_readahead
1.30% IOR [obdclass] [k] lu_context_key_get
1.20% IOR [kernel.kallsyms] [k] __list_add
1.19% IOR [lustre] [k] ll_ra_stats_inc_sbi
1.17% IOR [obdclass] [k] lprocfs_counter_add
1.04% IOR [lov] [k] lov_page_init_raid0
0.99% IOR [osc] [k] osc_io_submit
0.98% IOR [osc] [k] osc_lru_shrink
0.94% IOR [kernel.kallsyms] [k] radix_tree_delete
0.89% IOR [kernel.kallsyms] [k] radix_tree_lookup_slot
0.84% IOR [kernel.kallsyms] [k] __mem_cgroup_commit_charge
This ticket can be closed when patch #10458 is landed into master. |
| Comment by Jinshan Xiong (Inactive) [ 13/Jun/14 ] |
|
The patch is already in |