[LU-5182] High contention in ll_invalidatepage() code path Created: 12/Jun/14  Updated: 13/Jun/14  Resolved: 13/Jun/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Gregoire Pichon
Assignee: Jinshan Xiong (Inactive)
Resolution: Duplicate
Votes: 0
Labels: performance
Environment:

server: Lustre 2.4.2
client: 2.5.59


Severity: 3
Epic: performance
Rank (Obsolete): 14377

 Description   

While doing large sequential I/O in file-per-process access mode, I observe a significant performance difference between the following two Lustre client tunings:

1) max_cached_mb is 50% of client memory (the default setting).
The 16-task IOR benchmark gives 3300 MiB/s write and 3000 MiB/s read.

2) max_cached_mb is 98% of client memory.
The 16-task IOR benchmark gives 4300 MiB/s write and 4700 MiB/s read.

The client node is a 2-socket, 16-core Intel Sandy Bridge E5-2650 with 64 GB of memory and one InfiniBand FDR adapter.
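For reference, the two tunings can be expressed as concrete max_cached_mb values. This is a minimal sketch: it assumes "64GB" means 64 GiB (the MemTotal actually seen by the kernel is somewhat lower), and the helper names are hypothetical. `lctl set_param llite.*.max_cached_mb=<MiB>` is the usual way to apply the limit on a client.

```python
# Sketch: derive max_cached_mb for the two tunings described above.
# Assumption: the node has exactly 64 GiB; real MemTotal is a bit lower,
# so the values used on the actual client would differ slightly.
TOTAL_MEM_MIB = 64 * 1024


def max_cached_mb(fraction, total_mib=TOTAL_MEM_MIB):
    """Return the cache limit in MiB for a given fraction of client memory."""
    return int(total_mib * fraction)


def set_param_command(mib):
    """Build the lctl command line that would apply the limit."""
    return f"lctl set_param llite.*.max_cached_mb={mib}"


case1_mib = max_cached_mb(0.50)  # case 1: default, 50% of memory
case2_mib = max_cached_mb(0.98)  # case 2: 98% of memory

print(set_param_command(case1_mib))
print(set_param_command(case2_mib))
```

Under that assumption, case 1 corresponds to 32768 MiB and case 2 to 64225 MiB.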

In case 1), the client page cache is regulated by the Lustre osc LRU mechanism, which calls ll_invalidatepage() from the ptlrpcd threads.

In case 2), the client page cache is regulated by the kernel because memory is full; the kernel then calls ll_releasepage().

Here are the profiling reports for the write and read phases of IOR in both cases:

case 1) write
==========
     3.88%              IOR  [kernel.kallsyms]        [k] copy_user_generic_string             
     1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock                           
     1.93%        ptlrpcd_5  [kernel.kallsyms]        [k] _spin_lock                           
     1.91%        ptlrpcd_0  [kernel.kallsyms]        [k] _spin_lock                           
     1.89%        ptlrpcd_6  [kernel.kallsyms]        [k] _spin_lock                           
     1.88%       ptlrpcd_14  [kernel.kallsyms]        [k] _spin_lock                           
     1.86%       ptlrpcd_13  [kernel.kallsyms]        [k] _spin_lock                           
     1.86%        ptlrpcd_8  [kernel.kallsyms]        [k] _spin_lock                           
     1.84%        ptlrpcd_2  [kernel.kallsyms]        [k] _spin_lock                           
     1.84%        ptlrpcd_7  [kernel.kallsyms]        [k] _spin_lock                           
     1.84%       ptlrpcd_10  [kernel.kallsyms]        [k] _spin_lock                           
     1.83%       ptlrpcd_15  [kernel.kallsyms]        [k] _spin_lock                           
     1.83%        ptlrpcd_4  [kernel.kallsyms]        [k] _spin_lock                           
     1.82%        ptlrpcd_3  [kernel.kallsyms]        [k] _spin_lock                           
     1.79%       ptlrpcd_11  [kernel.kallsyms]        [k] _spin_lock                           
     1.78%       ptlrpcd_12  [kernel.kallsyms]        [k] _spin_lock                           
     1.74%        ptlrpcd_9  [kernel.kallsyms]        [k] _spin_lock                           
     1.55%              IOR  mca_ghc_sm.so            [.] mca_ghc_sm_barrier_D                 
     1.51%              IOR  [kernel.kallsyms]        [k] _spin_lock                           
     1.00%              IOR  mca_ghc_sm.so            [.] mca_ghc_sm_barrier_U              
 

case 1) read
==========
     3.26%             init  [kernel.kallsyms]         [k] poll_idle                            
     3.07%       ptlrpcd_13  [kernel.kallsyms]         [k] _spin_lock                           
     3.05%        ptlrpcd_0  [kernel.kallsyms]         [k] _spin_lock                           
     3.00%        ptlrpcd_9  [kernel.kallsyms]         [k] _spin_lock                           
     3.00%       ptlrpcd_12  [kernel.kallsyms]         [k] _spin_lock                           
     2.97%        ptlrpcd_2  [kernel.kallsyms]         [k] _spin_lock                           
     2.96%       ptlrpcd_14  [kernel.kallsyms]         [k] _spin_lock                           
     2.94%        ptlrpcd_5  [kernel.kallsyms]         [k] _spin_lock                           
     2.93%        ptlrpcd_8  [kernel.kallsyms]         [k] _spin_lock                           
     2.93%       ptlrpcd_10  [kernel.kallsyms]         [k] _spin_lock                           
     2.92%        ptlrpcd_1  [kernel.kallsyms]         [k] _spin_lock                           
     2.91%       ptlrpcd_15  [kernel.kallsyms]         [k] _spin_lock                           
     2.91%        ptlrpcd_6  [kernel.kallsyms]         [k] _spin_lock                           
     2.91%        ptlrpcd_3  [kernel.kallsyms]         [k] _spin_lock                           
     2.89%        ptlrpcd_4  [kernel.kallsyms]         [k] _spin_lock                           
     2.87%       ptlrpcd_11  [kernel.kallsyms]         [k] _spin_lock                           
     2.75%        ptlrpcd_7  [kernel.kallsyms]         [k] _spin_lock                           
     2.57%              IOR  [kernel.kallsyms]         [k] copy_user_generic_string             
     1.24%              IOR  [kernel.kallsyms]         [k] _spin_lock                           
     1.07%              IOR  [osc]                     [k] osc_page_init                        


case 2) write
==========
     5.68%              IOR  [kernel.kallsyms]         [k] copy_user_generic_string             
     5.51%              IOR  [kernel.kallsyms]         [k] _spin_lock                           
     3.22%              IOR  [kernel.kallsyms]         [k] _spin_lock_irqsave                   
     1.56%              IOR  [kernel.kallsyms]         [k] memset                               
     1.33%              IOR  [kernel.kallsyms]         [k] _spin_lock_irq                       
     1.32%              IOR  [obdclass]                [k] cl_page_alloc                        
     1.16%              IOR  [obdclass]                [k] cl_object_top                        
     1.15%              IOR  [kernel.kallsyms]         [k] _spin_trylock                        
     1.09%              IOR  [osc]                     [k] osc_queue_async_io                   
     1.08%              IOR  [obdclass]                [k] cl_page_delete0                      
     1.06%              IOR  [kernel.kallsyms]         [k] mark_page_accessed                   
     0.99%              IOR  [kernel.kallsyms]         [k] radix_tree_delete                    
     0.98%              IOR  [libcfs]                  [k] cfs_hash_rw_unlock                   
     0.90%              IOR  [kernel.kallsyms]         [k] __list_add                           
     0.88%              IOR  [kernel.kallsyms]         [k] radix_tree_insert                    
     0.85%              IOR  [kernel.kallsyms]         [k] list_del                             
     0.84%              IOR  [kernel.kallsyms]         [k] get_page_from_freelist               
     0.82%              IOR  [kernel.kallsyms]         [k] __mem_cgroup_commit_charge           
     0.80%              IOR  [osc]                     [k] __osc_lru_del                        
     0.76%              IOR  [osc]                     [k] osc_enter_cache_try.clone.0          


case 2) read
==========
     5.86%              IOR  [kernel.kallsyms]           [k] _spin_lock                           
     5.37%              IOR  [kernel.kallsyms]           [k] copy_user_generic_string             
     3.50%              IOR  [kernel.kallsyms]           [k] _spin_lock_irqsave                   
     3.14%              IOR  mca_ghc_sm.so               [.] mca_ghc_sm_reduce_U                  
     3.12%              IOR  mca_ghc_sm.so               [.] mca_ghc_sm_bcast_D                   
     2.45%              IOR  [osc]                       [k] osc_page_init                        
     1.95%              IOR  [kernel.kallsyms]           [k] memset                               
     1.37%              IOR  [obdclass]                  [k] cl_page_get_trust                    
     1.30%              IOR  [lustre]                    [k] vvp_io_read_page                     
     1.27%              IOR  [obdclass]                  [k] lprocfs_counter_add                  
     1.16%              IOR  [obdclass]                  [k] cl_page_delete0                      
     1.16%              IOR  [obdclass]                  [k] cl_page_invoid                       
     1.15%              IOR  [kernel.kallsyms]           [k] radix_tree_delete                    
     1.10%              IOR  [kernel.kallsyms]           [k] __list_add                           
     1.06%              IOR  [lustre]                    [k] ll_readahead                         
     1.05%              IOR  [kernel.kallsyms]           [k] put_page                             
     1.00%              IOR  [kernel.kallsyms]           [k] _spin_lock_irq                       
     0.96%              IOR  [kernel.kallsyms]           [k] list_del                             
     0.96%              IOR  [kernel.kallsyms]           [k] get_page_from_freelist               
     0.94%              IOR  [obdclass]                  [k] lu_context_key_get                   
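The per-thread rows above are easier to compare once they are summed by symbol. The following is a small sketch that parses flat `perf report` rows of the shape shown in this ticket and totals the `_spin_lock` overhead, grouping all `ptlrpcd_N` threads together. The function name and the grouping rule are illustrative choices, and the sample holds only a few rows copied from the case 1) write report.

```python
import re
from collections import defaultdict

# One flat perf-report row: overhead%, command, object, [k] or [.], symbol.
ROW = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[[k.]\]\s+(\S+)")


def overhead_by_group(report_text, symbol="_spin_lock"):
    """Sum the overhead of `symbol`, merging all ptlrpcd_N threads."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # separators, headings, blank lines
        pct, comm, _obj, sym = match.groups()
        if sym != symbol:
            continue
        group = "ptlrpcd" if comm.startswith("ptlrpcd_") else comm
        totals[group] += float(pct)
    return dict(totals)


sample = """\
     3.88%              IOR  [kernel.kallsyms]        [k] copy_user_generic_string
     1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock
     1.93%        ptlrpcd_5  [kernel.kallsyms]        [k] _spin_lock
     1.51%              IOR  [kernel.kallsyms]        [k] _spin_lock
"""

totals = overhead_by_group(sample)
for group, pct in sorted(totals.items()):
    print(f"{group}: {pct:.2f}%")
```

Summing the 16 ptlrpcd rows by hand gives about 29.6% of samples in `_spin_lock` for the case 1) write report and about 47.0% for the case 1) read report, while in case 2) `_spin_lock` appears only under IOR at roughly 5.5% to 5.9%.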

The call stack of the ptlrpcd threads in case 1) looks like this:

     1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock
                  |
                  --- _spin_lock
                     |
                     |--45.80%-- cl_env_put
                     |          ll_invalidatepage
                     |          vvp_page_discard
                     |          cl_page_invoid
                     |          cl_page_discard
                     |          discard_pagevec
                     |          osc_lru_shrink
                     |          lru_queue_work
                     |          work_interpreter
                     |          ptlrpc_check_set
                     |          ptlrpcd_check
                     |          ptlrpcd
                     |          kthread
                     |          child_rip
                     |
                     |--42.29%-- cl_env_get
                     |          ll_invalidatepage
                     |          vvp_page_discard
                     |          cl_page_invoid
                     |          cl_page_discard
                     |          discard_pagevec
                     |          osc_lru_shrink
                     |          lru_queue_work
                     |          work_interpreter
                     |          ptlrpc_check_set
                     |          ptlrpcd_check
                     |          ptlrpcd
                     |          kthread
                     |          child_rip
                     |
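The two branches of this call graph can be combined with the flat profile to estimate how much of the whole run is spent taking the spinlock in cl_env_get()/cl_env_put(). This is a rough sketch: it assumes the split shown for ptlrpcd_1 is representative of the other 15 ptlrpcd threads, which the ticket does not show.

```python
# _spin_lock overhead of the 16 ptlrpcd threads, case 1) write report.
ptlrpcd_spin_lock_pct = [
    1.96, 1.93, 1.91, 1.89, 1.88, 1.86, 1.86, 1.84,
    1.84, 1.84, 1.83, 1.83, 1.82, 1.79, 1.78, 1.74,
]

# Share of ptlrpcd_1's _spin_lock samples per caller, from the call graph.
CL_ENV_PUT_SHARE = 45.80
CL_ENV_GET_SHARE = 42.29

ptlrpcd_total = sum(ptlrpcd_spin_lock_pct)
cl_env_share = CL_ENV_PUT_SHARE + CL_ENV_GET_SHARE

# Assumption: every ptlrpcd thread has the same caller split as ptlrpcd_1.
estimated_pct_of_all_samples = ptlrpcd_total * cl_env_share / 100

print(f"ptlrpcd _spin_lock total: {ptlrpcd_total:.2f}%")
print(f"via cl_env_get/cl_env_put: {cl_env_share:.2f}% of that")
print(f"estimated share of all samples: {estimated_pct_of_all_samples:.1f}%")
```

Under that assumption, roughly a quarter of all samples in the write phase (about 26%) are spinlock acquisitions made from cl_env_get() and cl_env_put() in the ll_invalidatepage() path.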


 Comments   
Comment by Jinshan Xiong (Inactive) [ 12/Jun/14 ]

Can you apply patch http://review.whamcloud.com/10458 and see if it can help?

Comment by Gregoire Pichon [ 13/Jun/14 ]

Yes, it helps!

I have added the following two patches to version 2.5.59 (the second patch requires the first one).

With the default max_cached_mb value (50% of client memory), the 16-task IOR benchmark gives 4700 MiB/s write and 5300 MiB/s read.
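To put the three measurements side by side, the relative gains can be computed from the throughput figures quoted in this ticket. This is a minimal sketch; the configuration labels and the helper name are illustrative.

```python
# (write MiB/s, read MiB/s) for the 16-task IOR runs quoted in this ticket.
results = {
    "50% cache, unpatched": (3300, 3000),
    "98% cache, unpatched": (4300, 4700),
    "50% cache, patched": (4700, 5300),
}


def gain_pct(new, old):
    """Relative change from old to new, in percent, rounded to 0.1."""
    return round((new - old) / old * 100, 1)


base_write, base_read = results["50% cache, unpatched"]
for name, (write, read) in results.items():
    print(
        f"{name}: write {gain_pct(write, base_write):+.1f}%, "
        f"read {gain_pct(read, base_read):+.1f}%"
    )
```

Relative to the unpatched default, the patched client gains about 42.4% on write and 76.7% on read, which is also above the unpatched 98% tuning (about 30.3% and 56.7%).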

The profiling report no longer shows contention in ll_invalidatepage().

write
=====
     7.42%              IOR  [kernel.kallsyms]         [k] copy_user_generic_string             
     2.11%              IOR  [kernel.kallsyms]         [k] _spin_lock                           
     1.59%              IOR  [osc]                     [k] osc_enter_cache_try.clone.0          
     1.46%              IOR  [kernel.kallsyms]         [k] memset                               
     1.44%              IOR  [obdclass]                [k] cl_object_top                        
     1.42%              IOR  [osc]                     [k] osc_queue_async_io                   
     1.29%              IOR  [kernel.kallsyms]         [k] mark_page_accessed                   
     1.28%              IOR  [obdclass]                [k] cl_page_alloc                        
     1.21%              IOR  [kernel.kallsyms]         [k] radix_tree_insert                    
     1.17%              IOR  [kernel.kallsyms]         [k] _spin_lock_irqsave                   
     1.15%              IOR  [kernel.kallsyms]         [k] _spin_lock_irq                       
     0.96%              IOR  [obdclass]                [k] lu_context_key_get                   
     0.88%              IOR  [osc]                     [k] osc_page_init                        
     0.86%              IOR  [kernel.kallsyms]         [k] iov_iter_fault_in_readable           
     0.84%              IOR  [kernel.kallsyms]         [k] __mem_cgroup_commit_charge           
     0.82%              IOR  [kernel.kallsyms]         [k] radix_tree_delete                    
     0.80%              IOR  [kernel.kallsyms]         [k] __list_add                           
     0.76%              IOR  [obdclass]                [k] cl_page_own0                         
     0.75%              IOR  [lov]                     [k] lov_page_init_raid0                  
     0.71%              IOR  [lustre]                  [k] ll_write_end                         

read
====
     8.04%              IOR  [kernel.kallsyms]        [k] copy_user_generic_string             
     3.59%              IOR  [kernel.kallsyms]        [k] _spin_lock_irqsave                   
     3.22%              IOR  [osc]                    [k] osc_page_init                        
     2.42%              IOR  [kernel.kallsyms]        [k] _spin_lock                           
     1.75%              IOR  [kernel.kallsyms]        [k] put_page                             
     1.74%              IOR  [kernel.kallsyms]        [k] memset                               
     1.63%              IOR  [obdclass]               [k] cl_page_invoid                       
     1.55%              IOR  [obdclass]               [k] cl_page_get_trust                    
     1.38%              IOR  [lustre]                 [k] vvp_io_read_page                     
     1.35%              IOR  [lustre]                 [k] ll_readahead                         
     1.30%              IOR  [obdclass]               [k] lu_context_key_get                   
     1.20%              IOR  [kernel.kallsyms]        [k] __list_add                           
     1.19%              IOR  [lustre]                 [k] ll_ra_stats_inc_sbi                  
     1.17%              IOR  [obdclass]               [k] lprocfs_counter_add                  
     1.04%              IOR  [lov]                    [k] lov_page_init_raid0                  
     0.99%              IOR  [osc]                    [k] osc_io_submit                        
     0.98%              IOR  [osc]                    [k] osc_lru_shrink                       
     0.94%              IOR  [kernel.kallsyms]        [k] radix_tree_delete                    
     0.89%              IOR  [kernel.kallsyms]        [k] radix_tree_lookup_slot               
     0.84%              IOR  [kernel.kallsyms]        [k] __mem_cgroup_commit_charge           

This ticket can be closed when patch #10458 has landed in master.

Comment by Jinshan Xiong (Inactive) [ 13/Jun/14 ]

The patch is already in LU-5108
