LU-5182

High contention in ll_invalidatepage() code path

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • Lustre 2.6.0
    • server: Lustre 2.4.2
      client: 2.5.59

    Description

      While doing large sequential I/O in file-per-process access mode, I observe a significant performance difference between the following two Lustre client tunings:

      1) max_cached_mb is 50% of client memory (the default setting).
      The 16-task IOR benchmark gives 3300 MiB/s write and 3000 MiB/s read.

      2) max_cached_mb is 98% of client memory.
      The 16-task IOR benchmark gives 4300 MiB/s write and 4700 MiB/s read.

      The client node is a 2-socket, 16-core Intel Sandy Bridge E5-2650 with 64 GB of memory and one InfiniBand FDR adapter.
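
To make the two tunings concrete, here is a minimal sketch that turns a percentage of client memory into a max_cached_mb value. It assumes the "64GB" of the node is 64 GiB (65536 MiB) and that the value is applied with `lctl set_param llite.*.max_cached_mb=<MiB>`. The helper names are illustrative and are not part of Lustre, and a real node reports somewhat less usable memory than its nominal size.

```python
def max_cached_mb(total_mem_mib: int, percent: int) -> int:
    """Return a max_cached_mb value (MiB) for a given share of client RAM."""
    return total_mem_mib * percent // 100


def lctl_command(value_mib: int) -> str:
    """Format the lctl invocation that would apply the value on a client."""
    return f"lctl set_param llite.*.max_cached_mb={value_mib}"


# Assumption: the nominal 64 GB of the client is taken as 64 GiB.
CLIENT_MEM_MIB = 64 * 1024

for label, pct in (("case 1", 50), ("case 2", 98)):
    print(label, lctl_command(max_cached_mb(CLIENT_MEM_MIB, pct)))
```

With these assumptions, case 1) corresponds to 32768 MiB and case 2) to 64225 MiB.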

      In case 1), the client page cache is regulated by the Lustre OSC LRU mechanism, which calls ll_invalidatepage() from the ptlrpcd threads.

      In case 2), the client page cache is regulated by the kernel because memory is full, and the kernel calls ll_releasepage().

      Here are the profiling reports for the write and read phases of IOR in both cases:

      case 1) write
      ==========
           3.88%              IOR  [kernel.kallsyms]        [k] copy_user_generic_string             
           1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock                           
           1.93%        ptlrpcd_5  [kernel.kallsyms]        [k] _spin_lock                           
           1.91%        ptlrpcd_0  [kernel.kallsyms]        [k] _spin_lock                           
           1.89%        ptlrpcd_6  [kernel.kallsyms]        [k] _spin_lock                           
           1.88%       ptlrpcd_14  [kernel.kallsyms]        [k] _spin_lock                           
           1.86%       ptlrpcd_13  [kernel.kallsyms]        [k] _spin_lock                           
           1.86%        ptlrpcd_8  [kernel.kallsyms]        [k] _spin_lock                           
           1.84%        ptlrpcd_2  [kernel.kallsyms]        [k] _spin_lock                           
           1.84%        ptlrpcd_7  [kernel.kallsyms]        [k] _spin_lock                           
           1.84%       ptlrpcd_10  [kernel.kallsyms]        [k] _spin_lock                           
           1.83%       ptlrpcd_15  [kernel.kallsyms]        [k] _spin_lock                           
           1.83%        ptlrpcd_4  [kernel.kallsyms]        [k] _spin_lock                           
           1.82%        ptlrpcd_3  [kernel.kallsyms]        [k] _spin_lock                           
           1.79%       ptlrpcd_11  [kernel.kallsyms]        [k] _spin_lock                           
           1.78%       ptlrpcd_12  [kernel.kallsyms]        [k] _spin_lock                           
           1.74%        ptlrpcd_9  [kernel.kallsyms]        [k] _spin_lock                           
           1.55%              IOR  mca_ghc_sm.so            [.] mca_ghc_sm_barrier_D                 
           1.51%              IOR  [kernel.kallsyms]        [k] _spin_lock                           
           1.00%              IOR  mca_ghc_sm.so            [.] mca_ghc_sm_barrier_U              
       
      
      case 1) read
      ==========
           3.26%             init  [kernel.kallsyms]         [k] poll_idle                            
           3.07%       ptlrpcd_13  [kernel.kallsyms]         [k] _spin_lock                           
           3.05%        ptlrpcd_0  [kernel.kallsyms]         [k] _spin_lock                           
           3.00%        ptlrpcd_9  [kernel.kallsyms]         [k] _spin_lock                           
           3.00%       ptlrpcd_12  [kernel.kallsyms]         [k] _spin_lock                           
           2.97%        ptlrpcd_2  [kernel.kallsyms]         [k] _spin_lock                           
           2.96%       ptlrpcd_14  [kernel.kallsyms]         [k] _spin_lock                           
           2.94%        ptlrpcd_5  [kernel.kallsyms]         [k] _spin_lock                           
           2.93%        ptlrpcd_8  [kernel.kallsyms]         [k] _spin_lock                           
           2.93%       ptlrpcd_10  [kernel.kallsyms]         [k] _spin_lock                           
           2.92%        ptlrpcd_1  [kernel.kallsyms]         [k] _spin_lock                           
           2.91%       ptlrpcd_15  [kernel.kallsyms]         [k] _spin_lock                           
           2.91%        ptlrpcd_6  [kernel.kallsyms]         [k] _spin_lock                           
           2.91%        ptlrpcd_3  [kernel.kallsyms]         [k] _spin_lock                           
           2.89%        ptlrpcd_4  [kernel.kallsyms]         [k] _spin_lock                           
           2.87%       ptlrpcd_11  [kernel.kallsyms]         [k] _spin_lock                           
           2.75%        ptlrpcd_7  [kernel.kallsyms]         [k] _spin_lock                           
           2.57%              IOR  [kernel.kallsyms]         [k] copy_user_generic_string             
           1.24%              IOR  [kernel.kallsyms]         [k] _spin_lock                           
           1.07%              IOR  [osc]                     [k] osc_page_init                        
      
      
      case 2) write
      ==========
           5.68%              IOR  [kernel.kallsyms]         [k] copy_user_generic_string             
           5.51%              IOR  [kernel.kallsyms]         [k] _spin_lock                           
           3.22%              IOR  [kernel.kallsyms]         [k] _spin_lock_irqsave                   
           1.56%              IOR  [kernel.kallsyms]         [k] memset                               
           1.33%              IOR  [kernel.kallsyms]         [k] _spin_lock_irq                       
           1.32%              IOR  [obdclass]                [k] cl_page_alloc                        
           1.16%              IOR  [obdclass]                [k] cl_object_top                        
           1.15%              IOR  [kernel.kallsyms]         [k] _spin_trylock                        
           1.09%              IOR  [osc]                     [k] osc_queue_async_io                   
           1.08%              IOR  [obdclass]                [k] cl_page_delete0                      
           1.06%              IOR  [kernel.kallsyms]         [k] mark_page_accessed                   
           0.99%              IOR  [kernel.kallsyms]         [k] radix_tree_delete                    
           0.98%              IOR  [libcfs]                  [k] cfs_hash_rw_unlock                   
           0.90%              IOR  [kernel.kallsyms]         [k] __list_add                           
           0.88%              IOR  [kernel.kallsyms]         [k] radix_tree_insert                    
           0.85%              IOR  [kernel.kallsyms]         [k] list_del                             
           0.84%              IOR  [kernel.kallsyms]         [k] get_page_from_freelist               
           0.82%              IOR  [kernel.kallsyms]         [k] __mem_cgroup_commit_charge           
           0.80%              IOR  [osc]                     [k] __osc_lru_del                        
           0.76%              IOR  [osc]                     [k] osc_enter_cache_try.clone.0          
      
      
      case 2) read
      ==========
           5.86%              IOR  [kernel.kallsyms]           [k] _spin_lock                           
           5.37%              IOR  [kernel.kallsyms]           [k] copy_user_generic_string             
           3.50%              IOR  [kernel.kallsyms]           [k] _spin_lock_irqsave                   
           3.14%              IOR  mca_ghc_sm.so               [.] mca_ghc_sm_reduce_U                  
           3.12%              IOR  mca_ghc_sm.so               [.] mca_ghc_sm_bcast_D                   
           2.45%              IOR  [osc]                       [k] osc_page_init                        
           1.95%              IOR  [kernel.kallsyms]           [k] memset                               
           1.37%              IOR  [obdclass]                  [k] cl_page_get_trust                    
           1.30%              IOR  [lustre]                    [k] vvp_io_read_page                     
           1.27%              IOR  [obdclass]                  [k] lprocfs_counter_add                  
           1.16%              IOR  [obdclass]                  [k] cl_page_delete0                      
           1.16%              IOR  [obdclass]                  [k] cl_page_invoid                       
           1.15%              IOR  [kernel.kallsyms]           [k] radix_tree_delete                    
           1.10%              IOR  [kernel.kallsyms]           [k] __list_add                           
           1.06%              IOR  [lustre]                    [k] ll_readahead                         
           1.05%              IOR  [kernel.kallsyms]           [k] put_page                             
           1.00%              IOR  [kernel.kallsyms]           [k] _spin_lock_irq                       
           0.96%              IOR  [kernel.kallsyms]           [k] list_del                             
           0.96%              IOR  [kernel.kallsyms]           [k] get_page_from_freelist               
           0.94%              IOR  [obdclass]                  [k] lu_context_key_get                   
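
The per-thread entries in the reports above are easier to compare once they are summed per symbol. The following is a rough sketch of such an aggregation. It assumes the report lines have the layout shown above (percentage, command, object, `[k]` or `[.]`, symbol) and folds numbered threads such as ptlrpcd_1 into one group. The function name and the regular expression are illustrative only.

```python
import re
from collections import defaultdict

# A few lines copied from the "case 1) write" report (not the full listing).
SAMPLE = """\
     3.88%              IOR  [kernel.kallsyms]        [k] copy_user_generic_string
     1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock
     1.93%        ptlrpcd_5  [kernel.kallsyms]        [k] _spin_lock
     1.51%              IOR  [kernel.kallsyms]        [k] _spin_lock
"""

# percentage, command, shared object, [k]/[.] marker, symbol
LINE_RE = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+(\S+)\s+\[(.)\]\s+(\S+)\s*$")


def aggregate(report: str) -> dict:
    """Sum overhead percentages per (command group, symbol)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip separators, headings and call-graph lines
        pct, comm, _dso, _kind, symbol = match.groups()
        group = re.sub(r"_\d+$", "", comm)  # ptlrpcd_1 -> ptlrpcd
        totals[(group, symbol)] += float(pct)
    return dict(totals)


for (group, symbol), pct in sorted(aggregate(SAMPLE).items()):
    print(f"{pct:6.2f}%  {group:8s} {symbol}")
```

Summing the 16 ptlrpcd entries listed for case 1) by hand gives about 29.6% of samples in _spin_lock during write and about 47.0% during read, whereas in case 2) no ptlrpcd thread appears in the listed entries.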
      

      The call stack of the ptlrpcd threads in case 1) looks like this:

           1.96%        ptlrpcd_1  [kernel.kallsyms]        [k] _spin_lock
                        |
                        --- _spin_lock
                           |
                           |--45.80%-- cl_env_put
                           |          ll_invalidatepage
                           |          vvp_page_discard
                           |          cl_page_invoid
                           |          cl_page_discard
                           |          discard_pagevec
                           |          osc_lru_shrink
                           |          lru_queue_work
                           |          work_interpreter
                           |          ptlrpc_check_set
                           |          ptlrpcd_check
                           |          ptlrpcd
                           |          kthread
                           |          child_rip
                           |
                           |--42.29%-- cl_env_get
                           |          ll_invalidatepage
                           |          vvp_page_discard
                           |          cl_page_invoid
                           |          cl_page_discard
                           |          discard_pagevec
                           |          osc_lru_shrink
                           |          lru_queue_work
                           |          work_interpreter
                           |          ptlrpc_check_set
                           |          ptlrpcd_check
                           |          ptlrpcd
                           |          kthread
                           |          child_rip
                           |
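
As a small worked calculation on the call graph above, this sketch combines the two branches to show how much of the ptlrpcd_1 _spin_lock entry is spent in cl_env_get() and cl_env_put() under ll_invalidatepage(). The numbers are taken from the trace, and the function name is illustrative.

```python
def env_lock_share(entry_pct: float, caller_pcts: dict) -> tuple:
    """Return (share of the entry, share of all samples), both in percent."""
    relative = sum(caller_pcts.values())
    absolute = entry_pct * relative / 100
    return relative, absolute


# ptlrpcd_1 spends 1.96% of all samples in _spin_lock; the call graph
# attributes that entry to the two callers below.
relative, absolute = env_lock_share(
    1.96, {"cl_env_put": 45.80, "cl_env_get": 42.29}
)
print(f"{relative:.2f}% of the entry, {absolute:.2f}% of all samples")
```

The two cl_env calls account for roughly 88% of that thread's _spin_lock samples, or about 1.7% of all samples for this single thread.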
      

          People

            jay Jinshan Xiong (Inactive)
            pichong Gregoire Pichon