
LU-1784: freeing cached clean pages is slow


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.3.0, Lustre 2.4.0
    • Severity: 3
    • Rank: 10135

    Description

      Compared to NFS, it seems to take Lustre a long time to free clean pages from the cache. Speeding this up may help client performance on a number of fronts, since it's such a common operation. Giving up memory more quickly may also improve client stability under low-memory conditions.

      For example, after reading 50 GB of data from NFS to /dev/null, it takes 3.7 seconds to drop caches.

      # grove617 /root > dd if=/var/dumps/bigfile of=/dev/null bs=1M > /dev/null 2>&1
      # grove617 /root > grep ^Cached /proc/meminfo ;  time echo 3 > /proc/sys/vm/drop_caches ; grep ^Cached /proc/meminfo 
      Cached:         49492808 kB
      
      real    0m3.707s
      user    0m0.000s
      sys     0m3.696s
      Cached:            59436 kB
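
      For context, writing 3 to drop_caches makes the kernel walk every superblock's inodes and invalidate each mapping's clean pages, then shrink the dentry and inode slabs. The sketch below is condensed from fs/drop_caches.c in kernels of this era (inode locking, refcounting, and state checks omitted). Any per-page teardown a filesystem does from its ->releasepage method runs under invalidate_mapping_pages(), so that is where filesystem-side page state shows up in these timings.

      /* Condensed sketch of the page-cache half of "echo 3 >
       * /proc/sys/vm/drop_caches", modeled on fs/drop_caches.c in
       * 2.6.32-era kernels; inode locking, refcounting, and state
       * checks are omitted.  Bit 1 of the written value drops clean
       * page cache, bit 2 shrinks the dentry/inode slabs. */
      static void drop_pagecache_sb(struct super_block *sb)
      {
              struct inode *inode;

              list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
                      if (inode->i_mapping->nrpages == 0)
                              continue;
                      /* frees every clean, unmapped page in the mapping;
                       * the filesystem's ->releasepage teardown runs in
                       * here, so this loop is what "time" measures above */
                      invalidate_mapping_pages(inode->i_mapping, 0, -1);
              }
      }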
      

      A similar test from Lustre takes about seven times as long.

      # grove617 /root > for x in `seq 129 154` ; do dd if=/p/lstest/bass6/fio/read.$x.0 of=/dev/null bs=1M >/dev/null 2>&1 ; done                                                           
      # grove617 /root > grep ^Cached /proc/meminfo ;  time echo 3 > /proc/sys/vm/drop_caches ; grep ^Cached /proc/meminfo 
      Cached:         47703020 kB
      
      real    0m26.961s
      user    0m0.000s
      sys     0m26.870s
      Cached:            59768 kB
      

      Oprofile data for the Lustre test:

      vma      samples  cum. samples  %        cum. %     linenr info                 app name                 symbol name
      ffffffff8115ff60 60729    60729         10.3025  10.3025    slab.c:3836                 vmlinux                  kmem_cache_free
      0000000000001810 37739    98468          6.4023  16.7048    lvfs_lib.c:111              lvfs.ko                  lprocfs_counter_sub
      ffffffff81272930 25490    123958         4.3243  21.0290    radix-tree.c:1237           vmlinux                  radix_tree_delete
      0000000000015cc0 21352    145310         3.6223  24.6513    osc_page.c:767              osc.ko                   osc_lru_del
      0000000000000000 21289    166599         3.6116  28.2629    (no location information)   lustre                   /lustre
      0000000000051030 18837    185436         3.1956  31.4586    cl_page.c:719               obdclass.ko              cl_vmpage_page
      ffffffff8127d650 18123    203559         3.0745  34.5331    list_debug.c:45             vmlinux                  list_del
      ffffffff811602a0 16864    220423         2.8609  37.3940    slab.c:3516                 vmlinux                  free_block
      0000000000051d00 16518    236941         2.8022  40.1962    cl_page.c:284               obdclass.ko              cl_page_free
      0000000000051200 15149    252090         2.5700  42.7662    cl_page.c:1124              obdclass.ko              cl_page_delete0
      000000000004eb90 14321    266411         2.4295  45.1957    cl_page.c:1236              obdclass.ko              cl_page_export
      0000000000052150 13936    280347         2.3642  47.5599    cl_page.c:651               obdclass.ko              cl_page_put
      ffffffff8126c5e0 13737    294084         2.3304  49.8903    dec_and_lock.c:21           vmlinux                  _atomic_dec_and_lock
      ffffffff8127d6f0 11844    305928         2.0093  51.8996    list_debug.c:22             vmlinux                  __list_add
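
      Reading that profile top-down, nearly all the time is per-page teardown of the client's own page state: for every VM page given back, Lustre also has to look up, detach, and free a companion cl_page. A rough schematic of the per-page work those symbols suggest, reconstructed from the profile for illustration only (this is not actual Lustre source, and the env/obj context arguments are stand-ins):

      /* Schematic of the per-vmpage work implied by the profile above.
       * Reconstructed from the symbol names for illustration; NOT actual
       * Lustre source, and not compilable as-is. */
      static void teardown_one_vmpage(const struct lu_env *env,
                                      struct cl_object *obj,
                                      struct page *vmpage)
      {
              /* radix-tree lookup of the companion page (cl_page.c:719) */
              struct cl_page *clp = cl_vmpage_page(vmpage, obj);

              /* detach it (cl_page_delete0); for OSC pages this also
               * drops the page from the client LRU (osc_lru_del) */
              cl_page_delete(env, clp);

              /* drop the final reference: cl_page_put -> cl_page_free,
               * which returns the object(s) to the slab (kmem_cache_free,
               * 10% of cycles here) and appears to update lprocfs
               * accounting along the way (lprocfs_counter_sub, 6.4%) */
              cl_page_put(env, clp);
      }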
      

      and for NFS:

      vma      samples  cum. samples  %        cum. %     linenr info                 app name                 symbol name
      ffffffff8116b0f0 9239     9239           9.6682   9.6682    memcontrol.c:2362           vmlinux                  __mem_cgroup_uncharge_common
      ffffffff81129490 7402     16641          7.7458  17.4140    vmscan.c:427                vmlinux                  __remove_mapping
      ffffffff811243d0 7250     23891          7.5868  25.0008    page_alloc.c:1152           vmlinux                  free_hot_cold_page
      ffffffff811234f0 6747     30638          7.0604  32.0612    page_alloc.c:592            vmlinux                  free_pcppages_bulk
      ffffffff81127af0 5401     36039          5.6519  37.7131    swap.c:438                  vmlinux                  release_pages
      ffffffff81272930 5302     41341          5.5483  43.2614    radix-tree.c:1237           vmlinux                  radix_tree_delete
      ffffffff811330d0 4319     45660          4.5196  47.7810    vmstat.c:282                vmlinux                  __dec_zone_state
      ffffffff811284b0 4049     49709          4.2371  52.0181    truncate.c:173              vmlinux                  invalidate_inode_page
      ffffffff81110a20 3666     53375          3.8363  55.8544    filemap.c:794               vmlinux                  find_get_pages
      ffffffff8110fdb0 3601     56976          3.7683  59.6226    filemap.c:527               vmlinux                  page_waitqueue
      ffffffff8127d650 3367     60343          3.5234  63.1461    list_debug.c:45             vmlinux                  list_del
      ffffffff81110e10 3007     63350          3.1467  66.2927    filemap.c:579               vmlinux                  unlock_page
      ffffffff81167cd0 2976     66326          3.1142  69.4070    memcontrol.c:731            vmlinux                  mem_cgroup_del_lru_list
      ffffffff81120e00 2883     69209          3.0169  72.4239    page_alloc.c:5350           vmlinux                  get_pageblock_flags_group
      ffffffff81128720 2686     71895          2.8108  75.2347    truncate.c:330              vmlinux                  invalidate_mapping_pages
      ffffffff81091010 2193     74088          2.2949  77.5295    wait.c:251                  vmlinux                  __wake_up_bit
      ffffffff81167440 2175     76263          2.2760  79.8056    bit_spinlock.h:11           vmlinux                  bit_spin_lock
      ffffffff81271c40 2100     78363          2.1975  82.0031    radix-tree.c:820            vmlinux                  __lookup
      ...
      ffffffff8115ff60 182      89326          0.1905  93.4754    slab.c:3836                 vmlinux                  kmem_cache_free
      

      It's interesting that kmem_cache_free accounts for 10% of CPU cycles in the Lustre test, but only 0.2% in the NFS test. It also seems we're spending a lot of time (about 6.4%) updating lprocfs counters in lprocfs_counter_sub().
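
      A toy userspace model makes the shape of the problem easy to see: if dropping each cached page also means freeing a companion object and touching a locked counter, the per-page cost multiplies even though each step is individually cheap. The program below is purely illustrative; the struct, sizes, and locking are invented stand-ins, not Lustre code. Build with gcc -O2 -pthread (add -lrt on older glibc); it allocates about 1 GiB.

      /* page_free_bench.c -- toy model of the difference the profiles
       * show.  "Plain free" mimics dropping a clean page with no extra
       * filesystem state; the second loop adds a companion-object free
       * and a locked counter update per page.  All names and sizes are
       * invented for illustration. */
      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>

      #define NPAGES (256 * 1024)        /* ~1 GiB of 4 KiB "pages" */

      struct companion {                 /* stand-in for a per-page object */
              long  ref;
              void *vmpage;
              char  pad[112];
      };

      static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
      static long cached_pages = NPAGES; /* stand-in for a stats counter */

      static void *page[NPAGES];
      static struct companion *comp[NPAGES];

      static double secs(void)
      {
              struct timespec ts;

              clock_gettime(CLOCK_MONOTONIC, &ts);
              return ts.tv_sec + ts.tv_nsec / 1e9;
      }

      int main(void)
      {
              double t;
              long i;

              /* dropping a clean page is just a free */
              for (i = 0; i < NPAGES; i++)
                      page[i] = malloc(4096);
              t = secs();
              for (i = 0; i < NPAGES; i++)
                      free(page[i]);
              printf("plain free:                 %.3fs\n", secs() - t);

              /* each page also drags along a companion-object free and
               * a locked stats-counter update */
              for (i = 0; i < NPAGES; i++) {
                      page[i] = malloc(4096);
                      comp[i] = malloc(sizeof(struct companion));
                      comp[i]->vmpage = page[i];
              }
              t = secs();
              for (i = 0; i < NPAGES; i++) {
                      free(comp[i]);
                      free(page[i]);
                      pthread_mutex_lock(&stats_lock);
                      cached_pages--;
                      pthread_mutex_unlock(&stats_lock);
              }
              printf("free + companion + counter: %.3fs\n", secs() - t);
              return 0;
      }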


          People

            Assignee: bobijam (Zhenyu Xu)
            Reporter: nedbass (Ned Bass) (Inactive)
            Votes: 0
            Watchers: 9
