LU-1576: client sluggish after running lpurge

    Description

      We periodically run lpurge on Lustre clients to keep filesystem capacity usage under control. lpurge recurses through the filesystem, generating a list of files that have not been accessed within some time threshold, and optionally removes them.

      https://github.com/chaos/lustre-tools-llnl/blob/master/src/lpurge.c
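
      For reference, here is a minimal sketch of the general approach, not the actual lpurge implementation (the 60-day threshold and plain printf reporting are made up for illustration): walk the tree with nftw() and act on regular files whose atime is older than the cutoff.

      /* sketch: list regular files not accessed within PURGE_DAYS */
      #define _XOPEN_SOURCE 500
      #include <ftw.h>
      #include <sys/stat.h>
      #include <stdio.h>
      #include <time.h>

      #define PURGE_DAYS 60                   /* hypothetical threshold */

      static time_t cutoff;

      static int visit(const char *path, const struct stat *sb,
                       int type, struct FTW *ftw)
      {
          (void)ftw;
          /* only regular files whose last access predates the cutoff */
          if (type == FTW_F && sb->st_atime < cutoff)
              printf("%s\n", path);           /* lpurge can optionally unlink here */
          return 0;                           /* keep walking */
      }

      int main(int argc, char **argv)
      {
          if (argc != 2) {
              fprintf(stderr, "usage: %s <dir>\n", argv[0]);
              return 1;
          }
          cutoff = time(NULL) - (time_t)PURGE_DAYS * 24 * 60 * 60;
          /* FTW_PHYS: do not follow symbolic links while recursing */
          return nftw(argv[1], visit, 64, FTW_PHYS) ? 1 : 0;
      }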

      We have found that nodes running lpurge over a large number of files eventually become unusably slow. In some cases the client is evicted and lpurge terminates, but the slowness persists. There is noticeable keyboard lag, and processes are slow to start and run.

      Here are some memory statistics from a slow node. In this example we see about 10G in the lustre_inode_cache slab and 30G in Inactive(file). Dropping caches clears out the slabs and the node becomes responsive again; however, Inactive(file) remains unchanged.

      The backtraces below show processes stuck in the kernel shrinker, but the Lustre-related slabs don't shrink unless we drop caches manually.

      # free
                   total       used       free     shared    buffers     cached
      Mem:      49416632   46140416    3276216          0     143212     749056
      -/+ buffers/cache:   45248148    4168484
      Swap:      4000232          0    4000232
      
      # slabtop -o -s c | head
      Active / Total Objects (% used)    : 21568317 / 21691269 (99.4%)
       Active / Total Slabs (% used)      : 1878088 / 1878091 (100.0%)
       Active / Total Caches (% used)     : 134 / 231 (58.0%)
       Active / Total Size (% used)       : 11945321.57K / 11964171.77K (99.8%)
       Minimum / Average / Maximum Object : 0.02K / 0.55K / 4096.00K
      
        OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
      9035425 9035342  99%    1.06K 1290775        7  10326200K lustre_inode_cache
      5804960 5804471  99%    0.19K 290248       20   1160992K dentry
      5667330 5659606  99%    0.12K 188911       30    755644K size-128
       33005  33005 100%    8.00K  33005        1    264040K size-8192
      141100 140027  99%    0.78K  28220        5    112880K ext3_inode_cache
      406687 400332  98%    0.06K   6893       59     27572K size-64
      232619 156570  67%    0.10K   6287       37     25148K buffer_head
       22296  21202  95%    1.00K   5574        4     22296K size-1024
        9336   9301  99%    2.00K   4668        2     18672K size-2048
       28217  21235  75%    0.55K   4031        7     16124K radix_tree_node
       74500  74356  99%    0.19K   3725       20     14900K size-192
         230    230 100%   32.12K    230        1     14720K kmem_cache
       20128  19625  97%    0.50K   2516        8     10064K size-512
        1161   1161 100%    6.65K   1161        1      9288K ll_obd_dev_cache
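
      As a rough cross-check of the slabtop numbers above, the per-cache footprint can be recomputed from /proc/slabinfo as num_objs * objsize (a sketch assuming the slabinfo 2.1 format of these kernels). For lustre_inode_cache that gives about 9.5G of objects (9035425 * ~1.06K), in the same ballpark as the ~10G CACHE SIZE slabtop reports once per-slab overhead is added.

      /* sketch: print approximate memory held by each slab cache */
      #include <stdio.h>
      #include <string.h>

      int main(void)
      {
          FILE *fp = fopen("/proc/slabinfo", "r");
          char line[512];

          if (!fp) {
              perror("/proc/slabinfo");
              return 1;
          }
          while (fgets(line, sizeof(line), fp)) {
              char name[64];
              unsigned long active, num, objsize;

              /* skip the "slabinfo - version" and "# name ..." header lines */
              if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
                  continue;
              if (sscanf(line, "%63s %lu %lu %lu",
                         name, &active, &num, &objsize) != 4)
                  continue;
              printf("%-24s %10lu objs %10lu KiB\n",
                     name, num, num * objsize / 1024);
          }
          fclose(fp);
          return 0;
      }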
      

      /proc/meminfo before and after 'echo 3 > /proc/sys/vm/drop_caches'

      Before drop_caches            After drop_caches
      MemTotal:       49416632 kB   MemTotal:       49416632 kB
      MemFree:         3195016 kB   MemFree:        16576276 kB
      Buffers:          143724 kB   Buffers:             416 kB
      Cached:           836660 kB   Cached:            12572 kB
      SwapCached:            0 kB   SwapCached:            0 kB
      Active:           473304 kB   Active:            30836 kB
      Inactive:       31535004 kB   Inactive:       31010004 kB
      Active(anon):      22280 kB   Active(anon):      22356 kB
      Inactive(anon):     1304 kB   Inactive(anon):     1304 kB
      Active(file):     451024 kB   Active(file):       8480 kB
      Inactive(file): 31533700 kB   Inactive(file): 31008700 kB
      Unevictable:           0 kB   Unevictable:           0 kB
      Mlocked:               0 kB   Mlocked:               0 kB
      SwapTotal:       4000232 kB   SwapTotal:       4000232 kB
      SwapFree:        4000232 kB   SwapFree:        4000232 kB
      Dirty:                 4 kB   Dirty:                 0 kB
      Writeback:             0 kB   Writeback:             0 kB
      AnonPages:         23468 kB   AnonPages:         23472 kB
      Mapped:            11988 kB   Mapped:            11992 kB
      Shmem:               192 kB   Shmem:               192 kB
      Slab:           12823932 kB   Slab:             409052 kB
      SReclaimable:    1327712 kB   SReclaimable:      12436 kB
      SUnreclaim:     11496220 kB   SUnreclaim:       396616 kB
      KernelStack:        2768 kB   KernelStack:        2768 kB
      PageTables:         3256 kB   PageTables:         3256 kB
      NFS_Unstable:          0 kB   NFS_Unstable:          0 kB
      Bounce:                0 kB   Bounce:                0 kB
      WritebackTmp:          0 kB   WritebackTmp:          0 kB
      CommitLimit:    28708548 kB   CommitLimit:    28708548 kB
      Committed_AS:     135712 kB   Committed_AS:     135708 kB
      VmallocTotal:   34359738367 kB   VmallocTotal:   34359738367 kB
      VmallocUsed:     1180768 kB   VmallocUsed:     1180768 kB
      VmallocChunk:   34332553664 kB   VmallocChunk:   34332553664 kB
      HardwareCorrupted:     0 kB   HardwareCorrupted:     0 kB
      AnonHugePages:         0 kB   AnonHugePages:         0 kB
      HugePages_Total:       0      HugePages_Total:       0
      HugePages_Free:        0      HugePages_Free:        0
      HugePages_Rsvd:        0      HugePages_Rsvd:        0
      HugePages_Surp:        0      HugePages_Surp:        0
      Hugepagesize:       2048 kB   Hugepagesize:       2048 kB
      DirectMap4k:        5312 kB   DirectMap4k:        5312 kB
      DirectMap2M:     2082816 kB   DirectMap2M:     2082816 kB
      DirectMap1G:    48234496 kB   DirectMap1G:    48234496 kB
      

      Finally, sysrq-l backtraces from example slow processes show them in shrink_inactive_list:

      Process in.mrlogind
      
      isolate_pages_global
      shrink_inactive_list
      shrink_zone
      zone_reclaim
      get_page_from_freelist
      __alloc_pages_nodemask
      kmem_getpages
      cache_grow
      cache_alloc_refill
      kmem_cache_alloc
      __alloc_skb
      sk_stream_alloc_skb
      tcp_sendmsg
      sock_aio_write
      do_sync_write
      vfs_write
      sys_write
      system_call_fastpath
      
      Process opcontrol
      
      __isolate_lru_page
      isolate_pages_global
      shrink_inactive_list
      shrink_zone
      zone_reclaim
      isolate_pages_global
      get_page_from_freelist
      __alloc_pages_nodemask
      alloc_pages_current
      __pte_alloc
      copy_pte_range
      kmem_getpages
      cache_grow
      cache_alloc_refill
      kmem_cache_alloc
      dup_mm
      copy_process
      do_fork
      alloc_fd
      fd_install
      sys_clone
      stub_clone
      system_call_fastpath
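
      Both traces pass through zone_reclaim(), which only does real per-zone reclaim work when vm.zone_reclaim_mode is nonzero, so that setting may be worth checking on the affected nodes. A minimal sketch for reading it (standard sysctl path; whether it is actually set here has not been verified):

      /* sketch: report the current vm.zone_reclaim_mode (0 = disabled) */
      #include <stdio.h>

      int main(void)
      {
          FILE *fp = fopen("/proc/sys/vm/zone_reclaim_mode", "r");
          int mode;

          if (!fp || fscanf(fp, "%d", &mode) != 1) {
              perror("zone_reclaim_mode");
              return 1;
          }
          printf("vm.zone_reclaim_mode = %d\n", mode);
          fclose(fp);
          return 0;
      }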
      

      LLNL-bugzilla-ID: 1661

            People

              Assignee: Zhenyu Xu (bobijam)
              Reporter: Ned Bass (nedbass)