Lustre: LU-3641

Dropping caches on SLES11SP2 hangs in shrink_slab()/ldlm_pools_cli_shrink() path


Details


    Description

      Dropping caches (echo 3 > /proc/sys/vm/drop_caches) occasionally does not complete on a SLES11SP2 client that has a Lustre file system mounted. The kernel's shrink_slab() function gets stuck in an infinite loop calling ldlm_pools_cli_shrink() because Lustre does not initialize the new batch field of struct shrinker when it registers its shrinkers.

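      cfs_set_shrinker() allocates the struct shrinker with kmalloc(); it neither zero-fills the struct nor explicitly sets the batch field, so batch holds whatever was previously in that memory, which is occasionally a negative value. shrink_slab() uses batch as the step size of its loop around the calls to each shrinker: the loop runs while total_scan >= batch_size and subtracts batch_size on every pass. If batch is negative, each pass increases total_scan instead of decreasing it, so the loop condition never becomes false and the loop never terminates.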

      The interesting code bits are:

      static inline
      struct cfs_shrinker *cfs_set_shrinker(int seek, cfs_shrinker_t func)
      {
              struct shrinker *s; 
      
              s = kmalloc(sizeof(*s), GFP_KERNEL);
              if (s == NULL)
                      return (NULL);
      
              s->shrink = func;
              s->seeks = seek;
      
              register_shrinker(s);
      
              return s;
      }
      
      kernel/include/linux/mm.h
      struct shrinker {
              int (*shrink)(struct shrinker *, struct shrink_control *sc);
              int seeks;      /* seeks to recreate an obj */
              long batch;     /* reclaim batch size, 0 = default */
      
              /* These are for internal use */
              struct list_head list;
              long nr;        /* objs pending delete */
      };
      
      unsigned long shrink_slab(struct shrink_control *shrink,
                                unsigned long nr_pages_scanned,
                                unsigned long lru_pages)
      {
      [skip] 
                      long batch_size = shrinker->batch ? shrinker->batch
                                                        : SHRINK_BATCH;
      [skip]
                      /* total_scan initialized to something positive */
                      while (total_scan >= batch_size) {
      [skip]
                              shrink_ret = do_shrinker_shrink(shrinker, shrink,
                                                              batch_size);
      [skip]
                              total_scan -= batch_size;
      [skip] 
                      }
      

      When this problem occurs, stack traces of the hanging process look similar to the following, although the exact location along the ldlm_pools_cli_shrink() path varies.

      crash> bt 8994
      PID: 8994   TASK: ffff8808334747b0  CPU: 24  COMMAND: "apinit"
       #0 [ffff8807f0a29c58] schedule at ffffffff81362e57
       #1 [ffff8807f0a29ca0] ldlm_bl_to_thread_list at ffffffffa0470d40 [ptlrpc]
       #2 [ffff8807f0a29cb0] ldlm_cancel_lru at ffffffffa046c385 [ptlrpc]
       #3 [ffff8807f0a29d00] ldlm_cli_pool_shrink at ffffffffa047896d [ptlrpc]
       #4 [ffff8807f0a29d40] ldlm_pool_shrink at ffffffffa0476568 [ptlrpc]
       #5 [ffff8807f0a29d70] ldlm_pools_shrink at ffffffffa0477ebc [ptlrpc]
       #6 [ffff8807f0a29dc0] ldlm_pools_cli_shrink at ffffffffa0477f5b [ptlrpc]
       #7 [ffff8807f0a29dd0] shrink_slab at ffffffff810fec7a
       #8 [ffff8807f0a29e70] drop_caches_sysctl_handler at ffffffff81164eb2
       #9 [ffff8807f0a29ea0] proc_sys_call_handler at ffffffff811a08a0
      #10 [ffff8807f0a29f00] proc_sys_write at ffffffff811a08c4
      #11 [ffff8807f0a29f10] vfs_write at ffffffff8113cf1b
      #12 [ffff8807f0a29f40] sys_write at ffffffff8113d0c5
      #13 [ffff8807f0a29f80] system_call_fastpath at ffffffff8136cc2b
      
      > crash-7.0.0> shrinker 0xffff8803fb004340
      > struct shrinker {
      >   shrink = 0xffffffffa03c1130 <ldlm_pools_cli_shrink>, 
      >   seeks = 2, 
      >   batch = -4, 
      >   list = {
      >     next = 0xffffffff815a6e40 <shrinker_list>, 
      >     prev = 0xffff8803fb004398
      >   }, 
      >   nr = 0
      > }
      

      The Linux change that triggered this problem is:

      author Dave Chinner <dchinner@redhat.com> 2011-07-08 04:14:37 (GMT)
      committer Al Viro <viro@zeniv.linux.org.uk> 2011-07-20 05:44:32 (GMT)
      commit e9299f5058595a655c3b207cda9635e28b9197e6
      tree b31a4dc5cab98ee1701313f45e92e583c2d76f63
      parent 3567b59aa80ac4417002bf58e35dce5c777d4164
      vmscan: add customisable shrinker batch size
      For shrinkers that have their own cond_resched* calls, having shrink_slab break the work down into small batches is not particularly efficient. Add a custom batchsize field to the struct shrinker so that shrinkers can use a larger batch size if they desire. A value of zero (uninitialised) means "use the default", so behaviour is unchanged by this patch.

      Note: ldlm_pools_srv_shrink() does not exhibit this problem because it always returns -1, which causes shrink_slab to break out of its loop.


          People

            wc-triage WC Triage
            amk Ann Koehler (Inactive)
