Details
Type: Bug
Resolution: Fixed
Priority: Major
Fix Version: Lustre 2.4.0
Description
Dropping caches (echo 3 > /proc/sys/vm/drop_caches) occasionally does not complete on a client running SLES11SP2 with a Lustre file system mounted. The kernel's shrink_slab() function gets stuck in an infinite loop calling ldlm_pools_cli_shrink() because Lustre does not initialize the new batch field of the shrinker struct when it registers its shrinkers.
cfs_set_shrinker() kmallocs a struct shrinker; it neither zero-fills the struct nor explicitly sets the batch field, so batch occasionally ends up holding a negative garbage value. The kernel's shrink_slab() function uses batch to control its loop around the calls to each shrinker; if the value is negative, the loop never terminates.
The interesting code bits are:
static inline
struct cfs_shrinker *cfs_set_shrinker(int seek, cfs_shrinker_t func)
{
        struct shrinker *s;

        s = kmalloc(sizeof(*s), GFP_KERNEL);
        if (s == NULL)
                return (NULL);

        s->shrink = func;
        s->seeks = seek;
        /* note: s->batch is never assigned, so it keeps whatever
         * stale data kmalloc() left in the allocation */

        register_shrinker(s);
        return s;
}
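Because the struct comes from kmalloc(), every field that cfs_set_shrinker() does not assign keeps whatever stale data was already in the allocation, including batch. A minimal sketch of the kind of fix required is to zero the allocation so batch falls back to the kernel default (the patch that actually landed may differ in detail):

static inline
struct cfs_shrinker *cfs_set_shrinker(int seek, cfs_shrinker_t func)
{
        struct shrinker *s;

        /* kzalloc() zero-fills the struct, so batch (and nr) start at 0,
         * which shrink_slab() interprets as "use the default batch size" */
        s = kzalloc(sizeof(*s), GFP_KERNEL);
        if (s == NULL)
                return (NULL);

        s->shrink = func;
        s->seeks = seek;

        register_shrinker(s);
        return s;
}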
From the kernel's include/linux/mm.h:
struct shrinker {
        int (*shrink)(struct shrinker *, struct shrink_control *sc);
        int seeks;      /* seeks to recreate an obj */
        long batch;     /* reclaim batch size, 0 = default */

        /* These are for internal use */
        struct list_head list;
        long nr;        /* objs pending delete */
};
unsigned long shrink_slab(struct shrink_control *shrink,
                          unsigned long nr_pages_scanned,
                          unsigned long lru_pages)
{
        [skip]
                long batch_size = shrinker->batch ? shrinker->batch
                                                  : SHRINK_BATCH;
        [skip]
                /* total_scan initialized to something positive */
                while (total_scan >= batch_size) {
        [skip]
                        shrink_ret = do_shrinker_shrink(shrinker, shrink,
                                                        batch_size);
        [skip]
                        total_scan -= batch_size;
        [skip]
                }
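To see why this spins forever: with a negative batch_size (e.g. the -4 seen below), the condition total_scan >= batch_size is true for any non-negative total_scan, and total_scan -= batch_size only makes total_scan larger. A trivial userspace sketch of the same arithmetic, using hypothetical values rather than kernel code:

#include <stdio.h>

int main(void)
{
        long batch_size = -4;           /* garbage from the uninitialized field */
        long total_scan = 128;          /* "initialized to something positive" */
        unsigned long passes = 0;

        /* Same loop shape as shrink_slab(); capped so this demo terminates. */
        while (total_scan >= batch_size) {
                total_scan -= batch_size;       /* subtracting -4 adds 4 */
                if (++passes == 1000000) {
                        printf("still looping: total_scan=%ld\n", total_scan);
                        break;
                }
        }
        return 0;
}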
When this problem occurs, stack traces of the hanging process look similar to the following, although the exact location along the ldlm_pools_cli_shrink() path varies.
crash> bt 8994
PID: 8994 TASK: ffff8808334747b0 CPU: 24 COMMAND: "apinit"
#0 [ffff8807f0a29c58] schedule at ffffffff81362e57
#1 [ffff8807f0a29ca0] ldlm_bl_to_thread_list at ffffffffa0470d40 [ptlrpc]
#2 [ffff8807f0a29cb0] ldlm_cancel_lru at ffffffffa046c385 [ptlrpc]
#3 [ffff8807f0a29d00] ldlm_cli_pool_shrink at ffffffffa047896d [ptlrpc]
#4 [ffff8807f0a29d40] ldlm_pool_shrink at ffffffffa0476568 [ptlrpc]
#5 [ffff8807f0a29d70] ldlm_pools_shrink at ffffffffa0477ebc [ptlrpc]
#6 [ffff8807f0a29dc0] ldlm_pools_cli_shrink at ffffffffa0477f5b [ptlrpc]
#7 [ffff8807f0a29dd0] shrink_slab at ffffffff810fec7a
#8 [ffff8807f0a29e70] drop_caches_sysctl_handler at ffffffff81164eb2
#9 [ffff8807f0a29ea0] proc_sys_call_handler at ffffffff811a08a0
#10 [ffff8807f0a29f00] proc_sys_write at ffffffff811a08c4
#11 [ffff8807f0a29f10] vfs_write at ffffffff8113cf1b
#12 [ffff8807f0a29f40] sys_write at ffffffff8113d0c5
#13 [ffff8807f0a29f80] system_call_fastpath at ffffffff8136cc2b
crash-7.0.0> shrinker 0xffff8803fb004340
struct shrinker {
  shrink = 0xffffffffa03c1130 <ldlm_pools_cli_shrink>,
  seeks = 2,
  batch = -4,
  list = {
    next = 0xffffffff815a6e40 <shrinker_list>,
    prev = 0xffff8803fb004398
  },
  nr = 0
}
The Linux change that triggered this problem is:
commit e9299f5058595a655c3b207cda9635e28b9197e6
Author:    Dave Chinner <dchinner@redhat.com>    2011-07-08 04:14:37 (GMT)
Committer: Al Viro <viro@zeniv.linux.org.uk>     2011-07-20 05:44:32 (GMT)

vmscan: add customisable shrinker batch size
For shrinkers that have their own cond_resched* calls, having shrink_slab break the work down into small batches is not particularly efficient. Add a custom batchsize field to the struct shrinker so that shrinkers can use a larger batch size if they desire. A value of zero (uninitialised) means "use the default", so behaviour is unchanged by this patch.
Note: ldlm_pools_srv_shrink() does not exhibit this problem because it always returns -1, which causes shrink_slab to break out of its loop.
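For completeness, the reason a -1 return value stops the scan is a check in shrink_slab() immediately after the do_shrinker_shrink() call (elided as [skip] in the excerpt above); in the SLES11SP2-era code it looks roughly like this:

                        shrink_ret = do_shrinker_shrink(shrinker, shrink,
                                                        batch_size);
                        /* a shrinker that returns -1 tells vmscan to stop
                         * scanning it, so the while loop exits even when
                         * batch_size is a bogus negative value */
                        if (shrink_ret == -1)
                                break;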