Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.4.0
-
3
-
5764
Description
vmalloc-based allocations can take a very long time to complete due to a regression in the kernel. As a result, I've seen our MDS "lock up" for periods of time while all of the cores spin on the vmap_area_lock down in ptlrpc_alloc_rqbd.
For example:
2012-11-01 11:34:28 Pid: 34505, comm: mdt02_051
2012-11-01 11:34:28
2012-11-01 11:34:28 Call Trace:
2012-11-01 11:34:28 [<ffffffff81273155>] ? rb_insert_color+0x125/0x160
2012-11-01 11:34:28 [<ffffffff81149f1f>] ? __vmalloc_area_node+0x5f/0x190
2012-11-01 11:34:28 [<ffffffff810609ea>] __cond_resched+0x2a/0x40
2012-11-01 11:34:28 [<ffffffff814efa60>] _cond_resched+0x30/0x40
2012-11-01 11:34:28 [<ffffffff8115fa88>] kmem_cache_alloc_node_notrace+0xa8/0x130
2012-11-01 11:34:28 [<ffffffff8115fc8b>] __kmalloc_node+0x7b/0x100
2012-11-01 11:34:28 [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
2012-11-01 11:34:28 [<ffffffff81149f1f>] __vmalloc_area_node+0x5f/0x190
2012-11-01 11:34:28 [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
2012-11-01 11:34:28 [<ffffffff81149eb2>] __vmalloc_node+0xa2/0xb0
2012-11-01 11:34:28 [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
2012-11-01 11:34:28 [<ffffffff8114a199>] vmalloc_node+0x29/0x30
2012-11-01 11:34:28 [<ffffffffa05a2a40>] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
2012-11-01 11:34:28 [<ffffffffa0922ffe>] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
2012-11-01 11:34:28 [<ffffffffa09235b5>] ptlrpc_grow_req_bufs+0x65/0x1b0 [ptlrpc]
2012-11-01 11:34:28 [<ffffffffa0927fbd>] ptlrpc_main+0xd0d/0x19f0 [ptlrpc]
2012-11-01 11:34:28 [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-11-01 11:34:28 [<ffffffff8100c14a>] child_rip+0xa/0x20
2012-11-01 11:34:28 [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-11-01 11:34:28 [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
2012-11-01 11:34:28 [<ffffffff8100c140>] ? child_rip+0x0/0x20
Here are a couple of links regarding the kernel regression:
I think we might not care about one thread (or very few threads) spinning, because each service has tens or even hundreds of threads and servers normally have many CPU cores; all the other threads can keep serving requests and will not wait for buffer allocation at all.
The key issue of this ticket is that vmalloc can't be parallelized, so it's a waste if all threads/CPUs try to allocate buffers at the same time.
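One way to picture the point above is a guard that lets only one thread at a time enter the slow allocation path, while every other thread backs off and goes back to serving requests. The following is a minimal user-space sketch of that idea, not the actual Lustre change: `struct svc_sketch`, `svc_sketch_init` and `grow_bufs_sketch` are hypothetical names, and `malloc` stands in for the vmalloc call.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-service state; NOT a real Lustre structure. */
struct svc_sketch {
    atomic_bool allocating; /* true while one thread is growing buffers */
    atomic_int  nbuffers;   /* allocations that completed */
    atomic_int  skipped;    /* attempts that backed off */
};

void svc_sketch_init(struct svc_sketch *svc)
{
    atomic_init(&svc->allocating, false);
    atomic_init(&svc->nbuffers, 0);
    atomic_init(&svc->skipped, 0);
}

/*
 * Try to add one buffer. Returns true if this caller performed the
 * allocation, false if another caller was already allocating (or the
 * allocation failed). Callers never block on each other.
 */
bool grow_bufs_sketch(struct svc_sketch *svc, size_t size)
{
    /* atomic_exchange returns the previous value: true means "busy". */
    if (atomic_exchange(&svc->allocating, true)) {
        atomic_fetch_add(&svc->skipped, 1);
        return false; /* back off and keep serving requests */
    }

    assert(size > 0);
    void *buf = malloc(size); /* assumption: stands in for the slow vmalloc */
    bool ok = (buf != NULL);
    if (ok) {
        atomic_fetch_add(&svc->nbuffers, 1);
        free(buf); /* sketch only: real code would keep the buffer */
    }

    atomic_store(&svc->allocating, false);
    return ok;
}
```

In a service loop, a thread that gets `false` back would simply continue handling requests and let the thread that holds the flag finish growing the pool, so at most one CPU at a time contends for the allocator.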