[LU-2432] ptlrpc_alloc_rqbd spinning on vmap_area_lock on MDS

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0

    Description

      vmalloc based allocations can potentially take a very long time to complete due to a regression in the kernel. As a result, I've seen our MDS "lock up" for certain periods of time while all of the cores spin on the vmap_area_lock down in ptlrpc_alloc_rqbd.

      For example:

          2012-11-01 11:34:28 Pid: 34505, comm: mdt02_051
          2012-11-01 11:34:28 
          2012-11-01 11:34:28 Call Trace:
          2012-11-01 11:34:28  [<ffffffff81273155>] ? rb_insert_color+0x125/0x160
          2012-11-01 11:34:28  [<ffffffff81149f1f>] ? __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffff810609ea>] __cond_resched+0x2a/0x40
          2012-11-01 11:34:28  [<ffffffff814efa60>] _cond_resched+0x30/0x40
          2012-11-01 11:34:28  [<ffffffff8115fa88>] kmem_cache_alloc_node_notrace+0xa8/0x130
          2012-11-01 11:34:28  [<ffffffff8115fc8b>] __kmalloc_node+0x7b/0x100
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149f1f>] __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149eb2>] __vmalloc_node+0xa2/0xb0
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff8114a199>] vmalloc_node+0x29/0x30
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffffa0922ffe>] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09235b5>] ptlrpc_grow_req_bufs+0x65/0x1b0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa0927fbd>] ptlrpc_main+0xd0d/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c14a>] child_rip+0xa/0x20
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c140>] ? child_rip+0x0/0x20
      
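      As a rough illustration of why these allocations end up in vmalloc at all: the per-service request buffers are large (roughly 110 KB on the MDS, per the MDS_BUFSIZE comment further down), so the allocation falls back from kmalloc to vmalloc and serializes on the global vmap_area_lock. The sketch below is a userspace stand-in; the cut-over threshold and helper name are assumptions for illustration, not the actual libcfs code.

          /* Userspace sketch of a kmalloc-vs-vmalloc cut-over like the one
           * that routes large request buffers through cfs_cpt_vmalloc().
           * The threshold and names are assumptions, not the real macros. */
          #include <stdio.h>
          #include <stdlib.h>

          #define PAGE_SIZE_BYTES   4096UL
          #define LARGE_ALLOC_LIMIT (4 * PAGE_SIZE_BYTES)   /* assumed cut-over */

          static void *sketch_alloc_rqbd_buffer(size_t size)
          {
              if (size <= LARGE_ALLOC_LIMIT)
                  return malloc(size);    /* small: contiguous, kmalloc-style */

              /* Large buffers take the vmalloc-style path.  In the kernel this
               * means mapping pages into vmalloc space under the single global
               * vmap_area_lock -- the lock the service threads above spin on. */
              printf("%zu bytes -> vmalloc path (serialized on vmap_area_lock)\n",
                     size);
              return malloc(size);
          }

          int main(void)
          {
              void *buf = sketch_alloc_rqbd_buffer(110 * 1024);   /* ~110 KB rqbd */

              free(buf);
              return 0;
          }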

      Here are a couple of links regarding the kernel regression:

Attachments

Issue Links

Activity


adilger Andreas Dilger added a comment -

Both http://review.whamcloud.com/4939 and http://review.whamcloud.com/4940 have landed, so I think this bug could be closed. There should only be a single thread calling vmalloc() now.

prakash Prakash Surya (Inactive) added a comment -

"all other threads can serve requests, they will not wait for buffer allocating at all"

Perfect, that's what I wanted to verify with you. Thanks for the clarification!

liang Liang Zhen (Inactive) added a comment -

I think we might not care about one thread (or very few threads) spinning, because each service has tens or even hundreds of threads, and servers normally have many CPU cores; all other threads can serve requests, they will not wait for buffer allocation at all.
The key issue of this ticket is that vmalloc can't be parallelized, so it's a waste if all threads/CPUs try to allocate buffers at the same time.
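To make this point concrete, here is a minimal pthread sketch of letting only one thread at a time enter the slow allocation path while the rest keep serving requests. The names and structure are illustrative only, not the code in http://review.whamcloud.com/4939.

    /* Sketch of "only one thread enters the allocation path": a trylock-style
     * guard lets at most one service thread pay for the slow vmalloc-backed
     * buffer growth while the others keep handling RPCs.  Names and structure
     * are illustrative, not the actual ptlrpc code. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static pthread_mutex_t grow_lock = PTHREAD_MUTEX_INITIALIZER;

    static void grow_request_buffers(void)
    {
        usleep(100 * 1000);   /* stand-in for the slow vmalloc-backed allocation */
    }

    static void *service_thread(void *arg)
    {
        long id = (long)arg;

        if (pthread_mutex_trylock(&grow_lock) == 0) {
            /* only this thread pays the allocation cost */
            printf("thread %ld: growing request buffers\n", id);
            grow_request_buffers();
            pthread_mutex_unlock(&grow_lock);
        } else {
            /* everyone else keeps serving incoming requests */
            printf("thread %ld: serving requests, skipping the grow\n", id);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[8];
        long i;

        for (i = 0; i < 8; i++)
            pthread_create(&tid[i], NULL, service_thread, (void *)i);
        for (i = 0; i < 8; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }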

prakash Prakash Surya (Inactive) added a comment -

Liang, why would limiting the vmalloc calls to a single thread fix the issue? That one thread will still be affected by the regression. Will the other threads still be able to service requests despite needing more request buffers? Or will they all have to wait for this single thread to finish the allocations?

liang Liang Zhen (Inactive) added a comment -

I posted a patch for this:
http://review.whamcloud.com/#change,4939
and another patch to resolve the buffer utilization issue:
http://review.whamcloud.com/#change,4940

liang Liang Zhen (Inactive) added a comment -

I didn't realize that we still don't have the "big request buffer" fix; then this should be the right way to fix this problem and LU-2424.
I would suggest having 512K or 1M as the request buffer size. As Andreas said, a very large request buffer can't be reused if any of those (thousands or more) requests is pending on something, so it might have some other issues.
And I still think it's a nice improvement if we only allow one thread (per CPT) to enter the allocation path.

adilger Andreas Dilger added a comment -

My (imperfect) understanding is that the receive buffers cannot be re-used until all of the requests therein are processed. That means the buffers are filled from the start, processed, and then returned to the incoming buffer list. If the buffer is too large, then requests sitting in the buffer may wait too long to be processed, or the buffer still will not be fully utilized if there is an upper limit on how long a request will wait.
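A small sketch of the lifecycle described above, assuming each buffer simply tracks how many of its requests are still outstanding (the names are made up for illustration): the buffer cannot be reposted until the last request carved out of it completes, so one slow request pins the entire buffer.

    /* Sketch: a receive buffer can only be recycled once every request that
     * was delivered into it has been processed, so with a very large buffer
     * a single slow request keeps the whole buffer out of circulation. */
    #include <stdio.h>

    struct rx_buffer {
        unsigned int outstanding;   /* requests delivered but not yet processed */
    };

    /* called as each request carved out of the buffer finishes processing */
    static void request_done(struct rx_buffer *buf)
    {
        if (--buf->outstanding == 0)
            printf("buffer idle, repost it for incoming requests\n");
        else
            printf("%u request(s) still pending, buffer stays pinned\n",
                   buf->outstanding);
    }

    int main(void)
    {
        struct rx_buffer buf = { .outstanding = 3 };

        request_done(&buf);   /* 2 left: whole buffer still pinned */
        request_done(&buf);   /* 1 left: whole buffer still pinned */
        request_done(&buf);   /* last one done: buffer can be reused */
        return 0;
    }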

prakash Prakash Surya (Inactive) added a comment -

I wasn't at LAD, so I'm unaware of that discussion. But what trade-offs are being made between the number of buffers used and the size of each? I.e., why can't we just have one huge buffer, increasing the utilization to (BUFFER_SIZE-REQUEST_SIZE)/BUFFER_SIZE percent (trending towards 100% as BUFFER_SIZE grows large)? Granted, I don't understand the LNET code well, so I must be missing something which makes that obviously the wrong thing to do.

adilger Andreas Dilger added a comment -

We discussed at LAD that one problem with the request buffers is that the incoming LNET buffers (sorry, I don't have the correct LNET terms here) are allocated only large enough for the largest single request, though most requests are smaller than this. Unfortunately, as soon as a single RPC is waiting in the incoming buffer, there is no longer enough space in the buffer to receive a maximum-sized incoming request. This means that each buffer is only ever used for a single message, regardless of how many might fit.

A solution that was discussed was to make the request buffer be 2x as large as the maximum request size and/or rounded up to the next power-of-two boundary. That would at least increase the buffer utilization to 50%, and would likely allow tens of requests per LNET buffer.

It may be that the patch for LU-2424 will already address this issue?
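The trade-off described in these two comments can be made concrete with Prakash's formula: a buffer stops accepting messages once the space left is smaller than a maximum-sized request, so its guaranteed (worst-case) utilization is (BUFFER_SIZE - REQUEST_SIZE) / BUFFER_SIZE. A short calculation, assuming a 64 KB maximum request size purely for illustration:

    /* Worst-case (guaranteed) utilization for a few buffer sizes, assuming
     * the buffer stops accepting messages once it can no longer hold one
     * maximum-sized request.  The 64 KB maximum request size is an assumed
     * number, used only to make the trend visible. */
    #include <stdio.h>

    int main(void)
    {
        const double max_req = 64.0;                            /* KB, assumed */
        const double bufs[] = { 64.0, 128.0, 512.0, 1024.0 };   /* KB */

        for (int i = 0; i < 4; i++) {
            double util = (bufs[i] - max_req) / bufs[i] * 100.0;

            /* 64 KB -> 0%, 128 KB -> 50%, 512 KB -> 87.5%, 1 MB -> 93.75% */
            printf("buffer %6.0f KB: worst-case utilization %5.1f%%\n",
                   bufs[i], util);
        }
        return 0;
    }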
bobijam Zhenyu Xu added a comment -

svc->srv_buf_size can be MDS_BUFSIZE = (362 + LOV_MAX_STRIPE_COUNT * 56 + 1024) ~= 110KB for the MDS service; could it be problematic?
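For reference, plugging an assumed LOV_MAX_STRIPE_COUNT of 2000 into the expression above reproduces the ~110 KB figure:

    /* Rough size check of the MDS_BUFSIZE expression quoted above, assuming
     * LOV_MAX_STRIPE_COUNT = 2000 (an assumption for this sketch). */
    #include <stdio.h>

    #define LOV_MAX_STRIPE_COUNT 2000   /* assumed value */

    int main(void)
    {
        unsigned long mds_bufsize = 362 + LOV_MAX_STRIPE_COUNT * 56 + 1024;

        /* 362 + 112000 + 1024 = 113386 bytes, i.e. about 110 KB -- large
         * enough that the allocation in the stack trace above goes through
         * cfs_cpt_vmalloc() rather than a contiguous kmalloc. */
        printf("MDS_BUFSIZE = %lu bytes (~%lu KB)\n",
               mds_bufsize, mds_bufsize / 1024);
        return 0;
    }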

People

    Assignee: bobijam Zhenyu Xu
    Reporter: prakash Prakash Surya (Inactive)
    Votes: 0
    Watchers: 9

Dates

    Created:
    Updated:
    Resolved: