Lustre / LU-2432

ptlrpc_alloc_rqbd spinning on vmap_area_lock on MDS

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Severity: 3
    • Rank (Obsolete): 5764

      Description

      vmalloc-based allocations can take a very long time to complete due to a regression in the kernel. As a result, I've seen our MDS "lock up" for periods of time while all of the cores spin on the vmap_area_lock down in ptlrpc_alloc_rqbd.

      For example:

          2012-11-01 11:34:28 Pid: 34505, comm: mdt02_051
          2012-11-01 11:34:28 
          2012-11-01 11:34:28 Call Trace:
          2012-11-01 11:34:28  [<ffffffff81273155>] ? rb_insert_color+0x125/0x160
          2012-11-01 11:34:28  [<ffffffff81149f1f>] ? __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffff810609ea>] __cond_resched+0x2a/0x40
          2012-11-01 11:34:28  [<ffffffff814efa60>] _cond_resched+0x30/0x40
          2012-11-01 11:34:28  [<ffffffff8115fa88>] kmem_cache_alloc_node_notrace+0xa8/0x130
          2012-11-01 11:34:28  [<ffffffff8115fc8b>] __kmalloc_node+0x7b/0x100
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149f1f>] __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149eb2>] __vmalloc_node+0xa2/0xb0
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff8114a199>] vmalloc_node+0x29/0x30
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffffa0922ffe>] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09235b5>] ptlrpc_grow_req_bufs+0x65/0x1b0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa0927fbd>] ptlrpc_main+0xd0d/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c14a>] child_rip+0xa/0x20
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c140>] ? child_rip+0x0/0x20
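
      The trace shows every request-buffer allocation going through cfs_cpt_vmalloc, so all service threads end up serialized on one global lock. One common way to reduce that kind of contention is to use the vmalloc-style path only for buffers above a size threshold. The sketch below is a userspace model of that idea, not the change made for this ticket: every name starting with sketch_ and the SKETCH_BIG_ALLOC threshold are hypothetical, a pthread mutex stands in for vmap_area_lock, and malloc stands in for both kernel allocators.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical threshold: requests at or below this size skip the locked path. */
#define SKETCH_BIG_ALLOC ((size_t)4 * 4096)

/* Stands in for the kernel's global vmap_area_lock. */
static pthread_mutex_t sketch_area_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts how many allocations went through the serialized path. */
static unsigned long sketch_locked_allocs;

/* Models a vmalloc-style allocation: every caller takes the same lock. */
void *sketch_locked_alloc(size_t size)
{
    void *p;

    pthread_mutex_lock(&sketch_area_lock);
    sketch_locked_allocs++;
    p = malloc(size);
    pthread_mutex_unlock(&sketch_area_lock);
    return p;
}

/* Models a kmalloc-style allocation: no shared lock in this sketch. */
void *sketch_direct_alloc(size_t size)
{
    return malloc(size);
}

/* Small buffers take the direct path; only large ones take the locked path. */
void *sketch_alloc_large(size_t size)
{
    assert(size > 0);
    if (size <= SKETCH_BIG_ALLOC)
        return sketch_direct_alloc(size);
    return sketch_locked_alloc(size);
}

/* Both paths use malloc here, so a single free routine is enough. */
void sketch_free(void *p)
{
    free(p);
}

/* Reports how often the serialized path was used. */
unsigned long sketch_locked_alloc_count(void)
{
    return sketch_locked_allocs;
}
```

      In this model a request of 1024 bytes never touches sketch_area_lock, while a request one byte over SKETCH_BIG_ALLOC does. Whether request buffers on a given MDS actually fall under such a threshold depends on the configured buffer size, which this ticket does not state.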
      

      Here are a couple of links regarding the kernel regression:

              People

              • Assignee: Zhenyu Xu (bobijam)
              • Reporter: Prakash Surya (prakash) (Inactive)
