Lustre / LU-2432

ptlrpc_alloc_rqbd spinning on vmap_area_lock on MDS


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 3
    • 5764

    Description

      vmalloc-based allocations can take a very long time to complete because of a regression in the kernel. As a result, I've seen our MDS "lock up" for periods of time while all of the cores spin on the vmap_area_lock down in ptlrpc_alloc_rqbd.

      For example:

          2012-11-01 11:34:28 Pid: 34505, comm: mdt02_051
          2012-11-01 11:34:28 
          2012-11-01 11:34:28 Call Trace:
          2012-11-01 11:34:28  [<ffffffff81273155>] ? rb_insert_color+0x125/0x160
          2012-11-01 11:34:28  [<ffffffff81149f1f>] ? __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffff810609ea>] __cond_resched+0x2a/0x40
          2012-11-01 11:34:28  [<ffffffff814efa60>] _cond_resched+0x30/0x40
          2012-11-01 11:34:28  [<ffffffff8115fa88>] kmem_cache_alloc_node_notrace+0xa8/0x130
          2012-11-01 11:34:28  [<ffffffff8115fc8b>] __kmalloc_node+0x7b/0x100
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149f1f>] __vmalloc_area_node+0x5f/0x190
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff81149eb2>] __vmalloc_node+0xa2/0xb0
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffff8114a199>] vmalloc_node+0x29/0x30
          2012-11-01 11:34:28  [<ffffffffa05a2a40>] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
          2012-11-01 11:34:28  [<ffffffffa0922ffe>] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09235b5>] ptlrpc_grow_req_bufs+0x65/0x1b0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa0927fbd>] ptlrpc_main+0xd0d/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c14a>] child_rip+0xa/0x20
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
          2012-11-01 11:34:28  [<ffffffff8100c140>] ? child_rip+0x0/0x20
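
When reading a trace like the one above, it can help to drop the frames marked with `?`. The kernel prints that marker for addresses it found on the stack but could not confirm as part of the actual call chain. The following is a minimal sketch, not part of the original report, that filters a few lines copied from the trace. The names `FRAME_RE`, `reliable_frames`, and `SAMPLE` are hypothetical and chosen only for illustration, and the regular expression assumes the frame format shown in this log.

```python
import re

# Matches one frame, e.g.
#   [<ffffffff8114a199>] vmalloc_node+0x29/0x30
#   [<ffffffffa05a2a40>] ? cfs_cpt_vmalloc+0x20/0x30 [libcfs]
# Assumption: frames follow the format seen in this report's log.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.search(line)
        if m and not m.group("unreliable"):
            # Frames without a [module] suffix belong to the core kernel.
            frames.append((m.group("func"), m.group("module") or "kernel"))
    return frames


# A few lines copied verbatim from the trace in this report.
SAMPLE = """\
2012-11-01 11:34:28  [<ffffffff81149f1f>] ? __vmalloc_area_node+0x5f/0x190
2012-11-01 11:34:28  [<ffffffff8114a199>] vmalloc_node+0x29/0x30
2012-11-01 11:34:28  [<ffffffffa05a2a40>] cfs_cpt_vmalloc+0x20/0x30 [libcfs]
2012-11-01 11:34:28  [<ffffffffa0922ffe>] ptlrpc_alloc_rqbd+0x13e/0x690 [ptlrpc]
2012-11-01 11:34:28  [<ffffffffa09272b0>] ? ptlrpc_main+0x0/0x19f0 [ptlrpc]
"""

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{func} ({module})")
```

Applied to the full trace, the same filter leaves the path from ptlrpc_main through ptlrpc_grow_req_bufs and ptlrpc_alloc_rqbd into cfs_cpt_vmalloc and vmalloc_node, which is the allocation path this report is about.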
      

      Here are a couple of links regarding the kernel regression:

            People

              bobijam Zhenyu Xu
              prakash Prakash Surya (Inactive)