Lustre / LU-6251

Mellanox / o2ib LND causes an OOM on an OST node

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Affects Version/s: Lustre 2.5.1
    • Environment: 2.5.1 based Lustre code
    • Severity: 3
    • 17504

    Description

      While investigating an OOM on a node, we found a large number of allocations of size 532480 and 266240 bytes.

      Example vm_struct for a memory region of size 266240:
      crash> vm_struct ffff880019c542c0
      struct vm_struct {
        next = 0xffff880588f29900, 
        addr = 0xffffc904a626d000, 
        size = 266240, 
        flags = 4, 
        pages = 0x0, 
        nr_pages = 0, 
        phys_addr = 0, 
        caller = 0xffffffffa00b7136 <mlx4_buf_alloc+870>
      }
      

      99% of the memory regions of size 266240 and 532480 have caller = 0xffffffffa00b7136 <mlx4_buf_alloc+870>.

      The number of such regions is 31042 out of 31296.
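
      Rough arithmetic (assuming all 31042 regions are of the 266240-byte kind; the 532480-byte ones only add to this): that one caller pins roughly 7.7 GiB of vmalloc space.

      >>> 31042 * 266240
      8264622080
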
      I also found strange backtraces in the kernel:

      PID: 83859  TASK: ffff8807d64ca040  CPU: 0   COMMAND: "kiblnd_connd"
       #0 [ffff8807b2835a90] schedule at ffffffff815253c0
       #1 [ffff8807b2835b58] schedule_timeout at ffffffff815262a5
       #2 [ffff8807b2835c08] wait_for_common at ffffffff81525f23
       #3 [ffff8807b2835c98] wait_for_completion at ffffffff8152603d
       #4 [ffff8807b2835ca8] synchronize_sched at ffffffff81096e88
       #5 [ffff8807b2835cf8] mlx4_cq_free at ffffffffa00bf188 [mlx4_core]
       #6 [ffff8807b2835d68] mlx4_ib_destroy_cq at ffffffffa04725f5 [mlx4_ib]
       #7 [ffff8807b2835d88] ib_destroy_cq at ffffffffa043de99 [ib_core]
       #8 [ffff8807b2835d98] kiblnd_destroy_conn at ffffffffa0acbafc [ko2iblnd]
       #9 [ffff8807b2835dd8] kiblnd_connd at ffffffffa0ad5fe1 [ko2iblnd]
      #10 [ffff8807b2835ee8] kthread at ffffffff8109ac66
      #11 [ffff8807b2835f48] kernel_thread at ffffffff8100c20a
      

      So the thread is blocked while destroying an IB connection: mlx4_cq_free() calls synchronize_sched(), which waits for a full RCU-sched grace period before the CQ buffers can be freed.
      Inspecting the task:

      crash> p ((struct task_struct *)0xffff8807d64ca040)->se.cfs_rq->rq->clock
      $25 = 230339336880160
       crash> p ((struct task_struct *)0xffff8807d64ca040)->se.block_start
      $26 = 230337329685261
       >>> (230339336880160-230337329685261)/10**9
      2
      

      So the connd task has been blocked for about 2 seconds on a single connection teardown. But the o2ib LND statistics are even more interesting:

      crash> kib_net 0xffff8808325e9dc0
      struct kib_net {
        ibn_list = {
          next = 0xffff8807b40a2f40, 
          prev = 0xffff8807b40a2f40
        }, 
        ibn_incarnation = 1423478059211439, 
        ibn_init = 2, 
        ibn_shutdown = 0, 
        ibn_npeers = {
          counter = 31042
        }, 
        ibn_nconns = {
          counter = 31041
        },
      

      So ~31k peers, but the tests are run on a cluster with 14 real clients and 5 server nodes, so no more than about 20 connections should exist.
      Where are all the others?

      crash> p &kiblnd_data.kib_connd_zombies
      $7 = (struct list_head *) 0xffffffffa0ae7e70 <kiblnd_data+112>
      crash> list -H 0xffffffffa0ae7e70 -o kib_conn.ibc_list | wc -l
      31030
      

      So all the memory is consumed by zombie connections, each of which needs more than 2 s to destroy.
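
      For scale (assuming each destroy really takes about 2 s, as the blocked connd thread suggests): draining just the 31030 zombies already on the list would take

      >>> 31030 * 2
      62060

      seconds, i.e. roughly 17 hours, while new connections keep being queued, so the vmalloc space held by their CQ buffers only grows.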

People

    Assignee: WC Triage
    Reporter: Alexey Lyashkov
