Lustre / LU-5415

High ldlm_poold load on client



    Description

      When LRU resizing is enabled on the client, ldlm_poold sometimes shows an extremely high CPU load. At the same time, schedule_timeout() complains about a negative timeout value. After a while the problem recovers without any manual intervention, but it recurs frequently when the file system is under heavy load.

      top - 09:48:51 up 6 days, 11:17,  2 users,  load average: 1.00, 1.01, 1.00
      Tasks: 516 total,   2 running, 514 sleeping,   0 stopped,   0 zombie
      Cpu(s):  0.1%us,  6.4%sy,  0.0%ni, 93.4%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
      Mem:  65903880k total, 24300068k used, 41603812k free,   346516k buffers
      Swap: 65535992k total,        0k used, 65535992k free, 18665656k cached
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
       37976 root      20   0     0    0    0 R 99.4  0.0   2412:25 ldlm_bl_04
      
      Jul 13 12:49:30 mu01 kernel: LustreError: 11-0: lustre-OST000a-osc-ffff88080fdad800: Communicating with 10.0.2.2@o2ib, operation obd_ping failed with -107.
      Jul 13 12:49:30 mu01 kernel: Lustre: lustre-OST000a-osc-ffff88080fdad800: Connection to lustre-OST000a (at 10.0.2.2@o2ib) was lost; in progress operations using this service will wait for recovery to complete
      Jul 13 12:49:30 mu01 kernel: LustreError: 167-0: lustre-OST000a-osc-ffff88080fdad800: This client was evicted by lustre-OST000a; in progress operations using this service will fail.
      Jul 13 12:49:31 mu01 kernel: schedule_timeout: wrong timeout value fffffffff5c2c8c0
      Jul 13 12:49:31 mu01 kernel: Pid: 4054, comm: ldlm_poold Tainted: G           ---------------  T 2.6.32-279.el6.x86_64 #1
      Jul 13 12:49:31 mu01 kernel: Call Trace:
      Jul 13 12:49:31 mu01 kernel: [<ffffffff814fe759>] ? schedule_timeout+0x2c9/0x2e0
      Jul 13 12:49:31 mu01 kernel: [<ffffffffa086612b>] ? ldlm_pool_recalc+0x10b/0x130 [ptlrpc]
      Jul 13 12:49:31 mu01 kernel: [<ffffffffa084cfb9>] ? ldlm_namespace_put+0x29/0x60 [ptlrpc]
      Jul 13 12:49:31 mu01 kernel: [<ffffffffa08670b0>] ? ldlm_pools_thread_main+0x1d0/0x2f0 [ptlrpc]
      Jul 13 12:49:31 mu01 kernel: [<ffffffff81060250>] ? default_wake_function+0x0/0x20
      Jul 13 12:49:31 mu01 kernel: [<ffffffffa0866ee0>] ? ldlm_pools_thread_main+0x0/0x2f0 [ptlrpc]
      Jul 13 12:49:31 mu01 kernel: [<ffffffff81091d66>] ? kthread+0x96/0xa0
      Jul 13 12:49:31 mu01 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
      Jul 13 12:49:31 mu01 kernel: [<ffffffff81091cd0>] ? kthread+0x0/0xa0
      Jul 13 12:49:31 mu01 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
      Jul 13 12:49:33 mu01 kernel: Lustre: lustre-OST000a-osc-ffff88080fdad800: Connection restored to lustre-OST000a (at 10.0.2.2@o2ib)
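
      The "wrong timeout value fffffffff5c2c8c0" above is a negative number printed as an unsigned 64-bit quantity; schedule_timeout() warns on such a value and returns without sleeping, so the pools thread can loop at full CPU. Below is a minimal userspace sketch of how that arithmetic could underflow, assuming the sleep interval is computed as pl_recalc_period minus the time since the last recalc; PL_RECALC_PERIOD and next_recalc_in() are illustrative names, not the actual Lustre source.

      /*
       * Minimal userspace sketch (not the Lustre source): if the pools
       * thread sleeps for pl_recalc_period - (now - pl_recalc_time) and a
       * recalc pass stalls for longer than one period, the difference goes
       * negative. Interpreted as an unsigned timeout it becomes a huge
       * value like the fffffffff5c2c8c0 in the log above, which
       * schedule_timeout() refuses, so the loop never actually sleeps.
       */
      #include <stdio.h>
      #include <time.h>

      #define PL_RECALC_PERIOD 10     /* seconds; stands in for pl_recalc_period */

      /* Seconds until the next recalc is due, without any clamping. */
      static long next_recalc_in(time_t pl_recalc_time, time_t now)
      {
              return PL_RECALC_PERIOD - (long)(now - pl_recalc_time);
      }

      int main(void)
      {
              time_t pl_recalc_time = 1000;   /* last recalc timestamp */
              time_t now = 1025;              /* pass overran by 15 s */
              long timeout = next_recalc_in(pl_recalc_time, now);

              printf("computed timeout: %ld s (as unsigned: %#lx)\n",
                     timeout, (unsigned long)timeout);

              /* One plausible guard: never sleep for less than one second. */
              if (timeout <= 0)
                      timeout = 1;
              printf("clamped timeout: %ld s\n", timeout);
              return 0;
      }

      With a clamp like the one sketched, the thread would sleep at least one second per iteration instead of busy-looping whenever a recalc pass overruns its period.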
      


            People

              Assignee: bobijam (Zhenyu Xu)
              Reporter: lixi (Li Xi (Inactive))
              Votes: 0
              Watchers: 6
