LU-4195: MDT slow with ptlrpcd using 100% CPU

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.1.5
    • Labels: None
    • Environment: server running 2.1.5-2nas
      our source is at git://github.com/jlan/lustre-nas.git
    • Severity: 3
    • Rank (Obsolete): 11369

    Description

      MDT response was very slow. top showed ptlrpcd running at 100% CPU, and the console showed the errors below. We were able to capture a debug trace; see the attached files.

      Lustre: Service thread pid 7065 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7065, comm: mdt_93

      Call Trace:
      [<ffffffff810539b2>] ? enqueue_task_fair+0x142/0x490
      [<ffffffff81096f5d>] ? sched_clock_cpu+0xcd/0x110
      [<ffffffffa04f960e>] cfs_waitq_wait+0xe/0x10 [libcfs]
      [<ffffffffa0a9f6de>] qos_statfs_update+0x7fe/0xa70 [lov]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0aa00fd>] alloc_qos+0x1ad/0x21a0 [lov]
      [<ffffffffa0aa306c>] qos_prep_create+0x1ec/0x2380 [lov]
      [<ffffffffa04f98be>] ? cfs_free+0xe/0x10 [libcfs]
      [<ffffffffa0a9c63a>] lov_prep_create_set+0xea/0x390 [lov]
      [<ffffffffa0a84b0c>] lov_create+0x1ac/0x1400 [lov]
      [<ffffffffa0d8a50f>] ? obd_iocontrol+0xef/0x390 [mdd]
      [<ffffffffa0d8f90e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
      [<ffffffffa0da1871>] mdd_create+0xf81/0x1a90 [mdd]
      [<ffffffffa0ea9df3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
      [<ffffffffa0ea456c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
      [<ffffffffa0eda3f7>] cml_create+0x97/0x250 [cmm]
      [<ffffffffa0e165e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
      [<ffffffffa0e2c06e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
      [<ffffffffa078f724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      [<ffffffffa0da456e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa0e14c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa0e0bed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      [<ffffffffa0e0c53d>] mdt_intent_reint+0x1ed/0x530 [mdt]
      [<ffffffffa0e0ac09>] mdt_intent_policy+0x379/0x690 [mdt]
      [<ffffffffa074b351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa07711ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [<ffffffffa0e0b586>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa0e00772>] mdt_handle_common+0x932/0x1750 [mdt]
      [<ffffffffa0e01665>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
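
      Read from the bottom up, the frames without a leading "?" show thread mdt_93 handling an open/create request (mdt_reint_open, mdd_create, lov_create) and then waiting in qos_statfs_update via cfs_waitq_wait. For anyone working through the larger attached backtrace, the following is a minimal Python sketch that filters out the "?" (unreliable) frames and lists function and module for the rest. The helper name reliable_frames and the regular expression are illustrative assumptions, not part of any Lustre tooling, and the pattern is written only against the frame format shown in this ticket.

```python
import re

# Matches kernel stack frames in the format seen in this ticket, e.g.
#   [<ffffffffa0a9f6de>] qos_statfs_update+0x7fe/0xa70 [lov]
#   [<ffffffff810539b2>] ? enqueue_task_fair+0x142/0x490
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace_text):
    """Return (function, module) pairs for frames not marked with '?'.

    Frames without a trailing [module] tag are reported as 'kernel'.
    """
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and not match.group("unreliable"):
            frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


# A few frames copied from the trace above, used as sample input.
sample = """\
[<ffffffff810539b2>] ? enqueue_task_fair+0x142/0x490
[<ffffffffa04f960e>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa0a9f6de>] qos_statfs_update+0x7fe/0xa70 [lov]
[<ffffffffa0aa00fd>] alloc_qos+0x1ad/0x21a0 [lov]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
"""

for func, module in reliable_frames(sample):
    print(f"{module:8} {func}")
```

      Running the same function over each thread's trace in the attached bt.LU4195 file would be one way to check whether other service threads are waiting at the same point; that is a suggestion, not something this ticket confirms.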

      Attachments

        1. bt.LU4195
          1.07 MB
        2. meminfo.LU4195
          33 kB

          People

            ashehata Amir Shehata (Inactive)
            mhanafi Mahmoud Hanafi