Lustre / LU-4271

mds load goes very high and filesystem hangs after mounting mdt

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.1.5
    • None
    • 4
    • 11731

    Description

      After recovery of a crashed MDS, the system load climbs above 800.
      The filesystem is DOWN. We need help to bring the filesystem online!

      Here is the error:
      Lustre: Skipped 2 previous similar messages
      Lustre: Service thread pid 7014 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7014, comm: mdt_01

      Call Trace:
      [<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
      [<ffffffff8107bf80>] ? process_timeout+0x0/0x10
      [<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0e13337>] lov_check_and_create_object+0x187/0x570 [lov]
      [<ffffffffa0e13a1b>] qos_remedy_create+0x1db/0x220 [lov]
      [<ffffffffa0e1059a>] lov_fini_create_set+0x24a/0x1200 [lov]
      [<ffffffffa0dfa0f2>] lov_create+0x792/0x1400 [lov]
      [<ffffffffa11000d6>] ? mdd_get_md+0x96/0x2f0 [mdd]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa1120916>] ? mdd_read_unlock+0x26/0x30 [mdd]
      [<ffffffffa110490e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
      [<ffffffffa1116871>] mdd_create+0xf81/0x1a90 [mdd]
      [<ffffffffa121edf3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
      [<ffffffffa121956c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
      [<ffffffffa124f3f7>] cml_create+0x97/0x250 [cmm]
      [<ffffffffa118b5e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
      [<ffffffffa11a106e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
      [<ffffffffa077a724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      [<ffffffffa111956e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa1189c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa1180ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      [<ffffffffa118153d>] mdt_intent_reint+0x1ed/0x530 [mdt]
      [<ffffffffa117fc09>] mdt_intent_policy+0x379/0x690 [mdt]
      [<ffffffffa0736351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa075c1ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [<ffffffffa1180586>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa1175772>] mdt_handle_common+0x932/0x1750 [mdt]
      [<ffffffffa1176665>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa078ab4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
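
      A trace like the one above is easier to read once the uncertain frames are filtered out. In Linux kernel stack dumps, a "?" after the address marks a frame the unwinder found on the stack but could not confirm as part of the call chain. The following is a minimal sketch, not part of the original report, that keeps only the confirmed frames; the helper names parse_call_trace and reliable_chain are hypothetical. Applied to the full trace, it shows the mdt_01 thread sleeping in osc_create, reached from mdt_reint_open through mdd_create and lov_create. That narrows down where the thread is waiting, but by itself does not establish why.

```python
import re

# Matches one kernel stack frame, for example:
#   [<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
# The optional "? " marks a frame the unwinder could not confirm.
# The optional trailing "[name]" is the kernel module; core kernel
# symbols have no module suffix.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return (function, module, reliable) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((
                m.group("func"),
                m.group("module") or "kernel",
                m.group("unreliable") is None,
            ))
    return frames


def reliable_chain(frames):
    """Keep only frames without the '?' marker, formatted for reading."""
    return [f"{func} [{module}]" for func, module, reliable in frames if reliable]


# First four frames copied from the trace in this report.
sample = """
[<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
[<ffffffff8107bf80>] ? process_timeout+0x0/0x10
[<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
"""

if __name__ == "__main__":
    for frame in reliable_chain(parse_call_trace(sample)):
        print(frame)
```

      To process a whole console log, pass the saved text to parse_call_trace instead of the short sample.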

            People

              Assignee: Bruno Faccini (Inactive)
              Reporter: Mahmoud Hanafi
              Votes: 0
              Watchers: 9
