  Lustre / LU-4271

mds load goes very high and filesystem hangs after mounting mdt

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.1.5
    • None
    • 4
    • 11731

    Description

      After recovery of a crashed MDS the system load goes to >800.
      The filesystem is DOWN. We need help to bring the filesystem online!

      Here is the error:
      Lustre: Skipped 2 previous similar messages
      Lustre: Service thread pid 7014 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7014, comm: mdt_01

      Call Trace:
      [<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
      [<ffffffff8107bf80>] ? process_timeout+0x0/0x10
      [<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0e13337>] lov_check_and_create_object+0x187/0x570 [lov]
      [<ffffffffa0e13a1b>] qos_remedy_create+0x1db/0x220 [lov]
      [<ffffffffa0e1059a>] lov_fini_create_set+0x24a/0x1200 [lov]
      [<ffffffffa0dfa0f2>] lov_create+0x792/0x1400 [lov]
      [<ffffffffa11000d6>] ? mdd_get_md+0x96/0x2f0 [mdd]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa1120916>] ? mdd_read_unlock+0x26/0x30 [mdd]
      [<ffffffffa110490e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
      [<ffffffffa1116871>] mdd_create+0xf81/0x1a90 [mdd]
      [<ffffffffa121edf3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
      [<ffffffffa121956c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
      [<ffffffffa124f3f7>] cml_create+0x97/0x250 [cmm]
      [<ffffffffa118b5e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
      [<ffffffffa11a106e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
      [<ffffffffa077a724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      [<ffffffffa111956e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa1189c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa1180ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      [<ffffffffa118153d>] mdt_intent_reint+0x1ed/0x530 [mdt]
      [<ffffffffa117fc09>] mdt_intent_policy+0x379/0x690 [mdt]
      [<ffffffffa0736351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa075c1ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [<ffffffffa1180586>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa1175772>] mdt_handle_common+0x932/0x1750 [mdt]
      [<ffffffffa1176665>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa078ab4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20


          Activity


            Sure Bob, this last thread should be the one blocking all the others in the JBD2 layer! And again, as in one of my original updates on 19/Nov/2013, the device concerned is "dm-3-8", which Mahmoud confirmed to be the MDT.

            And yes, it looks like a dup of LU-4794, but also of earlier tickets that were simply closed due to no new occurrence ...

            What would be useful now would be to determine whether this last thread has been scheduled recently; if not, why not, and if so, why it keeps looping and re-calling schedule() (t_updates != NULL?) ...

            bfaccini Bruno Faccini (Inactive) added a comment
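
            For reference, the wait Bruno is pointing at is the t_updates loop in jbd2_journal_commit_transaction() (fs/jbd2/commit.c). Below is a minimal paraphrased sketch, not the exact RHEL6 source (the locking details and the counter type vary by kernel version): the commit thread ("jbd2/dm-3-8" above) cannot lock the transaction down for commit until every handle opened against it has called jbd2_journal_stop() and t_updates has dropped to zero.

                /* Paraphrased sketch of the commit thread's wait: the commit
                 * cannot proceed while any started handle is still open
                 * (t_updates != 0). */
                while (atomic_read(&commit_transaction->t_updates)) {
                        DEFINE_WAIT(wait);

                        prepare_to_wait(&journal->j_wait_updates, &wait,
                                        TASK_UNINTERRUPTIBLE);
                        if (atomic_read(&commit_transaction->t_updates))
                                schedule();     /* woken by jbd2_journal_stop() */
                        finish_wait(&journal->j_wait_updates, &wait);
                }

            If kjournald2 is indeed looping here, the interesting question becomes which handle(s) keep t_updates non-zero, since that holder would be the real blocker of everything waiting in jbd2_journal_start().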

            another odd outlier thread:

            PID: 6425   TASK: ffff883fdc6c9500  CPU: 5   COMMAND: "jbd2/dm-3-8"
             #0 [ffff883fec013c60] schedule at ffffffff8151c712
             #1 [ffff883fec013d28] jbd2_journal_commit_transaction at ffffffffa055d8df [jbd2]
             #2 [ffff883fec013e68] kjournald2 at ffffffffa05646c8 [jbd2]
             #3 [ffff883fec013ee8] kthread at ffffffff8108fb96
             #4 [ffff883fec013f48] kernel_thread at ffffffff8100c0ca
            

            I'm wondering if this might be a dup of LU-4794

            bogl Bob Glossman (Inactive) added a comment

            I note that in addition to the 500+ threads stuck in jbd2_journal_start, there's one thread stuck elsewhere in jbd2 code. The stack trace of this thread looks like:

            PID: 7885   TASK: ffff881e506e2ae0  CPU: 6   COMMAND: "mdt_439"
             #0 [ffff881e506f38c0] schedule at ffffffff8151c712
             #1 [ffff881e506f3988] jbd2_log_wait_commit at ffffffffa0564325 [jbd2]
             #2 [ffff881e506f3a18] jbd2_journal_stop at ffffffffa055bacb [jbd2]
             #3 [ffff881e506f3a78] __ldiskfs_journal_stop at ffffffffa08e62a8 [ldiskfs]
             #4 [ffff881e506f3aa8] osd_trans_stop at ffffffffa0d26476 [osd_ldiskfs]
             #5 [ffff881e506f3ad8] mdd_trans_stop at ffffffffa0c2d4aa [mdd]
             #6 [ffff881e506f3ae8] mdd_attr_set at ffffffffa0c0ca5f [mdd]
             #7 [ffff881e506f3bc8] cml_attr_set at ffffffffa0d4fa76 [cmm]
             #8 [ffff881e506f3bf8] mdt_attr_set at ffffffffa0c9ec68 [mdt]
             #9 [ffff881e506f3c48] mdt_reint_setattr at ffffffffa0c9f2b5 [mdt]
            #10 [ffff881e506f3cd8] mdt_reint_rec at ffffffffa0c98c81 [mdt]
            #11 [ffff881e506f3cf8] mdt_reint_internal at ffffffffa0c8fed4 [mdt]
            #12 [ffff881e506f3d48] mdt_reint at ffffffffa0c902b4 [mdt]
            #13 [ffff881e506f3d68] mdt_handle_common at ffffffffa0c84772 [mdt]
            #14 [ffff881e506f3db8] mdt_regular_handle at ffffffffa0c85665 [mdt]
            #15 [ffff881e506f3dc8] ptlrpc_main at ffffffffa079fb4e [ptlrpc]
            #16 [ffff881e506f3f48] kernel_thread at ffffffff8100c0ca
            

            I can't see how this thread might be blocking all the others, but it is interesting that it's the only one I have found so far that is in similar code but on a different path from all the others.

            bogl Bob Glossman (Inactive) added a comment
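
            For context on this particular thread: a handle only reaches jbd2_log_wait_commit() from jbd2_journal_stop() when it was a synchronous handle (h_sync); by then it has already dropped its t_updates reference and is merely waiting for the commit of its transaction to complete. A minimal paraphrased sketch of that tail of jbd2_journal_stop() (fs/jbd2/transaction.c and fs/jbd2/journal.c; details vary by kernel version):

                /* Paraphrased sketch; at this point t_updates has already been
                 * decremented, so this thread no longer blocks the commit, it
                 * only waits for it. */
                if (handle->h_sync) {
                        tid_t tid = transaction->t_tid;

                        jbd2_log_start_commit(journal, tid);      /* kick kjournald2 */
                        err = jbd2_log_wait_commit(journal, tid); /* sleep until that
                                                                   * commit completes */
                }

            So mdt_439 looks like another victim rather than the blocker: it is waiting for the same commit that the jbd2/dm-3-8 thread never manages to finish.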

            Have been looking over the crash dump from the customer. As noted in previous comments there is quite a large number of threads all stuck in jbd2_journal_start with the common calling sequence ... osd_trans_start -> ldiskfs_journal_start_sb -> jbd2_journal_start -> start_this_handle. It seems very likely there's a single other thread holding a lock or otherwise blocking all these callers of jbd2_journal_start, but so far I haven't been able to find the culprit.

            bogl Bob Glossman (Inactive) added a comment
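
            As background on why the waiters all pile up at the same spot: once the running transaction has been locked down for commit, start_this_handle() makes every new jbd2_journal_start() caller sleep until the commit thread moves on, so a single stalled commit is enough to park hundreds of MDT service threads. A minimal paraphrased sketch from start_this_handle() in fs/jbd2/transaction.c (exact code differs between kernel versions):

                /* Paraphrased sketch: new handles cannot attach to a transaction
                 * that is being committed (T_LOCKED) and sleep until kjournald2
                 * lets a new running transaction start. */
        repeat:
                if (transaction->t_state == T_LOCKED) {
                        DEFINE_WAIT(wait);

                        prepare_to_wait(&journal->j_wait_transaction_locked,
                                        &wait, TASK_UNINTERRUPTIBLE);
                        schedule();             /* woken when the commit proceeds */
                        finish_wait(&journal->j_wait_transaction_locked, &wait);
                        goto repeat;
                }

            That fits the single-blocker suspicion: the 500+ threads in osd_trans_start -> ... -> start_this_handle are victims, and the culprit is whatever keeps the current commit from completing.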

            Mahmoud, you included a large number of kernel RPMs in the tarball; can you please specify exactly which version corresponds to the kernel running on the node where the vmcore was taken?

            cliffw Cliff White (Inactive) added a comment

            Cliff,

            Email sent

            -Mahmoud

            mhanafi Mahmoud Hanafi added a comment

            Mahmoud, apparently I missed your email, very sorry, can you re-send the information on the dump?

            cliffw Cliff White (Inactive) added a comment

            Cliff, I wonder if you found the same situation in the crash-dump as the one I described from the log of all thread stacks ??

            bfaccini Bruno Faccini (Inactive) added a comment

            Crash dump was uploaded and an email was sent to Cliff White.

            mhanafi Mahmoud Hanafi added a comment

            We did take a crash dump. I will encrypt and upload it. It will have to be accessed by US citizens only.

            mhanafi Mahmoud Hanafi added a comment

            Did you also successfully take a crash-dump of your MDS, as I requested in my update/comment dated "25/Nov/13 4:47 PM" for this ticket, and as the end of the dmesg you uploaded for this latest occurrence seems to indicate?? If yes, can you upload it along with the associated vmlinux (or kernel-debuginfo* RPMs) and Lustre modules?

            BTW, at the time you dumped all thread stacks with the Alt+SysRq+T key-stroke, there were 502 threads stuck in jbd2_journal_start()/start_this_handle() and 129 threads stuck in dqacq_handler()/... because one of them is itself also stuck in the journal/jbd2 layer via lustre_commit_dquot(), so the load should have been greater than 630 at that time ... But from the full stack listing alone I am unable to find who is responsible for blocking all of these threads.

            bfaccini Bruno Faccini (Inactive) added a comment

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 9
