  Lustre / LU-4271

mds load goes very high and filesystem hangs after mounting mdt

Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • None
    • Affects Version/s: Lustre 2.1.5
    • None
    • 4
    • 11731

    Description

      After recovery of a crashed MDS the system load goes to >800.
      The filesystem is DOWN. We need help to bring the filesystem online!

      Here is the error:
      Lustre: Skipped 2 previous similar messages
      Lustre: Service thread pid 7014 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
      Pid: 7014, comm: mdt_01

      Call Trace:
      [<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
      [<ffffffff8107bf80>] ? process_timeout+0x0/0x10
      [<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
      [<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa0e13337>] lov_check_and_create_object+0x187/0x570 [lov]
      [<ffffffffa0e13a1b>] qos_remedy_create+0x1db/0x220 [lov]
      [<ffffffffa0e1059a>] lov_fini_create_set+0x24a/0x1200 [lov]
      [<ffffffffa0dfa0f2>] lov_create+0x792/0x1400 [lov]
      [<ffffffffa11000d6>] ? mdd_get_md+0x96/0x2f0 [mdd]
      [<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
      [<ffffffffa1120916>] ? mdd_read_unlock+0x26/0x30 [mdd]
      [<ffffffffa110490e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
      [<ffffffffa1116871>] mdd_create+0xf81/0x1a90 [mdd]
      [<ffffffffa121edf3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
      [<ffffffffa121956c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
      [<ffffffffa124f3f7>] cml_create+0x97/0x250 [cmm]
      [<ffffffffa118b5e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
      [<ffffffffa11a106e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
      [<ffffffffa077a724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
      [<ffffffffa111956e>] ? md_ucred+0x1e/0x60 [mdd]
      [<ffffffffa1189c81>] mdt_reint_rec+0x41/0xe0 [mdt]
      [<ffffffffa1180ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
      [<ffffffffa118153d>] mdt_intent_reint+0x1ed/0x530 [mdt]
      [<ffffffffa117fc09>] mdt_intent_policy+0x379/0x690 [mdt]
      [<ffffffffa0736351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
      [<ffffffffa075c1ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
      [<ffffffffa1180586>] mdt_enqueue+0x46/0x130 [mdt]
      [<ffffffffa1175772>] mdt_handle_common+0x932/0x1750 [mdt]
      [<ffffffffa1176665>] mdt_regular_handle+0x15/0x20 [mdt]
      [<ffffffffa078ab4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0ca>] child_rip+0xa/0x20
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
      [<ffffffff8100c0c0>] ? child_rip+0x0/0x20


          Activity


            Sure Bob, this last thread should be the one blocking all the others in the JBD2 layer! And again, as in one of my original updates on 19/Nov/2013, the device concerned is "dm-3-8", which Mahmoud confirmed to be the MDT.

            And yes, it looks like a dup of LU-4794, but also of earlier tickets that were simply closed due to no new occurrence ...

            What would be useful now would be to determine whether this last thread has been scheduled recently; if not, why not, and if so, why it keeps looping and re-calling schedule() (t_updates != NULL?) ...

            bfaccini Bruno Faccini (Inactive) added a comment
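
            For reference, the wait Bruno is pointing at is the t_updates loop in jbd2_journal_commit_transaction() (fs/jbd2/commit.c). Below is a minimal paraphrased sketch, not the exact RHEL6 source (the locking details and the counter type vary by kernel version): the commit thread ("jbd2/dm-3-8" above) cannot lock the transaction down for commit until every handle opened against it has called jbd2_journal_stop() and t_updates has dropped to zero.

                /* Paraphrased sketch of the commit thread's wait: the commit
                 * cannot proceed while any started handle is still open
                 * (t_updates != 0). */
                while (atomic_read(&commit_transaction->t_updates)) {
                        DEFINE_WAIT(wait);

                        prepare_to_wait(&journal->j_wait_updates, &wait,
                                        TASK_UNINTERRUPTIBLE);
                        if (atomic_read(&commit_transaction->t_updates))
                                schedule();     /* woken by jbd2_journal_stop() */
                        finish_wait(&journal->j_wait_updates, &wait);
                }

            If kjournald2 is indeed looping here, the interesting question becomes which handle(s) keep t_updates non-zero, since that holder would be the real blocker of everything waiting in jbd2_journal_start().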

            another odd outlier thread:

            PID: 6425   TASK: ffff883fdc6c9500  CPU: 5   COMMAND: "jbd2/dm-3-8"
             #0 [ffff883fec013c60] schedule at ffffffff8151c712
             #1 [ffff883fec013d28] jbd2_journal_commit_transaction at ffffffffa055d8df [jbd2]
             #2 [ffff883fec013e68] kjournald2 at ffffffffa05646c8 [jbd2]
             #3 [ffff883fec013ee8] kthread at ffffffff8108fb96
             #4 [ffff883fec013f48] kernel_thread at ffffffff8100c0ca
            

            I'm wondering if this might be a dup of LU-4794

            bogl Bob Glossman (Inactive) added a comment

            I note that in addition to the 500+ threads stuck in jbd2_journal_start, there's one thread stuck elsewhere in jbd2 code. The stack trace of this thread looks like:

            PID: 7885   TASK: ffff881e506e2ae0  CPU: 6   COMMAND: "mdt_439"
             #0 [ffff881e506f38c0] schedule at ffffffff8151c712
             #1 [ffff881e506f3988] jbd2_log_wait_commit at ffffffffa0564325 [jbd2]
             #2 [ffff881e506f3a18] jbd2_journal_stop at ffffffffa055bacb [jbd2]
             #3 [ffff881e506f3a78] __ldiskfs_journal_stop at ffffffffa08e62a8 [ldiskfs]
             #4 [ffff881e506f3aa8] osd_trans_stop at ffffffffa0d26476 [osd_ldiskfs]
             #5 [ffff881e506f3ad8] mdd_trans_stop at ffffffffa0c2d4aa [mdd]
             #6 [ffff881e506f3ae8] mdd_attr_set at ffffffffa0c0ca5f [mdd]
             #7 [ffff881e506f3bc8] cml_attr_set at ffffffffa0d4fa76 [cmm]
             #8 [ffff881e506f3bf8] mdt_attr_set at ffffffffa0c9ec68 [mdt]
             #9 [ffff881e506f3c48] mdt_reint_setattr at ffffffffa0c9f2b5 [mdt]
            #10 [ffff881e506f3cd8] mdt_reint_rec at ffffffffa0c98c81 [mdt]
            #11 [ffff881e506f3cf8] mdt_reint_internal at ffffffffa0c8fed4 [mdt]
            #12 [ffff881e506f3d48] mdt_reint at ffffffffa0c902b4 [mdt]
            #13 [ffff881e506f3d68] mdt_handle_common at ffffffffa0c84772 [mdt]
            #14 [ffff881e506f3db8] mdt_regular_handle at ffffffffa0c85665 [mdt]
            #15 [ffff881e506f3dc8] ptlrpc_main at ffffffffa079fb4e [ptlrpc]
            #16 [ffff881e506f3f48] kernel_thread at ffffffff8100c0ca
            

            I can't see how this thread might be blocking all the others, but it is interesting that it's the only one I have found so far that is in similar code but on a different path from all the others.

            bogl Bob Glossman (Inactive) added a comment
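
            For context on this particular thread: a handle only reaches jbd2_log_wait_commit() from jbd2_journal_stop() when it was a synchronous handle (h_sync); by then it has already dropped its t_updates reference and is merely waiting for the commit of its transaction to complete. A minimal paraphrased sketch of that tail of jbd2_journal_stop() (fs/jbd2/transaction.c and fs/jbd2/journal.c; details vary by kernel version):

                /* Paraphrased sketch; at this point t_updates has already been
                 * decremented, so this thread no longer blocks the commit, it
                 * only waits for it. */
                if (handle->h_sync) {
                        tid_t tid = transaction->t_tid;

                        jbd2_log_start_commit(journal, tid);      /* kick kjournald2 */
                        err = jbd2_log_wait_commit(journal, tid); /* sleep until that
                                                                   * commit completes */
                }

            So mdt_439 looks like another victim rather than the blocker: it is waiting for the same commit that the jbd2/dm-3-8 thread never manages to finish.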

            Have been looking over the crash dump from the customer. As noted in previous comments there is quite a large number of threads all stuck in jbd2_journal_start with the common calling sequence ... osd_trans_start -> ldiskfs_journal_start_sb -> jbd2_journal_start -> start_this_handle. It seems very likely there's a single other thread holding a lock or otherwise blocking all these callers of jbd2_journal_start, but so far I haven't been able to find the culprit.

            bogl Bob Glossman (Inactive) added a comment
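
            As background on why the waiters all pile up at the same spot: once the running transaction has been locked down for commit, start_this_handle() makes every new jbd2_journal_start() caller sleep until the commit thread moves on, so a single stalled commit is enough to park hundreds of MDT service threads. A minimal paraphrased sketch from start_this_handle() in fs/jbd2/transaction.c (exact code differs between kernel versions):

                /* Paraphrased sketch: new handles cannot attach to a transaction
                 * that is being committed (T_LOCKED) and sleep until kjournald2
                 * lets a new running transaction start. */
        repeat:
                if (transaction->t_state == T_LOCKED) {
                        DEFINE_WAIT(wait);

                        prepare_to_wait(&journal->j_wait_transaction_locked,
                                        &wait, TASK_UNINTERRUPTIBLE);
                        schedule();             /* woken when the commit proceeds */
                        finish_wait(&journal->j_wait_transaction_locked, &wait);
                        goto repeat;
                }

            That fits the single-blocker suspicion: the 500+ threads in osd_trans_start -> ... -> start_this_handle are victims, and the culprit is whatever keeps the current commit from completing.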

            Mahmoud, you included a large number of kernel RPMs in the tarball; can you please specify exactly which version corresponds to the kernel running on the node where the vmcore was taken?

            cliffw Cliff White (Inactive) added a comment

            Cliff,

            Email sent

            -Mahmoud

            mhanafi Mahmoud Hanafi added a comment

            Mahmoud, apparently I missed your email, very sorry, can you re-send the information on the dump?

            cliffw Cliff White (Inactive) added a comment

            Cliff, I wonder if you found the same situation in the crash-dump as the one I described from the log of all thread stacks ??

            bfaccini Bruno Faccini (Inactive) added a comment

            Crash dump was uploaded and an email was sent to Cliff White.

            mhanafi Mahmoud Hanafi added a comment

            We did take a crash dump. I will encrypt and upload it. It will have to be accessed by US citizens only.

            mhanafi Mahmoud Hanafi added a comment

            Did you also successfully take a crash-dump of your MDS, as I requested in my update/comment dated "25/Nov/13 4:47 PM" for this ticket, and as the end of the dmesg you uploaded for this latest occurrence seems to indicate?? If yes, can you upload it along with the associated vmlinux (or kernel-debuginfo* RPMs) and Lustre modules?

            BTW, at the time you dumped all thread stacks with the Alt+SysRq+T key-stroke, there were 502 threads stuck in jbd2_journal_start()/start_this_handle() and 129 threads stuck in dqacq_handler()/... because one of them is itself also stuck in the journal/jbd2 layer via lustre_commit_dquot(), so the load should have been greater than 630 at that time ... But from the full stack listing alone I am unable to find who is responsible for blocking all of these threads.

            bfaccini Bruno Faccini (Inactive) added a comment

            People

              Assignee: bfaccini Bruno Faccini (Inactive)
              Reporter: mhanafi Mahmoud Hanafi
              Votes: 0
              Watchers: 9
