[LU-4271] mds load goes very high and filesystem hangs after mounting mdt - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Won't Fix
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.1.5
Labels:
None

Severity:
4
Rank (Obsolete):
11731

Description

after recover of a crashed mds the system load goes to >800
Filesystem is DOWN. We need help to bring the filesystem online!

here is the error
Lustre: Skipped 2 previous similar messages
Lustre: Service thread pid 7014 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7014, comm: mdt_01

Call Trace:
[<ffffffff8151d552>] schedule_timeout+0x192/0x2e0
[<ffffffff8107bf80>] ? process_timeout+0x0/0x10
[<ffffffffa04e45e1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa0da2508>] osc_create+0x528/0xdc0 [osc]
[<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
[<ffffffffa0e13337>] lov_check_and_create_object+0x187/0x570 [lov]
[<ffffffffa0e13a1b>] qos_remedy_create+0x1db/0x220 [lov]
[<ffffffffa0e1059a>] lov_fini_create_set+0x24a/0x1200 [lov]
[<ffffffffa0dfa0f2>] lov_create+0x792/0x1400 [lov]
[<ffffffffa11000d6>] ? mdd_get_md+0x96/0x2f0 [mdd]
[<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
[<ffffffffa1120916>] ? mdd_read_unlock+0x26/0x30 [mdd]
[<ffffffffa110490e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
[<ffffffffa1116871>] mdd_create+0xf81/0x1a90 [mdd]
[<ffffffffa121edf3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
[<ffffffffa121956c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
[<ffffffffa124f3f7>] cml_create+0x97/0x250 [cmm]
[<ffffffffa118b5e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
[<ffffffffa11a106e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
[<ffffffffa077a724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
[<ffffffffa111956e>] ? md_ucred+0x1e/0x60 [mdd]
[<ffffffffa1189c81>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa1180ed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
[<ffffffffa118153d>] mdt_intent_reint+0x1ed/0x530 [mdt]
[<ffffffffa117fc09>] mdt_intent_policy+0x379/0x690 [mdt]
[<ffffffffa0736351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
[<ffffffffa075c1ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
[<ffffffffa1180586>] mdt_enqueue+0x46/0x130 [mdt]
[<ffffffffa1175772>] mdt_handle_common+0x932/0x1750 [mdt]
[<ffffffffa1176665>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa078ab4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffffa0789f00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

nbp8-mds1-dmesg.mar25.gz
223 kB
26/Mar/14 1:50 AM
service200-20131119
1.52 MB
19/Nov/13 9:07 AM

Issue Links

is related to

LU-4335 MDS hangs due to mdt thread hung/inactive

Resolved

Activity

People

Assignee:: Bruno Faccini (Inactive)

Reporter:: Mahmoud Hanafi

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 19/Nov/13 8:41 AM

Updated:: 14/Aug/14 9:18 PM

Resolved:: 14/Aug/14 9:18 PM