Details
-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Lustre 2.1.5
-
None
-
server running 2.1.5-2nas
our source is at git://github.com/jlan/lustre-nas.git
Description
mdt response very slow. Top showed ptlrpcd running at 100% cpu. Console showed errors. Was able to run debug trace. See attached files.
Lustre: Service thread pid 7065 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
Pid: 7065, comm: mdt_93
Call Trace:
[<ffffffff810539b2>] ? enqueue_task_fair+0x142/0x490
[<ffffffff81096f5d>] ? sched_clock_cpu+0xcd/0x110
[<ffffffffa04f960e>] cfs_waitq_wait+0xe/0x10 [libcfs]
[<ffffffffa0a9f6de>] qos_statfs_update+0x7fe/0xa70 [lov]
[<ffffffff8105fab0>] ? default_wake_function+0x0/0x20
[<ffffffffa0aa00fd>] alloc_qos+0x1ad/0x21a0 [lov]
[<ffffffffa0aa306c>] qos_prep_create+0x1ec/0x2380 [lov]
[<ffffffffa04f98be>] ? cfs_free+0xe/0x10 [libcfs]
[<ffffffffa0a9c63a>] lov_prep_create_set+0xea/0x390 [lov]
[<ffffffffa0a84b0c>] lov_create+0x1ac/0x1400 [lov]
[<ffffffffa0d8a50f>] ? obd_iocontrol+0xef/0x390 [mdd]
[<ffffffffa0d8f90e>] mdd_lov_create+0x9ee/0x1ba0 [mdd]
[<ffffffffa0da1871>] mdd_create+0xf81/0x1a90 [mdd]
[<ffffffffa0ea9df3>] ? osd_oi_lookup+0x83/0x110 [osd_ldiskfs]
[<ffffffffa0ea456c>] ? osd_object_init+0xdc/0x3e0 [osd_ldiskfs]
[<ffffffffa0eda3f7>] cml_create+0x97/0x250 [cmm]
[<ffffffffa0e165e1>] ? mdt_version_get_save+0x91/0xd0 [mdt]
[<ffffffffa0e2c06e>] mdt_reint_open+0x1aae/0x28a0 [mdt]
[<ffffffffa078f724>] ? lustre_msg_add_version+0x74/0xd0 [ptlrpc]
[<ffffffffa0da456e>] ? md_ucred+0x1e/0x60 [mdd]
[<ffffffffa0e14c81>] mdt_reint_rec+0x41/0xe0 [mdt]
[<ffffffffa0e0bed4>] mdt_reint_internal+0x544/0x8e0 [mdt]
[<ffffffffa0e0c53d>] mdt_intent_reint+0x1ed/0x530 [mdt]
[<ffffffffa0e0ac09>] mdt_intent_policy+0x379/0x690 [mdt]
[<ffffffffa074b351>] ldlm_lock_enqueue+0x361/0x8f0 [ptlrpc]
[<ffffffffa07711ad>] ldlm_handle_enqueue0+0x48d/0xf50 [ptlrpc]
[<ffffffffa0e0b586>] mdt_enqueue+0x46/0x130 [mdt]
[<ffffffffa0e00772>] mdt_handle_common+0x932/0x1750 [mdt]
[<ffffffffa0e01665>] mdt_regular_handle+0x15/0x20 [mdt]
[<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Hello again Mahmoud,
Please advise if you want us to keep this ticket open?
If we don't hear back we'll mark it as resolved, with no fix, and we can re-open it if requested to do that.
Thanks,
~ jfc.