Details
-
Bug
-
Resolution: Duplicate
-
Major
-
None
-
Lustre 2.4.1
-
None
-
lustre: 2.1.5
kernel: 2.6.32-279.19.1.el6.20130516.x86_64.lustre215
build: 2nasS_ofed154
SRC at https://github.com/jlan/lustre-nas
-
3
-
13955
Description
The MDS builds up a high load with no CPU activity, and Lustre is dumping call traces to the console. This looks like a duplicate of LU-4794; if so, please advise when the patch will land.
Attached is the full stack trace for all threads.
INFO: task ldlm_cn_00:6299 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ldlm_cn_00 D 000000000000001a 0 6299 2 0x00000080
ffff881ec525db30 0000000000000046 0000000000000000 ffffffff8129507e
ffff881ec525dad0 00000000dcd2dc2e ffff881fb0bd8d00 ffff881ec525dad0
ffff881fafe73098 ffff881ec525dfd8 000000000000fc40 ffff881fafe73098
Call Trace:
[<ffffffff8129507e>] ? number+0x2ee/0x320
[<ffffffffa055c14a>] start_this_handle+0x27a/0x4a0 [jbd2]
[<ffffffff8108ff00>] ? autoremove_wake_function+0x0/0x40
[<ffffffffa055c570>] jbd2_journal_start+0xd0/0x110 [jbd2]
[<ffffffffa08e6338>] ldiskfs_journal_start_sb+0x58/0x90 [ldiskfs]
[<ffffffffa072c017>] fsfilt_ldiskfs_start+0x77/0x5e0 [fsfilt_ldiskfs]
[<ffffffffa07a9ac0>] llog_origin_handle_cancel+0x4b0/0xd70 [ptlrpc]
[<ffffffffa076f71f>] ldlm_cancel_handler+0x1bf/0x5e0 [ptlrpc]
[<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task ldlm_cb_00:6302 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ldlm_cb_00 D 0000000000000002 0 6302 2 0x00000080
ffff881ec5265b20 0000000000000046 0000000000000000 000000ab00000000
ffff881ec5265b50 ffffffff8129507e 3634333236363330 3134363536363336
ffff881ec5263af8 ffff881ec5265fd8 000000000000fc40 ffff881ec5263af8
Call Trace:
[<ffffffff8129507e>] ? number+0x2ee/0x320
[<ffffffff8151ecc5>] rwsem_down_failed_common+0x95/0x1d0
[<ffffffff8151ee23>] rwsem_down_write_failed+0x23/0x30
[<ffffffff812992f3>] call_rwsem_down_write_failed+0x13/0x20
[<ffffffff8151e322>] ? down_write+0x32/0x40
[<ffffffffa09d543e>] dqacq_handler+0x35e/0xd20 [lquota]
[<ffffffffa07b8486>] ? __req_capsule_get+0x176/0x750 [ptlrpc]
[<ffffffffa07921e0>] ? lustre_swab_qdata+0x0/0x30 [ptlrpc]
[<ffffffffa075e1d8>] target_handle_dqacq_callback+0x668/0xb90 [ptlrpc]
[<ffffffffa09d50e0>] ? dqacq_handler+0x0/0xd20 [lquota]
[<ffffffffa076df87>] ldlm_callback_handler+0xa17/0x1ff0 [ptlrpc]
[<ffffffffa0503ea1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffffa04ff4a4>] ? libcfs_id2str+0x74/0xb0 [libcfs]
[<ffffffffa079fb4e>] ptlrpc_main+0xc4e/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffffa079ef00>] ? ptlrpc_main+0x0/0x1a40 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
INFO: task ldlm_cb_01:6303 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ldlm_cb_01 D 000000000000000d 0 6303 2 0x00000080
ffff881ec5267b20 0000000000000046 0000000000000000 000000ab00000000
ffff881ec5267b50 ffffffff8129507e ffff881ec5267ad0 000000005c2ae174
ffff881ec5263098 ffff881ec5267fd8 000000000000fc40 ffff881ec5263098
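For comparing this trace against LU-4794, it can help to list where each hung thread is actually waiting. The following is a minimal sketch, not part of the original report: it assumes the console output has been saved as text, and the function name `blocked_tasks` and both regular expressions are illustrative. It reports, for each "blocked for more than N seconds" message, the first stack frame that the kernel did not mark with "?" (unreliable). On the excerpt above it should give start_this_handle for ldlm_cn_00 and rwsem_down_failed_common for ldlm_cb_00; the ldlm_cb_01 entry is cut off before its call trace, so it yields no frame.

```python
import re

# Header printed by the kernel hung-task detector for each blocked task.
TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# A reliable stack frame, e.g. "[<ffffffffa055c14a>] start_this_handle+0x27a/0x4a0".
# Frames marked unreliable have "? " before the symbol and do not match.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] ([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def blocked_tasks(log_text):
    """Return (task_name, pid, first_reliable_frame) for each hung-task message.

    first_reliable_frame is None when the trace for that task is missing
    or truncated. Works on both line-broken and flattened console output.
    """
    headers = list(TASK_RE.finditer(log_text))
    results = []
    for i, header in enumerate(headers):
        # The trace for this task runs until the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        frame = FRAME_RE.search(log_text, header.end(), end)
        results.append((header.group(1), int(header.group(2)),
                        frame.group(1) if frame else None))
    return results
```

Calling `blocked_tasks(open("console.log").read())` (the file name is hypothetical) returns one tuple per hung task, which can then be grouped by frame to see how many threads are parked in the journal versus on the quota semaphore.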
Attachments
Issue Links
- duplicates
-
LU-4794 MDS threads all stuck in jbd2_journal_start
- Resolved