Details
Type: Bug
Resolution: Fixed
Priority: Minor
Lustre 2.7.0
None
Environment: OpenSFS cluster with two MDSs with one MDT each, three OSSs with two OSTs each, and three clients running lustre-master build #2771
3
16842
Description
I started racer on client c13, and soon afterwards client c11 crashed. After that, racer stalled for a long time, then resumed running, and now appears to be hung again.
On the client c11 console, I saw:
Message from syslogd@c11 at Dec 17 09:45:47 ...
kernel:LustreError: 26538:0:(osc_object.c:212:osc_object_ast_clear()) ASSERTION( lock->l_granted_mode == lock->l_req_mode ) failed:
Message from syslogd@c11 at Dec 17 09:45:47 ...
kernel:LustreError: 26538:0:(osc_object.c:212:osc_object_ast_clear()) LBUG
This could be related to one of the other racer tickets, but I couldn’t find any other racer ticket that mentions an LBUG in osc_object_ast_clear().
The crash dmesg from client c11 contains many “fid is insane” errors and the following call trace:
…
<3>LustreError: 24520:0:(file.c:3036:ll_migrate()) scratch: migrate 3 , but fid [0x0:0x0:0x0] is insane
<3>LustreError: 24520:0:(file.c:3036:ll_migrate()) Skipped 33 previous similar messages
<3>LustreError: 29819:0:(file.c:3036:ll_migrate()) scratch: migrate 7 , but fid [0x0:0x0:0x0] is insane
<3>LustreError: 29819:0:(file.c:3036:ll_migrate()) Skipped 7 previous similar messages
<0>LustreError: 26538:0:(osc_object.c:212:osc_object_ast_clear()) ASSERTION( lock->l_granted_mode == lock->l_req_mode ) failed:
<0>LustreError: 26538:0:(osc_object.c:212:osc_object_ast_clear()) LBUG
<4>Pid: 26538, comm: lfs
<4>
<4>Call Trace:
<4> [<ffffffffa0fb6895>] libcfs_debug_dumpstack+0x55/0x80 [libcfs]
<4> [<ffffffffa0fb6e97>] lbug_with_loc+0x47/0xb0 [libcfs]
<4> [<ffffffffa16f20ca>] osc_object_ast_clear+0x12a/0x130 [osc]
<4> [<ffffffffa16f1fa0>] ? osc_object_ast_clear+0x0/0x130 [osc]
<4> [<ffffffffa12bec8f>] ldlm_resource_foreach+0x29f/0x300 [ptlrpc]
<4> [<ffffffffa16f1fa0>] ? osc_object_ast_clear+0x0/0x130 [osc]
<4> [<ffffffffa12bed6a>] ldlm_resource_iterate+0x7a/0x1a0 [ptlrpc]
<4> [<ffffffffa16f1e66>] osc_object_prune+0xd6/0x210 [osc]
<4> [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
<4> [<ffffffffa1118ee5>] cl_object_prune+0x55/0x100 [obdclass]
<4> [<ffffffffa155b32c>] lov_delete_raid0+0xcc/0x3e0 [lov]
<4> [<ffffffff8128ceb6>] ? vsnprintf+0x336/0x5e0
<4> [<ffffffffa155a819>] lov_object_delete+0x69/0x180 [lov]
<4> [<ffffffffa1110141>] lu_object_free+0x81/0x1a0 [obdclass]
<4> [<ffffffffa0fcbdb4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
<4> [<ffffffffa0fcc4a2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
<4> [<ffffffffa11108bd>] lu_object_put+0xad/0x330 [obdclass]
<4> [<ffffffffa1615da2>] ? cl_inode_fini+0x52/0x270 [lustre]
<4> [<ffffffffa11195be>] cl_object_put+0xe/0x10 [obdclass]
<4> [<ffffffffa1615dda>] cl_inode_fini+0x8a/0x270 [lustre]
<4> [<ffffffffa151545e>] ? mdc_null_inode+0x7e/0x1c0 [mdc]
<4> [<ffffffffa15d94fd>] ll_clear_inode+0x25d/0x980 [lustre]
<4> [<ffffffffa15d7ee0>] ? ll_delete_inode+0x0/0x210 [lustre]
<4> [<ffffffff811a654c>] clear_inode+0xac/0x140
<4> [<ffffffffa15d7f44>] ll_delete_inode+0x64/0x210 [lustre]
<4> [<ffffffff811a6c4e>] generic_delete_inode+0xde/0x1d0
<4> [<ffffffff811a6da5>] generic_drop_inode+0x65/0x80
<4> [<ffffffff811a5bf2>] iput+0x62/0x70
<4> [<ffffffffa15c27b7>] ll_migrate+0x437/0x950 [lustre]
<4> [<ffffffffa15ba35e>] ll_dir_ioctl+0x5a6e/0x64d0 [lustre]
<4> [<ffffffff8119f78d>] ? filldir+0x7d/0xe0
<4> [<ffffffffa15f9d40>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
<4> [<ffffffffa15b0bc5>] ? ll_release_page+0x35/0xd0 [lustre]
<4> [<ffffffffa15b0e9f>] ? ll_dir_read+0x23f/0x300 [lustre]
<4> [<ffffffff8119f710>] ? filldir+0x0/0xe0
<4> [<ffffffff8119e4e2>] vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119e684>] do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8119f710>] ? filldir+0x0/0xe0
<4> [<ffffffff8119f972>] ? vfs_readdir+0xa2/0xe0
<4> [<ffffffff8119ec01>] sys_ioctl+0x81/0xa0
<4> [<ffffffff8152c07e>] ? do_device_not_available+0xe/0x10
<4> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
<4>
<0>Kernel panic - not syncing: LBUG
<4>Pid: 26538, comm: lfs Not tainted 2.6.32-431.29.2.el6.x86_64 #1
<4>Call Trace:
<4> [<ffffffff8152873c>] ? panic+0xa7/0x16f
<4> [<ffffffffa0fb6eeb>] ? lbug_with_loc+0x9b/0xb0 [libcfs]
<4> [<ffffffffa16f20ca>] ? osc_object_ast_clear+0x12a/0x130 [osc]
<4> [<ffffffffa16f1fa0>] ? osc_object_ast_clear+0x0/0x130 [osc]
<4> [<ffffffffa12bec8f>] ? ldlm_resource_foreach+0x29f/0x300 [ptlrpc]
<4> [<ffffffffa16f1fa0>] ? osc_object_ast_clear+0x0/0x130 [osc]
<4> [<ffffffffa12bed6a>] ? ldlm_resource_iterate+0x7a/0x1a0 [ptlrpc]
<4> [<ffffffffa16f1e66>] ? osc_object_prune+0xd6/0x210 [osc]
<4> [<ffffffff81058bd3>] ? __wake_up+0x53/0x70
<4> [<ffffffffa1118ee5>] ? cl_object_prune+0x55/0x100 [obdclass]
<4> [<ffffffffa155b32c>] ? lov_delete_raid0+0xcc/0x3e0 [lov]
<4> [<ffffffff8128ceb6>] ? vsnprintf+0x336/0x5e0
<4> [<ffffffffa155a819>] ? lov_object_delete+0x69/0x180 [lov]
<4> [<ffffffffa1110141>] ? lu_object_free+0x81/0x1a0 [obdclass]
<4> [<ffffffffa0fcbdb4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
<3>LustreError: 27936:0:(file.c:3036:ll_migrate()) scratch: migrate sleep , but fid [0x0:0x0:0x0] is insane
<3>LustreError: 27936:0:(file.c:3036:ll_migrate()) Skipped 122 previous similar messages
<4> [<ffffffffa0fcc4a2>] ? cfs_hash_bd_from_key+0x42/0xd0 [libcfs]
<4> [<ffffffffa11108bd>] ? lu_object_put+0xad/0x330 [obdclass]
<4> [<ffffffffa1615da2>] ? cl_inode_fini+0x52/0x270 [lustre]
<4> [<ffffffffa11195be>] ? cl_object_put+0xe/0x10 [obdclass]
<4> [<ffffffffa1615dda>] ? cl_inode_fini+0x8a/0x270 [lustre]
<4> [<ffffffffa151545e>] ? mdc_null_inode+0x7e/0x1c0 [mdc]
<4> [<ffffffffa15d94fd>] ? ll_clear_inode+0x25d/0x980 [lustre]
<4> [<ffffffffa15d7ee0>] ? ll_delete_inode+0x0/0x210 [lustre]
<4> [<ffffffff811a654c>] ? clear_inode+0xac/0x140
<4> [<ffffffffa15d7f44>] ? ll_delete_inode+0x64/0x210 [lustre]
<4> [<ffffffff811a6c4e>] ? generic_delete_inode+0xde/0x1d0
<4> [<ffffffff811a6da5>] ? generic_drop_inode+0x65/0x80
<4> [<ffffffff811a5bf2>] ? iput+0x62/0x70
<4> [<ffffffffa15c27b7>] ? ll_migrate+0x437/0x950 [lustre]
<4> [<ffffffffa15ba35e>] ? ll_dir_ioctl+0x5a6e/0x64d0 [lustre]
<4> [<ffffffff8119f78d>] ? filldir+0x7d/0xe0
<4> [<ffffffffa15f9d40>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
<4> [<ffffffffa15b0bc5>] ? ll_release_page+0x35/0xd0 [lustre]
<4> [<ffffffffa15b0e9f>] ? ll_dir_read+0x23f/0x300 [lustre]
<4> [<ffffffff8119f710>] ? filldir+0x0/0xe0
<4> [<ffffffff8119e4e2>] ? vfs_ioctl+0x22/0xa0
<4> [<ffffffff8119e684>] ? do_vfs_ioctl+0x84/0x580
<4> [<ffffffff8119f710>] ? filldir+0x0/0xe0
<4> [<ffffffff8119f972>] ? vfs_readdir+0xa2/0xe0
<4> [<ffffffff8119ec01>] ? sys_ioctl+0x81/0xa0
<4> [<ffffffff8152c07e>] ? do_device_not_available+0xe/0x10
<4> [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
On c13, the client running racer, I see the following in dmesg:
LustreError: 30632:0:(file.c:3036:ll_migrate()) scratch: migrate 11 , but fid [0x0:0x0:0x0] is insane
LustreError: 30632:0:(file.c:3036:ll_migrate()) Skipped 6 previous similar messages
INFO: task dir_create.sh:11791 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dir_create.sh D 0000000000000004 0 11791 11766 0x00000080
ffff8803dfd87818 0000000000000086 ffff8803dfd877f8 ffffffffa08d3a13
0000000000000000 0000000000000000 ffffffffa093db60 ffff8803b26a1c00
ffff8808096f9af8 ffff8803dfd87fd8 000000000000fbc8 ffff8808096f9af8
Call Trace:
[<ffffffffa08d3a13>] ? __req_capsule_get+0x163/0x6d0 [ptlrpc]
[<ffffffff8128abba>] ? strlcpy+0x4a/0x60
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffffa0ba85d5>] ? mdc_open_pack+0x1d5/0x250 [mdc]
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffffa0babe72>] mdc_enqueue+0x222/0x1a30 [mdc]
[<ffffffffa0bad862>] mdc_intent_lock+0x1e2/0x5f9 [mdc]
[<ffffffffa12b6d40>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa0889f40>] ? ldlm_completion_ast+0x0/0x9a0 [ptlrpc]
[<ffffffffa0b58f1a>] ? lmv_fid_alloc+0x25a/0x3d0 [lmv]
[<ffffffffa0b73aab>] lmv_intent_open+0x31b/0x9f0 [lmv]
[<ffffffffa12b6d40>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa0b7445f>] lmv_intent_lock+0x2df/0x11c0 [lmv]
[<ffffffff8116f503>] ? kmem_cache_alloc_trace+0x1a3/0x1b0
[<ffffffffa12b4129>] ? ll_i2suppgid+0x19/0x30 [lustre]
[<ffffffffa12b416e>] ? ll_i2gids+0x2e/0xd0 [lustre]
[<ffffffffa1299a9c>] ? ll_prep_md_op_data+0x22c/0x530 [lustre]
[<ffffffffa12b6d40>] ? ll_md_blocking_ast+0x0/0x7f0 [lustre]
[<ffffffffa12b8929>] ll_lookup_it+0x249/0x9a0 [lustre]
[<ffffffffa12b9109>] ll_lookup_nd+0x89/0x5e0 [lustre]
[<ffffffff81196492>] __lookup_hash+0x102/0x160
[<ffffffff81196bba>] lookup_hash+0x3a/0x50
[<ffffffff8119ba7e>] do_filp_open+0x2de/0xd20
[<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
[<ffffffff81016c71>] ? fpu_finit+0x21/0x40
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task dir_create.sh:11901 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
dir_create.sh D 0000000000000005 0 11901 11766 0x00000080
ffff880819e69b98 0000000000000086 0000004b00000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000005 00000000000b7709
ffff8804b7e5d098 ffff880819e69fd8 000000000000fbc8 ffff8804b7e5d098
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff81199100>] __link_path_walk+0x200/0x1000
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119b99a>] do_filp_open+0x1fa/0xd20
[<ffffffff8109b39c>] ? remove_wait_queue+0x3c/0x50
[<ffffffff81016c71>] ? fpu_finit+0x21/0x40
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task mv:23695 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mv D 0000000000000006 0 23695 11847 0x00000080
ffff88080f1a7cd8 0000000000000086 0000000000000000 ffff8807b7706aa0
ffff8807b7706aa0 ffff8807b7706aa0 ffff8807b7706aa0 ffff8807b7706aa0
ffff8807b7707058 ffff88080f1a7fd8 000000000000fbc8 ffff8807b7707058
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff81197501>] ? path_put+0x31/0x40
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811969af>] lock_rename+0x3f/0xe0
[<ffffffff8119a701>] sys_renameat+0x1b1/0x3a0
[<ffffffff8119b502>] ? user_path_at+0x62/0xa0
[<ffffffff8118e754>] ? cp_new_stat+0xe4/0x100
[<ffffffff8118ea86>] ? sys_newlstat+0x36/0x50
[<ffffffff810e1e07>] ? audit_syscall_entry+0x1d7/0x200
[<ffffffff8119a90b>] sys_rename+0x1b/0x20
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26715 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000000 0 26715 11886 0x00000080
ffff8803c17ebb58 0000000000000082 0000004b00000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000000 00000000000bc66a
ffff88047a4ed058 ffff8803c17ebfd8 000000000000fbc8 ffff88047a4ed058
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff811996a4>] __link_path_walk+0x7a4/0x1000
[<ffffffffa1269000>] ? return_if_equal+0x0/0x30 [lustre]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffff810ec53e>] ? call_rcu+0xe/0x10
[<ffffffff811a28ef>] ? d_free+0x3f/0x60
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff8100c715>] ? math_state_restore+0x45/0x60
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26717 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000000 0 26717 11886 0x00000080
ffff8803b26a5b58 0000000000000086 0000004b00000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000000 00000000000acca1
ffff88040b0bd098 ffff8803b26a5fd8 000000000000fbc8 ffff88040b0bd098
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff81199100>] __link_path_walk+0x200/0x1000
[<ffffffffa1269000>] ? return_if_equal+0x0/0x30 [lustre]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffff810ec53e>] ? call_rcu+0xe/0x10
[<ffffffff811a28ef>] ? d_free+0x3f/0x60
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
LustreError: 14431:0:(file.c:3036:ll_migrate()) scratch: migrate 1 , but fid [0x0:0x0:0x0] is insane
LustreError: 14431:0:(file.c:3036:ll_migrate()) Skipped 48 previous similar messages
[<ffffffff8100c715>] ? math_state_restore+0x45/0x60
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26719 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000001 0 26719 11886 0x00000080
ffff880813fcdb58 0000000000000086 0000000000000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000001 00000000000afd04
ffff880810bce638 ffff880813fcdfd8 000000000000fbc8 ffff880810bce638
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffffa04951a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff811996a4>] __link_path_walk+0x7a4/0x1000
[<ffffffffa1269000>] ? return_if_equal+0x0/0x30 [lustre]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffff810ec53e>] ? call_rcu+0xe/0x10
[<ffffffff811a28ef>] ? d_free+0x3f/0x60
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff8100c715>] ? math_state_restore+0x45/0x60
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26720 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000000 0 26720 11886 0x00000080
ffff880813d71b58 0000000000000086 0000004b13d71ac8 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000000 00000000000ad882
ffff8806113d9ab8 ffff880813d71fd8 000000000000fbc8 ffff8806113d9ab8
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff81199100>] __link_path_walk+0x200/0x1000
[<ffffffffa0899caf>] ? ptlrpc_request_cache_free+0xbf/0x100 [ptlrpc]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffffa128343c>] ? ll_file_release+0x2fc/0xb40 [lustre]
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff8100c715>] ? math_state_restore+0x45/0x60
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26721 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000007 0 26721 11886 0x00000080
ffff8808105d5b58 0000000000000082 0000004b00000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000007 00000000000b048e
ffff880521c065f8 ffff8808105d5fd8 000000000000fbc8 ffff880521c065f8
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffffa04951a1>] ? libcfs_debug_msg+0x41/0x50 [libcfs]
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff811996a4>] __link_path_walk+0x7a4/0x1000
[<ffffffffa1269000>] ? return_if_equal+0x0/0x30 [lustre]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffff810ec53e>] ? call_rcu+0xe/0x10
[<ffffffff811a28ef>] ? d_free+0x3f/0x60
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff8100c715>] ? math_state_restore+0x45/0x60
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
INFO: task ls:26723 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ls D 0000000000000006 0 26723 11886 0x00000080
ffff880813d51b58 0000000000000086 0000004b00000000 ffffffffa12e7983
0000000000000098 0020000000000080 5491c0bf00000006 00000000000b1e5f
ffff880763c7dab8 ffff880813d51fd8 000000000000fbc8 ffff880763c7dab8
Call Trace:
[<ffffffff8152a5be>] __mutex_lock_slowpath+0x13e/0x180
[<ffffffff811a4148>] ? __d_lookup+0xd8/0x150
[<ffffffff8152a45b>] mutex_lock+0x2b/0x50
[<ffffffff811989ab>] do_lookup+0x11b/0x230
[<ffffffff81199100>] __link_path_walk+0x200/0x1000
[<ffffffffa1269000>] ? return_if_equal+0x0/0x30 [lustre]
[<ffffffff8119a1ba>] path_walk+0x6a/0xe0
[<ffffffff8119a3cb>] filename_lookup+0x6b/0xc0
[<ffffffff81226d56>] ? security_file_alloc+0x16/0x20
[<ffffffff8119b8a4>] do_filp_open+0x104/0xd20
[<ffffffff810ec53e>] ? call_rcu+0xe/0x10
[<ffffffff811a28ef>] ? d_free+0x3f/0x60
[<ffffffff8128f83a>] ? strncpy_from_user+0x4a/0x90
[<ffffffff811a8b82>] ? alloc_fd+0x92/0x160
[<ffffffff81185be9>] do_sys_open+0x69/0x140
[<ffffffff81185d00>] sys_open+0x20/0x30
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
LustreError: 15736:0:(file.c:3036:ll_migrate()) scratch: migrate 10 , but fid [0x0:0x0:0x0] is insane
LustreError: 15736:0:(file.c:3036:ll_migrate()) Skipped 2 previous similar messages
LustreError: 19311:0:(lmv_intent.c:239:lmv_revalidate_slaves()) scratch-clilmv-ffff880806604400: nlink 0 < 2 corrupt stripe 0 [0x380000405:0x1f60:0x0]:[0x380000405:0x1f60:0x0]
LustreError: 19311:0:(llite_lib.c:2399:ll_prep_inode()) new_inode -fatal: rc -5
LustreError: 20039:0:(lmv_intent.c:239:lmv_revalidate_slaves()) scratch-clilmv-ffff880806604400: nlink 0 < 2 corrupt stripe 0 [0x3c0000402:0x1d9b:0x0]:[0x3c0000402:0x1d9b:0x0]
LustreError: 20039:0:(lmv_intent.c:239:lmv_revalidate_slaves()) Skipped 1 previous similar message
LustreError: 20039:0:(llite_lib.c:2399:ll_prep_inode()) new_inode -fatal: rc -5
LustreError: 20039:0:(llite_lib.c:2399:ll_prep_inode()) Skipped 1 previous similar message
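Both client logs are full of ll_migrate() errors reporting fid [0x0:0x0:0x0] as insane. A FID is printed as [sequence:object ID:version], and an all-zero FID does not identify any real file or directory, which is presumably why the client refuses to migrate it. The sketch below is a deliberately simplified, hypothetical model of such a sanity check: struct toy_fid, toy_fid_is_sane() and the TOY_FID_SEQ_NORMAL_START bound (assumed here to be 0x200000000) are illustrative assumptions, not Lustre's actual struct lu_fid or its real sanity check, which also recognizes other FID classes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a FID as printed in the logs: [seq:oid:ver].
 * This is NOT Lustre's struct lu_fid; the names are illustrative only. */
struct toy_fid {
    uint64_t seq; /* sequence number */
    uint32_t oid; /* object id within the sequence */
    uint32_t ver; /* version, expected to be 0 for these objects */
};

/* Assumed start of the "normal" sequence range (an assumption for this
 * sketch). The real FIDs in this ticket, such as 0x380000405 and
 * 0x3c0000402, all lie above it. */
#define TOY_FID_SEQ_NORMAL_START 0x200000000ULL

/* Simplified sanity check: accept only non-NULL FIDs from the assumed
 * normal range with version 0. Other FID classes that a real check would
 * accept are deliberately ignored here. */
static bool toy_fid_is_sane(const struct toy_fid *fid)
{
    return fid != NULL &&
           fid->seq >= TOY_FID_SEQ_NORMAL_START &&
           fid->ver == 0;
}
```

Under this model the zero FID from the ll_migrate() errors fails the check, while a FID such as [0x380000405:0x1f60:0x0] from the lmv_revalidate_slaves() error passes it.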
On the second MDS, I see the following migrate errors and call trace in dmesg:
LustreError: 9049:0:(mdt_reint.c:1523:mdt_reint_migrate_internal()) scratch-MDT0001: parent [0x3c0000400:0x1:0x0] is still on the same MDT, which should be migrated first: rc = -1
LustreError: 9049:0:(mdt_reint.c:1523:mdt_reint_migrate_internal()) Skipped 14 previous similar messages
LustreError: 8154:0:(mdt_reint.c:1160:mdt_reint_link()) scratch-MDT0001: source inode [0x380000405:0x1b7a:0x0] on remote MDT from [0x3c0000402:0x1917:0x0]
LustreError: 8154:0:(mdt_reint.c:1160:mdt_reint_link()) Skipped 52 previous similar messages
LustreError: 9052:0:(mdt_reint.c:1514:mdt_reint_migrate_internal()) scratch-MDT0001: source [0x380000404:0x1d4f:0x0] is on the remote MDT
LustreError: 9052:0:(mdt_reint.c:1514:mdt_reint_migrate_internal()) Skipped 98 previous similar messages
LustreError: 9040:0:(mdd_dir.c:4021:mdd_migrate()) scratch-MDD0001: [0x3c0000402:0x18b7:0x0]16 is already opened count 1: rc = -16
LustreError: 9040:0:(mdd_dir.c:4021:mdd_migrate()) Skipped 19 previous similar messages
LustreError: 9040:0:(mdt_open.c:1580:mdt_cross_open()) scratch-MDT0001: [0x3c0000401:0x1aa4:0x0] doesn't exist!: rc = -14
LustreError: 9040:0:(mdt_open.c:1580:mdt_cross_open()) Skipped 7 previous similar messages
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Lustre: 9050:0:(service.c:1335:ptlrpc_at_send_early_reply()) @@@ Couldn't add any time (5/5), not sending early reply req@ffff880543aa9c80 x1487703048481864/t0(0) o36->c82a75ed-84d9-3fb3-4192-c864a27ef414@192.168.2.113@o2ib:527/0 lens 488/3128 e 24 to 0 dl 1418838807 ref 2 fl Interpret:/0/0 rc 0/0
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
Lustre: scratch-MDT0001: Client c82a75ed-84d9-3fb3-4192-c864a27ef414 (at 192.168.2.113@o2ib) reconnecting
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ? child_rip+0x0/0x20
INFO: task mdt01_014:9058 blocked for more than 120 seconds.
Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mdt01_014 D 0000000000000002 0 9058 2 0x00000080
ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300
ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29
ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638
Call Trace:
[<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass]
[<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass]
[<ffffffff81061d00>] ? default_wake_function+0x0/0x20
[<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass]
[<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt]
[<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt]
[<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt]
[<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt]
[<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass]
[<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt]
[<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt]
[<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt]
[<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt]
[<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc]
[<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc]
[<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc]
[<ffffffff8109abf6>] kthread+0x96/0xa0
[<ffffffff8100c20a>] child_rip+0xa/0x20
[<ffffffff8109ab60>] ? kthread+0x0/0xa0
[<ffffffff8100c200>] ?
child_rip+0x0/0x20 Lustre: scratch-MDT0001: Client c82a75ed-84d9-3fb3-4192-c864a27ef414 (at 192.168 .2.113@o2ib) reconnecting INFO: task mdt01_014:9058 blocked for more than 120 seconds. Not tainted 2.6.32-431.29.2.el6_lustre.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mdt01_014 D 0000000000000002 0 9058 2 0x00000080 ffff880f67b45a90 0000000000000046 0000000000000000 ffff88053ece9300 ffff88053ece9300 ffff88055da93000 ffff880f67b45a90 ffffffffa08b8b29 ffff880b188de638 ffff880f67b45fd8 000000000000fbc8 ffff880b188de638 Call Trace: [<ffffffffa08b8b29>] ? lu_object_find_try+0x99/0x2b0 [obdclass] [<ffffffffa08b8d75>] lu_object_find_at+0x35/0x100 [obdclass] [<ffffffff81061d00>] ? default_wake_function+0x0/0x20 [<ffffffffa08b8e56>] lu_object_find+0x16/0x20 [obdclass] [<ffffffffa145d066>] mdt_object_find+0x56/0x170 [mdt] [<ffffffffa1466272>] mdt_object_find_lock+0x42/0x170 [mdt] [<ffffffffa14840d8>] mdt_lock_slaves+0x228/0x520 [mdt] [<ffffffffa1485fb3>] mdt_reint_unlink+0x8c3/0x10c0 [mdt] [<ffffffffa08d5880>] ? lu_ucred+0x20/0x30 [obdclass] [<ffffffffa145bed5>] ? mdt_ucred+0x15/0x20 [mdt] [<ffffffffa147c09d>] mdt_reint_rec+0x5d/0x200 [mdt] [<ffffffffa146018b>] mdt_reint_internal+0x4cb/0x7a0 [mdt] [<ffffffffa14609eb>] mdt_reint+0x6b/0x120 [mdt] [<ffffffffa0ee0ade>] tgt_request_handle+0x6fe/0xaf0 [ptlrpc] [<ffffffffa0e90411>] ptlrpc_main+0xe41/0x1950 [ptlrpc] [<ffffffffa0e8f5d0>] ? ptlrpc_main+0x0/0x1950 [ptlrpc] [<ffffffff8109abf6>] kthread+0x96/0xa0 [<ffffffff8100c20a>] child_rip+0xa/0x20 [<ffffffff8109ab60>] ? kthread+0x0/0xa0 [<ffffffff8100c200>] ? child_rip+0x0/0x20