Context:
- dumped at 15:46
- oldest UN (uninterruptible) task is PID 4476
- PID 4476 has not run for 150 minutes
- 15:46 - 2:30 ~= 13:15

What happened on node70?

[root@bcluster70 ~] # sar -B -f /var/log/sa/sa30 -s 12:50:00 -e 16:00:00
Linux 2.6.32-431.11.2.el6.Bull.48.x86_64 (bcluster70) 06/30/2014 _x86_64_ (32 CPU)

12:50:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
01:00:02 PM    159.18    120.69 190486.62      3.45 225190.65      0.00  88037.12  87919.90     99.87
01:10:01 PM     75.92    118.79 257835.25      1.41 257221.68      0.00  78674.23  78585.44     99.89
01:20:01 PM    191.88    124.79 160906.34      3.48 189990.06      0.00  71480.53  71382.62     99.86
01:30:01 PM      9.58    107.20  30189.19      0.16  22490.14      0.00   2424.68   2424.68    100.00
01:40:02 PM     14.68    112.85  34474.59      0.35  14503.42      0.00      0.00      0.00      0.00
01:50:01 PM     10.44    302.91  33629.12      1.62  19336.22      0.00     72.21     72.21    100.00
02:00:01 PM      1.36   1063.07  50896.29      0.04  42653.88      0.00   1246.40   1245.89     99.96
02:10:01 PM     29.56    129.78  46422.92      0.95  32272.22      0.00   2785.40   2753.73     98.86
02:20:01 PM     10.75    127.99  47297.97      0.76  26914.32      0.00   2402.79   2402.69    100.00
02:30:01 PM     46.57    428.33  51040.43      2.04  29778.30      0.00    174.74    174.74    100.00
02:40:01 PM      1.87    200.33  27669.17      0.05  16875.71      0.00    484.48    483.98     99.90
02:50:01 PM    200.06    319.16  35270.07      3.66  17618.92      0.00     88.23     88.23    100.00
03:00:02 PM     15.78    908.53  49926.99      0.12  33309.13      0.00    367.66    367.66    100.00
03:10:01 PM     18.20    315.83  21532.19      0.40  12519.19      0.00      0.00      0.00      0.00
03:20:01 PM      5.86    382.33  31279.96      0.11  14396.42      0.00      0.00      0.00      0.00
03:30:01 PM     16.43     75.50   4381.86      0.11  21120.70      0.00      0.00      0.00      0.00
03:40:01 PM      2.54     90.82  14847.44      0.21  17023.96      0.00   1782.50   1127.40     63.25
Average:        47.69    290.46  63887.46      1.11  58278.61      0.00  14648.20  14590.02     99.60

[root@bcluster70 ~] # sar -R -f /var/log/sa/sa30 -s 12:50:00 -e 16:00:00
Linux 2.6.32-431.11.2.el6.Bull.48.x86_64 (bcluster70) 06/30/2014 _x86_64_ (32 CPU)

12:50:01 PM   frmpg/s   bufpg/s   campg/s
01:00:02 PM -24842.36     -0.06  19596.39
01:10:01 PM  11187.72     -0.14  -9663.43
01:20:01 PM   3797.55      0.81  -3103.36
01:30:01 PM    142.64      0.41     41.76
01:40:02 PM   -452.35      0.68     58.40
01:50:01 PM  -3829.83      0.51   3240.78
02:00:01 PM    -55.99      0.26    302.72
02:10:01 PM     -8.49      0.47   -196.91
02:20:01 PM   -759.88     -0.59    383.66
02:30:01 PM   -223.85      0.26    354.95
02:40:01 PM    -67.58      0.14      9.01
02:50:01 PM    744.00      0.27   -805.21
03:00:02 PM    445.11      1.35   -287.55
03:10:01 PM    788.22      0.93   -785.55
03:20:01 PM  -2119.49      0.85   1973.03
03:30:01 PM  13664.36      0.19 -11312.75
03:40:01 PM  -1496.00      0.18   1023.92
Average:      -174.68      0.38     43.76

console log:
----8< ----
Lustre: DEBUG MARKER: Mon Jun 30 13:10:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:15:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:20:01 2014
INFO: task sshd_ext:16289 blocked for more than 120 seconds.
Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
sshd_ext D 0000000000000008 0 16289 16267 0x00000080
 ffff88160e11d538 0000000000000086 0000000000000000 ffffffffa06883c7
 ffff8804e4667e00 0000000000000026 ffff88160e11d4c8 ffffffffa06898d4
 ffff88151e85f060 ffff88160e11dfd8 000000000000fc08 ffff88151e85f060
Call Trace:
 [] ? lustre_pack_request+0xb7/0x180 [ptlrpc]
 [] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc]
 [] __mutex_lock_slowpath+0x13e/0x180
 [] mutex_lock+0x2b/0x50
 [] mdc_close+0x19b/0x980 [mdc]
 [] lmv_close+0x328/0x5c0 [lmv]
 [] ll_close_inode_openhandle+0x30e/0x1040 [lustre]
 [] ? mdc_null_inode+0xb2/0x1d0 [mdc]
 [] ll_md_real_close+0x1aa/0x220 [lustre]
 [] ll_clear_inode+0x18a/0x9a0 [lustre]
 [] clear_inode+0xac/0x140
 [] dispose_list+0x40/0x120
 [] shrink_icache_memory+0x274/0x2e0
 [] shrink_slab+0x12a/0x1a0
 [] zone_reclaim+0x3ae/0x650
 [] get_page_from_freelist+0x6ac/0x870
 [] ? do_select+0x5f5/0x6c0
 [] __alloc_pages_nodemask+0x113/0x8d0
 [] ? pollwake+0x0/0x60
 [] ? __kmalloc_node+0x4d/0x60
 [] ? __alloc_skb+0x7a/0x180
 [] alloc_pages_current+0xaa/0x110
 [] tcp_sendmsg+0x677/0xa20
 [] sock_aio_write+0x19b/0x1c0
 [] do_sync_write+0xfa/0x140
 [] ?
autoremove_wake_function+0x0/0x40 [] ? selinux_file_permission+0xbf/0x150 [] ? security_file_permission+0x16/0x20 [] vfs_write+0x184/0x1a0 [] sys_write+0x51/0x90 [] ? __audit_syscall_exit+0x25e/0x290 [] system_call_fastpath+0x16/0x1b INFO: task sshd_ext:27492 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. sshd_ext D 000000000000000f 0 27492 27473 0x00000080 ffff88074574d5f8 0000000000000082 ffff88074574d5a8 ffffffffa06883c7 ffff881ffbf79c00 0000000000000026 ffffffff8100bb8e ffff88074574d5f8 ffff880590ff5060 ffff88074574dfd8 000000000000fc08 ffff880590ff5060 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x9b/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ? mdc_null_inode+0xb2/0x1d0 [mdc] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_clear_inode+0x19e/0x9a0 [lustre] [] clear_inode+0xac/0x140 [] dispose_list+0x40/0x120 [] shrink_icache_memory+0x274/0x2e0 [] shrink_slab+0x12a/0x1a0 [] zone_reclaim+0x3ae/0x650 [] get_page_from_freelist+0x6ac/0x870 [] ? pollwake+0x0/0x60 [] __alloc_pages_nodemask+0x113/0x8d0 [] ? fair_enqueue_task_fair+0x198/0x440 [] ? sched_clock+0x9/0x10 [] ? enqueue_task+0x66/0x80 [] alloc_pages_current+0xaa/0x110 [] pipe_write+0x3b4/0x6a0 [] do_sync_write+0xfa/0x140 [] ? autoremove_wake_function+0x0/0x40 [] ? selinux_file_permission+0xbf/0x150 [] ? security_file_permission+0x16/0x20 [] vfs_write+0xb8/0x1a0 [] sys_write+0x51/0x90 [] ? __audit_syscall_exit+0x25e/0x290 [] system_call_fastpath+0x16/0x1b INFO: task trjcat_mpi:4917 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
trjcat_mpi D 0000000000000007 0 4917 17562 0x00000080 ffff880692969588 0000000000000086 0000000000000000 ffffffffa06883c7 ffff881eae943e00 0000000000000026 ffffffff8100bb8e ffff880692969588 ffff880650e150e0 ffff880692969fd8 000000000000fc08 ffff880650e150e0 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x9b/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ? mdc_null_inode+0xb2/0x1d0 [mdc] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_clear_inode+0x19e/0x9a0 [lustre] [] clear_inode+0xac/0x140 [] dispose_list+0x40/0x120 [] shrink_icache_memory+0x274/0x2e0 [] shrink_slab+0x12a/0x1a0 [] zone_reclaim+0x3ae/0x650 [] get_page_from_freelist+0x6ac/0x870 [] __alloc_pages_nodemask+0x113/0x8d0 [] ? __alloc_pages_nodemask+0x113/0x8d0 [] alloc_pages_vma+0x9a/0x150 [] handle_pte_fault+0x7ac/0xbd0 [] ? alloc_pages_current+0xaa/0x110 [] ? pte_alloc_one+0x37/0x50 [] handle_mm_fault+0x22a/0x300 [] __do_page_fault+0x138/0x480 [] ? cfs_mem_cache_free+0xe/0x10 [libcfs] [] ? cl_env_put+0x20f/0x370 [obdclass] [] ? ll_file_read+0x184/0x2a0 [lustre] [] do_page_fault+0x3e/0xa0 [] page_fault+0x25/0x30 INFO: task MATLAB:10440 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. MATLAB D 0000000000000006 0 10440 14857 0x00000080 ffff880d15579bc8 0000000000000086 ffff880d15579b78 ffffffffa06883c7 ffff881412d7da00 0000000000000026 ffff880d15579b58 ffffffffa06898d4 ffff880d1a69dae0 ffff880d15579fd8 000000000000fc08 ffff880d1a69dae0 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc] [] ? lustre_msg_buf+0x55/0x60 [ptlrpc] [] ? 
__req_capsule_get+0x166/0x700 [ptlrpc] [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_md_close+0x21a/0x6b0 [lustre] [] ? ll_stats_ops_tally+0x6b/0xd0 [lustre] [] ll_file_release+0x11b/0x3c0 [lustre] [] ll_dir_release+0xdb/0xf0 [lustre] [] __fput+0xf5/0x210 [] fput+0x25/0x30 [] filp_close+0x5d/0x90 [] sys_close+0xa5/0x100 [] system_call_fastpath+0x16/0x1b INFO: task sshd_ext:26063 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. sshd_ext D 000000000000000b 0 26063 25696 0x00000080 ffff88041d1015f8 0000000000000086 0000000000000000 ffffffffa06883c7 ffff881f2c86fe00 0000000000000026 ffffffff8100bb8e ffff88041d1015f8 ffff8805d2c27b20 ffff88041d101fd8 000000000000fc08 ffff8805d2c27b20 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x9b/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ? mdc_null_inode+0xb2/0x1d0 [mdc] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_clear_inode+0x19e/0x9a0 [lustre] [] clear_inode+0xac/0x140 [] dispose_list+0x40/0x120 [] shrink_icache_memory+0x274/0x2e0 [] shrink_slab+0x12a/0x1a0 [] zone_reclaim+0x3ae/0x650 [] get_page_from_freelist+0x6ac/0x870 [] ? pollwake+0x0/0x60 [] __alloc_pages_nodemask+0x113/0x8d0 [] ? avc_has_perm+0x71/0x90 [] alloc_pages_current+0xaa/0x110 [] pipe_write+0x3b4/0x6a0 [] do_sync_write+0xfa/0x140 [] ? autoremove_wake_function+0x0/0x40 [] ? selinux_file_permission+0xbf/0x150 [] ? security_file_permission+0x16/0x20 [] vfs_write+0xb8/0x1a0 [] sys_write+0x51/0x90 [] ? 
__audit_syscall_exit+0x25e/0x290 [] system_call_fastpath+0x16/0x1b INFO: task rsync:26154 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. rsync D 0000000000000008 0 26154 26065 0x00000080 ffff880286417c68 0000000000000082 0000000000000000 0000000000000002 ffff880000118dd8 000000020000001f ffff88087c88abf0 0000000000000030 ffff88010cfbc660 ffff880286417fd8 000000000000fc08 ffff88010cfbc660 Call Trace: [] __mutex_lock_slowpath+0x13e/0x180 [] ? avc_has_perm+0x71/0x90 [] mutex_lock+0x2b/0x50 [] pipe_read+0x7c/0x4e0 [] ? inode_has_perm+0x54/0xa0 [] ? copy_user_generic+0xe/0x20 [] ? set_fd_set+0x49/0x60 [] ? core_sys_select+0x1ec/0x2c0 [] do_sync_read+0xfa/0x140 [] ? autoremove_wake_function+0x0/0x40 [] ? selinux_file_permission+0xbf/0x150 [] ? security_file_permission+0x16/0x20 [] vfs_read+0xb5/0x1a0 [] sys_read+0x51/0x90 [] ? __audit_syscall_exit+0x25e/0x290 [] system_call_fastpath+0x16/0x1b INFO: task code1:4476 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. code1 D 0000000000000016 0 4476 23991 0x00000080 ffff8803e5a2fbe8 0000000000000086 ffff8803e5a2fb98 ffffffffa06883c7 ffff8817b4905200 0000000000000026 ffffffff8100bb8e ffff8803e5a2fbe8 ffff8807dcd8e5e0 ffff8803e5a2ffd8 000000000000fc08 ffff8807dcd8e5e0 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x9b/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_md_close+0x21a/0x6b0 [lustre] [] ? 
down_read+0x16/0x30 [] ll_file_release+0x11b/0x3c0 [lustre] [] __fput+0xf5/0x210 [] fput+0x25/0x30 [] filp_close+0x5d/0x90 [] sys_close+0xa5/0x100 [] system_call_fastpath+0x16/0x1b INFO: task ccache:4878 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ccache D 0000000000000007 0 4878 4875 0x00000080 ffff88158adf5be8 0000000000000086 ffff88158adf5b98 ffffffffa06883c7 ffff881a6dae4800 0000000000000026 ffffffff8100bb8e ffff88158adf5be8 ffff8815bb9805e0 ffff88158adf5fd8 000000000000fc08 ffff8815bb9805e0 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x9b/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_md_close+0x21a/0x6b0 [lustre] [] ? down_read+0x16/0x30 [] ll_file_release+0x11b/0x3c0 [lustre] [] __fput+0xf5/0x210 [] fput+0x25/0x30 [] filp_close+0x5d/0x90 [] sys_close+0xa5/0x100 [] system_call_fastpath+0x16/0x1b INFO: task code2:5036 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. code2 D 0000000000000006 0 5036 5028 0x00000080 ffff8818cf953be8 0000000000000086 ffff8818cf953b98 ffffffffa06883c7 ffff88161647bc00 0000000000000026 ffff8818cf953b78 ffffffffa06898d4 ffff881ae30866a0 ffff8818cf953fd8 000000000000fc08 ffff881ae30866a0 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? lustre_msg_set_timeout+0x74/0xc0 [ptlrpc] [] ? lustre_msg_buf+0x55/0x60 [ptlrpc] [] ? 
__req_capsule_get+0x166/0x700 [ptlrpc] [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_md_close+0x21a/0x6b0 [lustre] [] ? down_read+0x16/0x30 [] ll_file_release+0x11b/0x3c0 [lustre] [] __fput+0xf5/0x210 [] fput+0x25/0x30 [] filp_close+0x5d/0x90 [] sys_close+0xa5/0x100 [] system_call_fastpath+0x16/0x1b INFO: task code2:5129 blocked for more than 120 seconds. Tainted: G W --------------- 2.6.32-431.11.2.el6.Bull.48.x86_64 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. code2 D 000000000000000a 0 5129 5109 0x00000080 ffff880d54e61be8 0000000000000086 ffff880d54e61b98 ffffffffa06883c7 ffff8818263cdc00 0000000000000026 ffffffff8100bb8e ffff880d54e61be8 ffff880997af0660 ffff880d54e61fd8 000000000000fc08 ffff880997af0660 Call Trace: [] ? lustre_pack_request+0xb7/0x180 [ptlrpc] [] ? apic_timer_interrupt+0xe/0x20 [] ? mutex_spin_on_owner+0x8d/0xc0 [] __mutex_lock_slowpath+0x13e/0x180 [] mutex_lock+0x2b/0x50 [] mdc_close+0x19b/0x980 [mdc] [] lmv_close+0x328/0x5c0 [lmv] [] ll_close_inode_openhandle+0x30e/0x1040 [lustre] [] ll_md_real_close+0x1aa/0x220 [lustre] [] ll_md_close+0x21a/0x6b0 [lustre] [] ? 
down_read+0x16/0x30
 [] ll_file_release+0x11b/0x3c0 [lustre]
 [] __fput+0xf5/0x210
 [] fput+0x25/0x30
 [] filp_close+0x5d/0x90
 [] sys_close+0xa5/0x100
 [] system_call_fastpath+0x16/0x1b
Lustre: DEBUG MARKER: Mon Jun 30 13:25:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:30:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:35:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:40:02 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:45:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:50:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 13:55:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 14:00:01 2014
epicea_gui_9.29[30287]: segfault at 181 ip 0000000000000181 sp 00007fff456e7bb8 error 14 in epicea_gui_9.29.08[400000+10e1000]
Lustre: DEBUG MARKER: Mon Jun 30 14:05:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 14:10:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 14:15:02 2014
Lustre: DEBUG MARKER: Mon Jun 30 14:20:01 2014
Lustre: DEBUG MARKER: Mon Jun 30 14:25:01 2014
----8< ----

Ten tasks were reported as blocked for more than 120 seconds between 13:20:01 and 13:25:01, and PID 4476 is among them. The next step is to find the owner of the lock that PID 4476 is waiting on.
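The timeline can be cross-checked against the figures from the Context list. The sketch below redoes the subtraction and checks that a task stalled since then could indeed be reported in the 13:20:01 to 13:25:01 window. It assumes the dump was taken on the same day as the console log (2014-06-30). The variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Figures from the Context list; the date is an assumption (same day as the log).
dump_time = datetime(2014, 6, 30, 15, 46)
stalled_for = timedelta(minutes=150)

# Moment PID 4476 last ran, according to the dump.
stall_start = dump_time - stalled_for

# The hung-task detector only complains after 120 seconds in D state.
hung_task_timeout = timedelta(seconds=120)
earliest_report = stall_start + hung_task_timeout

# Console markers that bracket the "blocked for more than 120 seconds" messages.
window_start = datetime(2014, 6, 30, 13, 20, 1)
window_end = datetime(2014, 6, 30, 13, 25, 1)

print("stall started around", stall_start.strftime("%H:%M"))
print("earliest possible report", earliest_report.strftime("%H:%M"))
print("consistent with console log:", earliest_report <= window_end)
```

The exact subtraction gives 13:16, which matches the "~= 13:15" estimate and falls a few minutes before the first hung-task messages.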
crash> bt -f 4476 PID: 4476 TASK: ffff8807dcd8e040 CPU: 22 COMMAND: "code1" #0 [ffff8803e5a2fb28] schedule at ffffffff81528a52 ffff8803e5a2fb30: 0000000000000086 ffff8803e5a2fb98 ffff8803e5a2fb40: ffffffffa06883c7 ffff8817b4905200 ffff8803e5a2fb50: 0000000000000026 ffffffff8100bb8e ffff8803e5a2fb60: ffff8803e5a2fbe8 ffff8807dcd8e5e0 ffff8803e5a2fb70: ffff8803e5a2ffd8 000000000000fc08 ffff8803e5a2fb80: ffff8807dcd8e5e0 ffff881871180a80 ffff8803e5a2fb90: ffff8807dcd8e040 0000000000000000 ffff8803e5a2fba0: ffff8803e5a2ffd8 ffff88106f600140 ffff8803e5a2fbb0: ffffffffffffff10 ffffffff810554cb ffff8803e5a2fbc0: ffff88106f600140 ffff8807dcd8e040 ffff8803e5a2fbd0: ffff88106f600144 ffff88106f600148 ffff8803e5a2fbe0: ffffffffffffffff ffff8803e5a2fc58 ffff8803e5a2fbf0: ffffffff8152a20e #1 [ffff8803e5a2fbf0] __mutex_lock_slowpath at ffffffff8152a20e ffff8803e5a2fbf8: ffff8803e5a2ffd8 ffff8807dcd8e040 ffff8803e5a2fc08: ffff8816dda9dc08 ffff88106f600148 ffff8803e5a2fc18: ffff8807dcd8e040 ffff8817b49052e8 ffff8803e5a2fc28: ffff8803e5a2fc48 ffff88106f600140 ffff8803e5a2fc38: ffff88185cd506a0 ffff88106f600140 ffff8803e5a2fc48: ffff88106f6d4538 ffff881418561200 ffff8803e5a2fc58: ffff8803e5a2fc78 ffffffff8152a0ab #2 [ffff8803e5a2fc60] mutex_lock at ffffffff8152a0ab ffff8803e5a2fc68: ffff881340d27800 ffff88185cd506a0 ffff8803e5a2fc78: ffff8803e5a2fcc8 ffffffffa09176db #3 [ffff8803e5a2fc80] mdc_close at ffffffffa09176db [mdc] ffff8803e5a2fc88: ffff881340d27ba0 ffff8803e5a2fd60 ffff8803e5a2fc98: ffff8803e5a2fce8 ffff881418561200 ffff8803e5a2fca8: ffff8803e5a2fd60 ffff88185cd506a0 ffff8803e5a2fcb8: ffff88086f0b9c00 ffff88185cd506a0 <=== obd_export: 0xffff88086f0b9c00 ffff8803e5a2fcc8: ffff8803e5a2fd18 ffffffffa0b9bcb8 #4 [ffff8803e5a2fcd0] lmv_close at ffffffffa0b9bcb8 [lmv] ffff8803e5a2fcd8: ffff88106f6d24f8 00000000bdc90af8 ffff8803e5a2fce8: ffff880cbdc90af8 ffff8817c976d140 ffff8803e5a2fcf8: ffff88084b729000 ffff880cbdc90af8 ffff8803e5a2fd08: ffff881418561200 ffff88185cd506a0 
ffff8803e5a2fd18: ffff8803e5a2fd98 ffffffffa0a80c1e #5 [ffff8803e5a2fd20] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre] ffff8803e5a2fd28: ffff881000000010 0000000000000000 ffff8803e5a2fd38: ffff880cbdc90a48 ffff8803e5a2fe38 ffff8803e5a2fd48: 00000000e5a2fdc8 ffff88084b729000 ffff8803e5a2fd58: ffff8817c976d140 0000000000000000 ffff8803e5a2fd68: ffff8803e5a2fd98 ffff880cbdc90af8 ffff8803e5a2fd78: ffff880cbdc90ad8 ffff8817c976d140 ffff8803e5a2fd88: ffff880cbdc90aa8 ffff880cbdc90af8 ffff8803e5a2fd98: ffff8803e5a2fdc8 ffffffffa0a81afa #6 [ffff8803e5a2fda0] ll_md_real_close at ffffffffa0a81afa [lustre] ffff8803e5a2fda8: ffff881864e67900 ffff880cbdc90af8 ffff8803e5a2fdb8: ffff881348ead380 ffff88084b729000 ffff8803e5a2fdc8: ffff8803e5a2fe78 ffffffffa0a81d8a #7 [ffff8803e5a2fdd0] ll_md_close at ffffffffa0a81d8a [lustre] ffff8803e5a2fdd8: ffff8803e5a2fe38 ffff881321cc9920 ffff8803e5a2fde8: ffff8803e5a2ff48 ffff8803e5a2fe08 ffff8803e5a2fdf8: ffff880cbdc90ad8 0000001021cc9918 ffff8803e5a2fe08: 0000000000000004 0000000000000000 ffff8803e5a2fe18: 0000000000000000 0000000000000000 ffff8803e5a2fe28: 0000000000000000 0000000000000000 ffff8803e5a2fe38: ffff8803e5a2fe58 ffffffff8152a796 ffff8803e5a2fe48: ffff8803e5a2fe78 ffff880cbdc90af8 ffff8803e5a2fe58: ffff881864e67900 ffff88084b602000 ffff8803e5a2fe68: ffff880cbdc90a00 ffff881348ead380 ffff8803e5a2fe78: ffff8803e5a2feb8 ffffffffa0a8233b #8 [ffff8803e5a2fe80] ll_file_release at ffffffffa0a8233b [lustre] ffff8803e5a2fe88: 0000000000000004 ffff881864e67900 ffff8803e5a2fe98: 0000000000000010 ffff880cbdc90af8 ffff8803e5a2fea8: ffff880c4a8f7a80 ffff88087ab437c0 ffff8803e5a2feb8: ffff8803e5a2ff08 ffffffff8118ad55 #9 [ffff8803e5a2fec0] __fput at ffffffff8118ad55 ffff8803e5a2fec8: ffff880cbdc90af8 ffff880c4a8f7a80 ffff8803e5a2fed8: ffff8803e5a2fef8 ffff881864e67900 ffff8803e5a2fee8: ffff88087a84d440 0000000000000000 ffff8803e5a2fef8: 00000000028f2674 fffffffffffe346d ffff8803e5a2ff08: ffff8803e5a2ff18 ffffffff8118ae95 #10 
[ffff8803e5a2ff10] fput at ffffffff8118ae95
    ffff8803e5a2ff18: ffff8803e5a2ff48 ffffffff811861bd
#11 [ffff8803e5a2ff20] filp_close at ffffffff811861bd
    ffff8803e5a2ff28: ffff880525881248 ffff88087a84d440
    ffff8803e5a2ff38: 0000000000000006 ffff88087a84d4c0
    ffff8803e5a2ff48: ffff8803e5a2ff78 ffffffff81186295
#12 [ffff8803e5a2ff50] sys_close at ffffffff81186295
    ffff8803e5a2ff58: 00000000028f2690 0000000004d79f70
    ffff8803e5a2ff68: 00000000028f2690 00000000028f2624
    ffff8803e5a2ff78: 00007fffb30fab90 ffffffff8100b072
#13 [ffff8803e5a2ff80] system_call_fastpath at ffffffff8100b072
    RIP: 00002b43df89b6d0 RSP: 00007fffb30fa6c0 RFLAGS: 00000206
    RAX: 0000000000000003 RBX: ffffffff8100b072 RCX: 0000000004ccfdb0
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000006
    RBP: 00007fffb30fab90 R8: 0000000004f531e0 R9: 0000000000000001
    R10: 0000000000000020 R11: 0000000000000246 R12: 00000000028f2624
    R13: 00000000028f2690 R14: 0000000004d79f70 R15: 00000000028f2690
    ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b

The address of the obd_export was found with the help of 'dis lmv_close' and 'dis mdc_close'.
The lock is taken with mdc_get_rpc_lock(obd->u.cli.cl_close_lock, NULL) in mdc_close().
We can now follow the obd_export to find the lock's owner.

crash> obd_export.exp_obd ffff88086f0b9c00
  exp_obd = 0xffff88106f6d4538
crash> struct obd_device.u 0xffff88106f6d4538 |grep cl_close_lock
  cl_close_lock = 0xffff88106f600140,
crash> mdc_rpc_lock.rpcl_mutex 0xffff88106f600140
  rpcl_mutex = {
    count = {
      counter = -1
    },
    wait_lock = {
      raw_lock = {
        slock = 1627873543
      }
    },
    wait_list = {
      next = 0xffff8803e5a2fc08,
      prev = 0xffff8807e0e65be8
    },
    owner = 0xffff88171cb42000
  }
crash> thread_info 0xffff88171cb42000
struct thread_info {
  task = 0xffff881518308b00,
  exec_domain = 0xffffffff81a9bce0,
  flags = 128,
  status = 0,
  cpu = 2,
  preempt_count = 0,
  addr_limit = {
    seg = 140737488351232
  },
  restart_block = {
    fn = 0xffffffff81085760 ,
    {
      {
        arg0 = 0, arg1 = 0, arg2 = 0, arg3 = 0
      },
      futex = {
        uaddr = 0x0, val = 0, flags = 0, bitset = 0, time = 0, uaddr2 = 0x0
      },
      nanosleep = {
        index = 0, rmtp = 0x0, compat_rmtp = 0x0, expires = 0
      },
      poll = {
        ufds = 0x0, nfds = 0, has_timeout = 0, tv_sec = 0, tv_nsec = 0
      }
    }
  },
  sysenter_return = 0x0,
  uaccess_err = 0
}
crash> thread_info.task 0xffff88171cb42000
  task = 0xffff881518308b00
crash> task 0xffff881518308b00 |grep PID
PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2"

What is this process doing?
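Before reading the backtrace, the owner lookup above can be sanity-checked with a little address arithmetic. The sketch assumes 8 KiB kernel stacks with thread_info at the base of the stack, which is my understanding of x86_64 2.6.32 kernels but is not shown in the dump. Under that assumption, masking any address on a task's kernel stack gives that task's thread_info. The helper name stack_base is made up for the example. The addresses are copied from the crash output (the PID 5231 frame comes from its backtrace).

```python
# Assumption: THREAD_SIZE is 8 KiB on this kernel and thread_info sits at
# the lowest address of the kernel stack.
THREAD_SIZE = 8192


def stack_base(addr: int) -> int:
    """Round a kernel stack address down to the start of its stack."""
    return addr & ~(THREAD_SIZE - 1)


# Values copied from the crash session.
mutex_owner = 0xffff88171cb42000       # rpcl_mutex.owner (a thread_info pointer)
frame0_pid_5231 = 0xffff88171cb43188   # frame #0 of "bt 5231"
wait_list_next = 0xffff8803e5a2fc08    # rpcl_mutex.wait_list.next
frame0_pid_4476 = 0xffff8803e5a2fb28   # frame #0 of "bt -f 4476"

# The owner's thread_info should be the base of PID 5231's stack.
print(hex(stack_base(frame0_pid_5231)), stack_base(frame0_pid_5231) == mutex_owner)

# The first waiter entry should live on PID 4476's stack.
print(hex(stack_base(wait_list_next)),
      stack_base(wait_list_next) == stack_base(frame0_pid_4476))
```

Both comparisons hold with these addresses. That is consistent with PID 5231 being the owner and PID 4476 being the first waiter queued on the mutex.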
crash> bt 5231
PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2"
 #0 [ffff88171cb43188] schedule at ffffffff81528a52
 #1 [ffff88171cb43250] __mutex_lock_slowpath at ffffffff8152a20e
 #2 [ffff88171cb432c0] mutex_lock at ffffffff8152a0ab <=== tries to take the same lock again
 #3 [ffff88171cb432e0] mdc_close at ffffffffa09176db [mdc]
 #4 [ffff88171cb43330] lmv_close at ffffffffa0b9bcb8 [lmv]
 #5 [ffff88171cb43380] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre]
 #6 [ffff88171cb43400] ll_md_real_close at ffffffffa0a81afa [lustre]
 #7 [ffff88171cb43430] ll_clear_inode at ffffffffa0a92dee [lustre]
 #8 [ffff88171cb43470] clear_inode at ffffffff811a626c
 #9 [ffff88171cb43490] dispose_list at ffffffff811a6340
#10 [ffff88171cb434d0] shrink_icache_memory at ffffffff811a6694
#11 [ffff88171cb43530] shrink_slab at ffffffff81138b7a
#12 [ffff88171cb43590] zone_reclaim at ffffffff8113b77e
#13 [ffff88171cb436b0] get_page_from_freelist at ffffffff8112d8dc
#14 [ffff88171cb437e0] __alloc_pages_nodemask at ffffffff8112f443
#15 [ffff88171cb43920] alloc_pages_current at ffffffff811680ca
#16 [ffff88171cb43950] __vmalloc_area_node at ffffffff81159696
#17 [ffff88171cb439b0] __vmalloc_node at ffffffff8115953d
#18 [ffff88171cb43a10] vmalloc at ffffffff8115985c
#19 [ffff88171cb43a20] cfs_alloc_large at ffffffffa03b4b1e [libcfs]
#20 [ffff88171cb43a30] null_alloc_repbuf at ffffffffa06c4961 [ptlrpc]
#21 [ffff88171cb43a60] sptlrpc_cli_alloc_repbuf at ffffffffa06b2355 [ptlrpc]
#22 [ffff88171cb43a90] ptl_send_rpc at ffffffffa068432c [ptlrpc]
#23 [ffff88171cb43b50] ptlrpc_send_new_req at ffffffffa067879b [ptlrpc]
#24 [ffff88171cb43bc0] ptlrpc_set_wait at ffffffffa067ddb6 [ptlrpc]
#25 [ffff88171cb43c60] ptlrpc_queue_wait at ffffffffa067e0df [ptlrpc] <=== PID 5231 already holds the lock
#26 [ffff88171cb43c80] mdc_close at ffffffffa0917714 [mdc]
#27 [ffff88171cb43cd0] lmv_close at ffffffffa0b9bcb8 [lmv]
#28 [ffff88171cb43d20] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre]
#29 [ffff88171cb43da0] ll_md_real_close at ffffffffa0a81afa [lustre]
#30 [ffff88171cb43dd0] ll_md_close at ffffffffa0a81d8a [lustre]
#31 [ffff88171cb43e80] ll_file_release at ffffffffa0a8233b [lustre]
#32 [ffff88171cb43ec0] __fput at ffffffff8118ad55
#33 [ffff88171cb43f10] fput at ffffffff8118ae95
#34 [ffff88171cb43f20] filp_close at ffffffff811861bd
#35 [ffff88171cb43f50] sys_close at ffffffff81186295
#36 [ffff88171cb43f80] system_call_fastpath at ffffffff8100b072
    RIP: 00002adaacdf26d0 RSP: 00007fff9665e238 RFLAGS: 00010246
    RAX: 0000000000000003 RBX: ffffffff8100b072 RCX: 0000000000002261
    RDX: 00000000044a24b0 RSI: 0000000000000001 RDI: 0000000000000005
    RBP: 0000000000000000 R8: 00002adaad0ac560 R9: 0000000000000001
    R10: 00000000000004fd R11: 0000000000000246 R12: 00000000000004fc
    R13: 00000000ffffffff R14: 00000000044a23d0 R15: 00000000ffffffff
    ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b

This is recursive locking. PID 5231 took the lock in mdc_close() (frame #26) and then had to allocate a reply buffer (frames #18 to #20). The allocation entered memory reclaim (frames #10 to #13), which evicted an inode (frame #7) and called mdc_close() a second time (frame #3). That second call is now waiting for the lock the task already holds.

include/linux/mutex.h:
 20 /*
 21  * Simple, straightforward mutexes with strict semantics:
 22  *
 23  * - only one task can hold the mutex at a time
 24  * - only the owner can unlock the mutex
 25  * - multiple unlocks are not permitted
 26  * - recursive locking is not permitted
 27  * - a mutex object must be initialized via the API
 28  * - a mutex object must not be initialized via memset or copying
 29  * - task may not exit with mutex held
 30  * - memory areas where held locks reside must not be freed
 31  * - held mutexes must not be reinitialized
 32  * - mutexes may not be used in hardware or software interrupt
 33  *   contexts such as tasklets and timers

See line 26: recursive locking is not permitted.
Here is the deadlock!
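The pattern can be reproduced in miniature in user space. The following is only an analogy, not Lustre code: threading.Lock stands in for the kernel mutex (neither is recursive), and the function names are made up to mirror the roles seen in the backtrace. A timeout is used on the inner acquire so that the sketch terminates; the kernel's mutex_lock() has no such timeout and sleeps forever.

```python
import threading

close_lock = threading.Lock()  # non-recursive, like a kernel mutex
events = []


def reclaim_inodes():
    """Stand-in for shrink_icache_memory() ending up in ll_clear_inode()."""
    close_inode("evicted inode", may_reclaim=False)


def allocate_reply_buffer(may_reclaim):
    """Stand-in for vmalloc() under memory pressure."""
    if may_reclaim:
        reclaim_inodes()


def close_inode(name, may_reclaim):
    """Stand-in for mdc_close(): take the close lock, then allocate."""
    # The timeout exists only so the demo ends; a kernel mutex would block here.
    if not close_lock.acquire(timeout=0.2):
        events.append(f"{name}: blocked on close_lock")
        return False
    try:
        allocate_reply_buffer(may_reclaim)
        events.append(f"{name}: closed")
        return True
    finally:
        close_lock.release()


# The allocation re-enters close_inode() while the lock is still held.
close_inode("user file", may_reclaim=True)
# Without re-entry through reclaim, the same close completes normally.
close_inode("other file", may_reclaim=False)
print(events)
```

The inner call for the evicted inode cannot get the lock because its own caller holds it. This is the situation PID 5231 is in between frames #26 and #3.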
Additional output:

crash> bt -l 5231
PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2"
 #0 [ffff88171cb43188] schedule at ffffffff81528a52
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/kernel/sched.c: 3147
 #1 [ffff88171cb43250] __mutex_lock_slowpath at ffffffff8152a20e
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/kernel/mutex.c: 262
 #2 [ffff88171cb432c0] mutex_lock at ffffffff8152a0ab
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/arch/x86/include/asm/thread_info.h: 216
 #3 [ffff88171cb432e0] mdc_close at ffffffffa09176db [mdc]
    /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_fail.h: 82
 #4 [ffff88171cb43330] lmv_close at ffffffffa0b9bcb8 [lmv]
    /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211
 #5 [ffff88171cb43380] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre]
    /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211
 #6 [ffff88171cb43400] ll_md_real_close at ffffffffa0a81afa [lustre]
    /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211
 #7 [ffff88171cb43430] ll_clear_inode at ffffffffa0a92dee [lustre]
    /usr/src/debug/lustre-2.4.3/lustre/llite/llite_lib.c: 1262
 #8 [ffff88171cb43470] clear_inode at ffffffff811a626c
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/inode.c: 337
 #9 [ffff88171cb43490] dispose_list at ffffffff811a6340
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/inode.c: 366
#10 [ffff88171cb434d0] shrink_icache_memory at ffffffff811a6694
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/inode.c: 532
#11 [ffff88171cb43530] shrink_slab at ffffffff81138b7a
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/vmscan.c: 274
#12 [ffff88171cb43590] zone_reclaim at ffffffff8113b77e
    /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/vmscan.c: 3060
#13 [ffff88171cb436b0] get_page_from_freelist at ffffffff8112d8dc
/usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/page_alloc.c: 1733 #14 [ffff88171cb437e0] __alloc_pages_nodemask at ffffffff8112f443 /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/page_alloc.c: 2353 #15 [ffff88171cb43920] alloc_pages_current at ffffffff811680ca /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/include/linux/cpuset.h: 120 #16 [ffff88171cb43950] __vmalloc_area_node at ffffffff81159696 /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/include/linux/gfp.h: 310 #17 [ffff88171cb439b0] __vmalloc_node at ffffffff8115953d /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/vmalloc.c: 1665 #18 [ffff88171cb43a10] vmalloc at ffffffff8115985c /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/mm/vmalloc.c: 1710 #19 [ffff88171cb43a20] cfs_alloc_large at ffffffffa03b4b1e [libcfs] /usr/src/debug/lustre-2.4.3/libcfs/libcfs/linux/linux-mem.c: 84 #20 [ffff88171cb43a30] null_alloc_repbuf at ffffffffa06c4961 [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/sec_null.c: 217 #21 [ffff88171cb43a60] sptlrpc_cli_alloc_repbuf at ffffffffa06b2355 [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/sec.c: 1687 #22 [ffff88171cb43a90] ptl_send_rpc at ffffffffa068432c [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/niobuf.c: 737 #23 [ffff88171cb43b50] ptlrpc_send_new_req at ffffffffa067879b [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/client.c: 1445 #24 [ffff88171cb43bc0] ptlrpc_set_wait at ffffffffa067ddb6 [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/client.c: 2083 #25 [ffff88171cb43c60] ptlrpc_queue_wait at ffffffffa067e0df [ptlrpc] /usr/src/debug/lustre-2.4.3/lustre/ptlrpc/client.c: 2619 #26 [ffff88171cb43c80] mdc_close at ffffffffa0917714 [mdc] /usr/src/debug/lustre-2.4.3/lustre/mdc/mdc_request.c: 878 #27 [ffff88171cb43cd0] lmv_close at ffffffffa0b9bcb8 [lmv] /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211 #28 [ffff88171cb43d20] 
ll_close_inode_openhandle at ffffffffa0a80c1e [lustre] /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211 #29 [ffff88171cb43da0] ll_md_real_close at ffffffffa0a81afa [lustre] /usr/src/debug/lustre-2.4.3/libcfs/include/libcfs/libcfs_debug.h: 211 #30 [ffff88171cb43dd0] ll_md_close at ffffffffa0a81d8a [lustre] /usr/src/debug/lustre-2.4.3/lustre/llite/file.c: 282 #31 [ffff88171cb43e80] ll_file_release at ffffffffa0a8233b [lustre] /usr/src/debug/lustre-2.4.3/lustre/llite/file.c: 350 #32 [ffff88171cb43ec0] __fput at ffffffff8118ad55 /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/file_table.c: 254 #33 [ffff88171cb43f10] fput at ffffffff8118ae95 /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/file_table.c: 200 #34 [ffff88171cb43f20] filp_close at ffffffff811861bd /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/open.c: 977 #35 [ffff88171cb43f50] sys_close at ffffffff81186295 /usr/src/debug/kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/fs/open.c: 1007 #36 [ffff88171cb43f80] system_call_fastpath at ffffffff8100b072 /usr/src/debug//////////////////////////////////////////////////////////////////kernel-2.6/linux-2.6.32-431.11.2.el6.Bull.48.x86_64/arch/x86/kernel/entry_64.S: 489 RIP: 00002adaacdf26d0 RSP: 00007fff9665e238 RFLAGS: 00010246 RAX: 0000000000000003 RBX: ffffffff8100b072 RCX: 0000000000002261 RDX: 00000000044a24b0 RSI: 0000000000000001 RDI: 0000000000000005 RBP: 0000000000000000 R8: 00002adaad0ac560 R9: 0000000000000001 R10: 00000000000004fd R11: 0000000000000246 R12: 00000000000004fc R13: 00000000ffffffff R14: 00000000044a23d0 R15: 00000000ffffffff ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b crash> bt -f 5231 PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2" #0 [ffff88171cb43188] schedule at ffffffff81528a52 ffff88171cb43190: 0000000000000086 ffff88171cb431f8 ffff88171cb431a0: ffffffffa06883c7 ffff88129bfa0200 ffff88171cb431b0: 0000000000000026 ffffffff8100bb8e 
ffff88171cb431c0: ffff88171cb43248 ffff8815183090a0 ffff88171cb431d0: ffff88171cb43fd8 000000000000fc08 ffff88171cb431e0: ffff8815183090a0 ffff88187b76a0c0 ffff88171cb431f0: ffff881518308b00 ffff88171cb42000 ffff88171cb43200: ffff88171cb43fd8 ffff88106f600140 ffff88171cb43210: ffffffffffffff10 ffffffff810554cf ffff88171cb43220: ffff88106f600140 ffff881518308b00 ffff88171cb43230: ffff88106f600144 ffff88106f600148 ffff88171cb43240: ffffffffffffffff ffff88171cb432b8 ffff88171cb43250: ffffffff8152a20e #1 [ffff88171cb43250] __mutex_lock_slowpath at ffffffff8152a20e ffff88171cb43258: ffff88171cb43fd8 ffff881518308b00 ffff88171cb43268: ffff88160e11d558 ffff88074574d618 ffff88171cb43278: ffff881518308b00 ffff88129bfa02e8 ffff88171cb43288: ffff88171cb432a8 ffff88106f600140 ffff88171cb43298: ffff880aab486ea0 ffff88106f600140 ffff88171cb432a8: ffff88106f6d4538 ffff8814cd6c2400 ffff88171cb432b8: ffff88171cb432d8 ffffffff8152a0ab #2 [ffff88171cb432c0] mutex_lock at ffffffff8152a0ab ffff88171cb432c8: ffff8817b9d98000 ffff880aab486ea0 ffff88171cb432d8: ffff88171cb43328 ffffffffa09176db #3 [ffff88171cb432e0] mdc_close at ffffffffa09176db [mdc] ffff88171cb432e8: ffff8817b9d983a0 ffff88171cb433c0 ffff88171cb432f8: ffff88171cb43348 ffff8814cd6c2400 ffff88171cb43308: ffff88171cb433c0 ffff880aab486ea0 ffff88171cb43318: ffff88086f0b9c00 ffff880aab486ea0 <=== obd_export: 0xffff88086f0b9c00 ffff88171cb43328: ffff88171cb43378 ffffffffa0b9bcb8 #4 [ffff88171cb43330] lmv_close at ffffffffa0b9bcb8 [lmv] ffff88171cb43338: ffff88106f6d24f8 000000009c00e648 ffff88171cb43348: ffff88149c00e638 ffff880ba436aa00 ffff88171cb43358: ffff88084b729000 ffff88149c00e638 ffff88171cb43368: ffff8814cd6c2400 ffff880aab486ea0 ffff88171cb43378: ffff88171cb433f8 ffffffffa0a80c1e #5 [ffff88171cb43380] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre] ffff88171cb43388: ffff880900000010 0000000000000000 ffff88171cb43398: ffff88171cb433d8 ffffffffa0921562 ffff88171cb433a8: 0000000098bfa8a2 ffff88084b729000 
ffff88171cb433b8: ffff880ba436aa00 0000000000000000 ffff88171cb433c8: ffff88106f6d24f8 ffff88149c00e638 ffff88171cb433d8: ffff88149c00e618 ffff880ba436aa00 ffff88171cb433e8: ffff88149c00e5e8 ffff88149c00e648 ffff88171cb433f8: ffff88171cb43428 ffffffffa0a81afa #6 [ffff88171cb43400] ll_md_real_close at ffffffffa0a81afa [lustre] ffff88171cb43408: ffff88149c00e638 ffff88149c00e540 ffff88171cb43418: ffff88084b602000 ffff88084b729000 ffff88171cb43428: ffff88171cb43468 ffffffffa0a92dee #7 [ffff88171cb43430] ll_clear_inode at ffffffffa0a92dee [lustre] ffff88171cb43438: ffff88171cb434e8 ffff88149c00e638 ffff88171cb43448: ffff88149c00e770 0000000000000036 ffff88171cb43458: ffffffff81fd2400 ffff88149c00e648 ffff88171cb43468: ffff88171cb43488 ffffffff811a626c #8 [ffff88171cb43470] clear_inode at ffffffff811a626c ffff88171cb43478: ffff88149c00e638 ffff88171cb434e8 ffff88171cb43488: ffff88171cb434c8 ffffffff811a6340 #9 [ffff88171cb43490] dispose_list at ffffffff811a6340 ffff88171cb43498: ffffffff8152b492 0000000000000080 ffff88171cb434a8: ffff88072d97fab8 0000000000000080 ffff88171cb434b8: ffff88171cb434e8 0000000000000080 ffff88171cb434c8: ffff88171cb43528 ffffffff811a6694 #10 [ffff88171cb434d0] shrink_icache_memory at ffffffff811a6694 ffff88171cb434d8: 000000000000007f ffffffff81fd2400 ffff88171cb434e8: ffff8805cca0bb48 ffff8802e953e498 ffff88171cb434f8: ffff88171cb43518 ffffffff81ad42c0 ffff88171cb43508: 0000000000200358 000000000006e08c ffff88171cb43518: 00000000000200d2 0000000000013680 ffff88171cb43528: ffff88171cb43588 ffffffff81138b7a #11 [ffff88171cb43530] shrink_slab at ffffffff81138b7a ffff88171cb43538: 0000000000000000 0000000000000080 ffff88171cb43548: 0000000000000001 000000000017d018 ffff88171cb43558: 0000000000000000 ffff880000110d80 ffff88171cb43568: 000000000005eb76 0000000000000000 ffff88171cb43578: ffff880000110d80 00000000000200d2 ffff88171cb43588: ffff88171cb436a8 ffffffff8113b77e #12 [ffff88171cb43590] zone_reclaim at ffffffff8113b77e ffff88171cb43598: 
ffff88171cb435a8 ffff88171cb43618 ffff88171cb435a8: ffff88171cb435f8 ffffffff81054619 ffff88171cb435b8: 0000000000000000 000000c000000000 ffff88171cb435c8: 0000000000000003 ffff881518308b00 ffff88171cb435d8: 0000000000000000 0000000000000001 ffff88171cb435e8: 0000000000000020 0000000200000282 ffff88171cb435f8: ffff881518308b00 000000000005eb77 ffff88171cb43608: 00ffffffa0729040 0000000000004b08 ffff88171cb43618: 0000000000000020 0000000000000020 ffff88171cb43628: 0000000000000020 0000000000000000 ffff88171cb43638: 00000000000200d2 0000000100000000 ffff88171cb43648: 000000000000000a 0000000000000000 ffff88171cb43658: 0000000000000000 0000000000000000 ffff88171cb43668: 00000000000520e7 0000000000000000 ffff88171cb43678: ffff88171cb43778 0000000000000001 ffff88171cb43688: ffff88171cb42000 ffff880000110d80 ffff88171cb43698: 0000000000004b08 0000000000000001 ffff88171cb436a8: ffff88171cb437d8 ffffffff8112d8dc #13 [ffff88171cb436b0] get_page_from_freelist at ffffffff8112d8dc ffff88171cb436b8: ffff881700000041 ffffffffa03c9a2d ffff88171cb436c8: 0000000300000000 dfbdc9cad49166c0 ffff88171cb436d8: ffff88171cb43708 ffff88106f9166c0 ffff88171cb436e8: 0000000200000010 ffff88107b420780 ffff88171cb436f8: 000000401cb43758 0000000000000000 ffff88171cb43708: ffff880000121b08 00000002a03c9542 ffff88171cb43718: 0000000000000000 ffffffffa06b15b8 ffff88171cb43728: 000000001cb43748 ffff880000121b00 ffff88171cb43738: 00000037ffffffc8 0000004100000000 ffff88171cb43748: ffff880000121b08 ffffffffa06b9445 ffff88171cb43758: ffffc90057559000 0000020000000000 ffff88171cb43768: ffff8818f38ad6c0 0000000000000000 ffff88171cb43778: ffff88106f915600 000200d200000000 ffff88171cb43788: ffff88171cb43858 ffffffffa0642720 ffff88171cb43798: ffff88110cbd5240 ffff880000110d80 ffff88171cb437a8: ffff88106f915600 0000000000000000 ffff88171cb437b8: 0000000000000002 ffff880000121b00 ffff88171cb437c8: ffff881518308b00 00000000000000d2 ffff88171cb437d8: ffff88171cb43918 ffffffff8112f443 #14 [ffff88171cb437e0] 
__alloc_pages_nodemask at ffffffff8112f443 ffff88171cb437e8: ffff880000110d80 ffffffff00000000 ffff88171cb437f8: ffff88171cb43848 ffffffffa03c9a2d ffff88171cb43808: 0000000300000000 dfbdc9cad49166c0 ffff88171cb43818: ffff88171cb43848 ffff88106f9166c0 ffff88171cb43828: 0000000000000010 ffff88107b420780 ffff88171cb43838: ffff88171cb43898 0000000000000002 ffff88171cb43848: ffff88171cb43878 ffffffffa03c9542 ffff88171cb43858: ffff88171cb43898 ffffffff812891cd ffff88171cb43868: ffffffff81fbe2f0 ffff88045425c240 ffff88171cb43878: ffff88045425c258 ffffe8ffffffffff ffff88171cb43888: ffffffffffffffff ffffc90000000000 ffff88171cb43898: 000000101cb438b8 00000000000200d2 ffff88171cb438a8: ffff880000121b08 0000000000000000 ffff88171cb438b8: ffff88171cb43948 ffffffff81158f2a ffff88171cb438c8: ffff88171cb43998 ffff88045425c240 ffff88171cb438d8: ffff880000110d80 ffffffff8117074b ffff88171cb438e8: ffffc90000000000 00000000000000d2 ffff88171cb438f8: 0000000000000000 ffffffff81ac4ee0 ffff88171cb43908: 0000000000000000 00000000000000d2 ffff88171cb43918: ffff88171cb43948 ffffffff811680ca #15 [ffff88171cb43920] alloc_pages_current at ffffffff811680ca ffff88171cb43928: ffff880461353440 00000000ffffffff ffff88171cb43938: 0000000000000000 ffffffffffffffff ffff88171cb43948: ffff88171cb439a8 ffffffff81159696 #16 [ffff88171cb43950] __vmalloc_area_node at ffffffff81159696 ffff88171cb43958: 8000000000000163 00000000000000d2 ffff88171cb43968: ffffe8ffffffffff ffff88050df3e600 ffff88171cb43978: ffff88171cb43988 ffffffffa03b4b1e ffff88171cb43988: 00000000ffffffff 00000000000000d2 ffff88171cb43998: 8000000000000163 ffff880461353440 ffff88171cb439a8: ffff88171cb43a08 ffffffff8115953d #17 [ffff88171cb439b0] __vmalloc_node at ffffffff8115953d ffff88171cb439b8: ffff881b000000d2 ffffffffa03b4b1e ffff88171cb439c8: ffff88171cb439f8 ffffffffa050d932 ffff88171cb439d8: ffff881b80cd3bd8 ffff880683401c00 ffff88171cb439e8: 0000000000008000 0000000000008000 ffff88171cb439f8: 0000000000000000 ffff88106f856340 
ffff88171cb43a08: ffff88171cb43a18 ffffffff8115985c #18 [ffff88171cb43a10] vmalloc at ffffffff8115985c ffff88171cb43a18: ffff88171cb43a28 ffffffffa03b4b1e #19 [ffff88171cb43a20] cfs_alloc_large at ffffffffa03b4b1e [libcfs] ffff88171cb43a28: ffff88171cb43a58 ffffffffa06c4961 #20 [ffff88171cb43a30] null_alloc_repbuf at ffffffffa06c4961 [ptlrpc] ffff88171cb43a38: ffff88035182b1e8 ffff880683401c00 ffff88171cb43a48: ffffffffa0758540 0000000000004e50 ffff88171cb43a58: ffff88171cb43a88 ffffffffa06b2355 #21 [ffff88171cb43a60] sptlrpc_cli_alloc_repbuf at ffffffffa06b2355 [ptlrpc] ffff88171cb43a68: ffff88171cb43a88 ffff880683401c00 ffff88171cb43a78: 0000000000000000 ffff88106f6d4538 ffff88171cb43a88: ffff88171cb43b48 ffffffffa068432c #22 [ffff88171cb43a90] ptl_send_rpc at ffffffffa068432c [ptlrpc] ffff88171cb43a98: ffff88171cb43b48 ffffffffa06b89e4 ffff88171cb43aa8: ffff88171cb43ad8 ffff880683401c00 ffff88171cb43ab8: ffff881518308b00 ffff880683401e08 ffff88171cb43ac8: fffffffffffffc18 ffff881518308b00 ffff88171cb43ad8: ffff880683401d30 ffffffffffffffff ffff88171cb43ae8: ffff88171cb43b28 ffff880683401c00 ffff88171cb43af8: ffff880683401c00 0000000000000188 ffff88171cb43b08: ffff880799a98200 ffffffffa07584e0 ffff88171cb43b18: ffff880799a98200 ffff880683401c00 ffff88171cb43b28: ffff88106f912800 ffff88106f912a70 ffff88171cb43b38: ffff88106f6d4538 ffff8805d13f6a00 ffff88171cb43b48: ffff88171cb43bb8 ffffffffa067879b #23 [ffff88171cb43b50] ptlrpc_send_new_req at ffffffffa067879b [ptlrpc] ffff88171cb43b58: ffff88171cb43b78 ffffffffa06898d4 ffff88171cb43b68: ffff880799a98200 0000000000000023 ffff88171cb43b78: ffff88171cb43b98 00000000a0687aac ffff88171cb43b88: ffffffffa072d820 ffff880531371d40 ffff88171cb43b98: ffff880531371d70 ffff880683401ec0 ffff88171cb43ba8: ffff88106f6d4538 ffff8805d13f6a00 ffff88171cb43bb8: ffff88171cb43c58 ffffffffa067ddb6 #24 [ffff88171cb43bc0] ptlrpc_set_wait at ffffffffa067ddb6 [ptlrpc] ffff88171cb43bc8: ffffc90057559000 ffff880600000050 ffff88171cb43bd8: 
ffffffff00000001 ffff880683401c00 ffff88171cb43be8: ffff880799a98200 0000000000000000 ffff88171cb43bf8: ffff88171cb43c18 00000020a0686d3c ffff88171cb43c08: ffff880799a98200 0000000000000000 ffff88171cb43c18: ffff88171cb43c38 ffffffffa0687d36 ffff88171cb43c28: ffff880531371d40 ffff880683401c00 ffff88171cb43c38: ffff880531371d40 ffff88106f600140 ffff88171cb43c48: ffff88106f6d4538 ffff8805d13f6a00 ffff88171cb43c58: ffff88171cb43c78 ffffffffa067e0df #25 [ffff88171cb43c60] ptlrpc_queue_wait at ffffffffa067e0df [ptlrpc] ffff88171cb43c68: ffff880683401c00 ffff88051feb1f20 ffff88171cb43c78: ffff88171cb43cc8 ffffffffa0917714 #26 [ffff88171cb43c80] mdc_close at ffffffffa0917714 [mdc] ffff88171cb43c88: ffff880683401fa0 ffff88171cb43d60 ffff88171cb43c98: ffff88171cb43ce8 ffff8805d13f6a00 ffff88171cb43ca8: ffff88171cb43d60 ffff88051feb1f20 ffff88171cb43cb8: ffff88086f0b9c00 ffff88051feb1f20 <=== obd_export: 0xffff88086f0b9c00 ffff88171cb43cc8: ffff88171cb43d18 ffffffffa0b9bcb8 #27 [ffff88171cb43cd0] lmv_close at ffffffffa0b9bcb8 [lmv] ffff88171cb43cd8: ffff88106f6d24f8 00000000def49b38 ffff88171cb43ce8: ffff881fdef49b38 ffff880461353c40 ffff88171cb43cf8: ffff88084b729000 ffff881fdef49b38 ffff88171cb43d08: ffff8805d13f6a00 ffff88051feb1f20 ffff88171cb43d18: ffff88171cb43d98 ffffffffa0a80c1e #28 [ffff88171cb43d20] ll_close_inode_openhandle at ffffffffa0a80c1e [lustre] ffff88171cb43d28: ffff881000000010 0000000000000000 ffff88171cb43d38: ffff881fdef49a88 ffff88171cb43e38 ffff88171cb43d48: 000000001cb43dc8 ffff88084b729000 ffff88171cb43d58: ffff880461353c40 0000000000000000 ffff88171cb43d68: ffff88171cb43d98 ffff881fdef49b38 ffff88171cb43d78: ffff881fdef49b18 ffff880461353c40 ffff88171cb43d88: ffff881fdef49ae8 ffff881fdef49b38 ffff88171cb43d98: ffff88171cb43dc8 ffffffffa0a81afa #29 [ffff88171cb43da0] ll_md_real_close at ffffffffa0a81afa [lustre] ffff88171cb43da8: ffff88068c666500 ffff881fdef49b38 ffff88171cb43db8: ffff880351910540 ffff88084b729000 ffff88171cb43dc8: ffff88171cb43e78 
ffffffffa0a81d8a #30 [ffff88171cb43dd0] ll_md_close at ffffffffa0a81d8a [lustre] ffff88171cb43dd8: ffff88171cb43e38 ffff88035182b1e8 ffff88171cb43de8: ffff88171cb43f48 ffff88171cb43e08 ffff88171cb43df8: ffff881fdef49b18 000000105182b1e0 ffff88171cb43e08: 0000000000000004 0000000000000000 ffff88171cb43e18: 0000000000000000 0000000000000000 ffff88171cb43e28: 0000000000000000 0000000000000000 ffff88171cb43e38: ffff88171cb43e58 ffffffff8152a796 ffff88171cb43e48: ffff88171cb43e78 ffff881fdef49b38 ffff88171cb43e58: ffff88068c666500 ffff88084b602000 ffff88171cb43e68: ffff881fdef49a40 ffff880351910540 ffff88171cb43e78: ffff88171cb43eb8 ffffffffa0a8233b #31 [ffff88171cb43e80] ll_file_release at ffffffffa0a8233b [lustre] ffff88171cb43e88: 0000000000000004 ffff88068c666500 ffff88171cb43e98: 0000000000000010 ffff881fdef49b38 ffff88171cb43ea8: ffff881e4f874800 ffff88087ab437c0 ffff88171cb43eb8: ffff88171cb43f08 ffffffff8118ad55 #32 [ffff88171cb43ec0] __fput at ffffffff8118ad55 ffff88171cb43ec8: ffff881fdef49b38 ffff881e4f874800 ffff88171cb43ed8: ffff88171cb43ef8 ffff88068c666500 ffff88171cb43ee8: ffff88187b33fa40 0000000000000000 ffff88171cb43ef8: 000000000426c3d0 00000000044a262c ffff88171cb43f08: ffff88171cb43f18 ffffffff8118ae95 #33 [ffff88171cb43f10] fput at ffffffff8118ae95 ffff88171cb43f18: ffff88171cb43f48 ffffffff811861bd #34 [ffff88171cb43f20] filp_close at ffffffff811861bd ffff88171cb43f28: ffff8811b712ee48 ffff88187b33fa40 ffff88171cb43f38: 0000000000000005 ffff88187b33fac0 ffff88171cb43f48: ffff88171cb43f78 ffffffff81186295 #35 [ffff88171cb43f50] sys_close at ffffffff81186295 ffff88171cb43f58: 00000000ffffffff 00000000044a23d0 ffff88171cb43f68: 00000000ffffffff 00000000000004fc ffff88171cb43f78: 0000000000000000 ffffffff8100b072 #36 [ffff88171cb43f80] system_call_fastpath at ffffffff8100b072 RIP: 00002adaacdf26d0 RSP: 00007fff9665e238 RFLAGS: 00010246 RAX: 0000000000000003 RBX: ffffffff8100b072 RCX: 0000000000002261 RDX: 00000000044a24b0 RSI: 0000000000000001 RDI: 
0000000000000005 RBP: 0000000000000000 R8: 00002adaad0ac560 R9: 0000000000000001
    R10: 00000000000004fd R11: 0000000000000246 R12: 00000000000004fc
    R13: 00000000ffffffff R14: 00000000044a23d0 R15: 00000000ffffffff
    ORIG_RAX: 0000000000000003 CS: 0033 SS: 002b

crash> mdc_rpc_lock.rpcl_mutex 0xffff88106f600140
  rpcl_mutex = {
    count = {
      counter = -1
    },
    wait_lock = {
      raw_lock = {
        slock = 1627873543
      }
    },
    wait_list = {
      next = 0xffff8803e5a2fc08,
      prev = 0xffff8807e0e65be8
    },
    owner = 0xffff88171cb42000
  }

crash> mutex_waiter 0xffff8803e5a2fc08
struct mutex_waiter {
  list = {
    next = 0xffff8816dda9dc08,
    prev = 0xffff88106f600148
  },
  task = 0xffff8807dcd8e040
}
crash> task 0xffff8807dcd8e040 |grep PID
PID: 4476 TASK: ffff8807dcd8e040 CPU: 22 COMMAND: "code1" <== next

crash> mutex_waiter 0xffff8807e0e65be8
struct mutex_waiter {
  list = {
    next = 0xffff88106f600148,
    prev = 0xffff8818520a1be8
  },
  task = 0xffff88070167ca80
}
crash> task 0xffff88070167ca80 |grep PID
PID: 19009 TASK: ffff88070167ca80 CPU: 0 COMMAND: "bash" <== prev

[400038292103674] [UN] PID: 19009 TASK: ffff88070167ca80 CPU: 0 COMMAND: "bash"
[393152828973148] [UN] PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2"
[393120112697636] [UN] PID: 4476 TASK: ffff8807dcd8e040 CPU: 22 COMMAND: "code1"
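
The two checks a reader would probably make by hand on the output above can be sketched in a few lines of Python. This is only an illustration, not part of the original analysis, and it rests on assumptions: that on this 2.6.32 kernel the mutex `owner` field holds a `thread_info` pointer, that kernel stacks are 8 KiB (`THREAD_SIZE` below) so the `thread_info` address is the stack address rounded down to that size, and that the bracketed values in the task list are last-run timestamps in nanoseconds. The names `thread_info_of` and `lag_behind_newest` are made up for this sketch.

```python
import re

# Assumption: 8 KiB kernel stacks, thread_info at the base of the stack.
THREAD_SIZE = 8192


def thread_info_of(stack_addr: int) -> int:
    """Round a kernel stack address down to the base of its stack area."""
    return stack_addr & ~(THREAD_SIZE - 1)


# Values copied from the crash output above.
MUTEX_OWNER = 0xffff88171cb42000      # rpcl_mutex.owner
PID5231_FRAME0 = 0xffff88171cb43188   # frame #0 of "bt -f 5231"

# True if the mutex owner is the thread whose stack holds that frame.
owner_is_5231 = thread_info_of(PID5231_FRAME0) == MUTEX_OWNER

TASK_LINES = """\
[400038292103674] [UN] PID: 19009 TASK: ffff88070167ca80 CPU: 0 COMMAND: "bash"
[393152828973148] [UN] PID: 5231 TASK: ffff881518308b00 CPU: 2 COMMAND: "code2"
[393120112697636] [UN] PID: 4476 TASK: ffff8807dcd8e040 CPU: 22 COMMAND: "code1"
"""

LINE_RE = re.compile(r"\[(\d+)\]\s+\[(\w+)\]\s+PID:\s+(\d+)")


def lag_behind_newest(text: str) -> dict:
    """Map PID -> nanoseconds between its timestamp and the newest one listed."""
    stamps = {int(m.group(3)): int(m.group(1)) for m in LINE_RE.finditer(text)}
    newest = max(stamps.values())
    return {pid: newest - ts for pid, ts in stamps.items()}


if __name__ == "__main__":
    print("mutex owner is PID 5231:", owner_is_5231)
    for pid, lag_ns in sorted(lag_behind_newest(TASK_LINES).items()):
        print(f"PID {pid}: {lag_ns / 1e9:.1f} s behind the newest task")
```

If the first check holds under those assumptions, it would suggest that PID 5231 took rpcl_mutex in the outer mdc_close (frame #26) and is waiting on the same mutex in the inner mdc_close (frame #3), which it reached through zone_reclaim and shrink_icache_memory. Both frames carry the same obd_export, 0xffff88086f0b9c00. The second check only orders the three UN tasks relative to each other; it says nothing about the dump time itself.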