[LU-9146] Backport patches from upstream to resolve deadlock in xattr Created: 23/Feb/17 Updated: 16/Jan/19 Resolved: 09/Mar/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Yang Sheng | Assignee: | Yang Sheng |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
We need to backport the patches below from upstream to resolve a deadlock on i_data_sem.

From a521100231f816f8cdd9c8e77da14ff1e42c2b17 Mon Sep 17 00:00:00 2001
Instead of initializing the allocation_request structure in
This allows ext4_ind_map_blocks to pass flags in the allocation
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

From e3cf5d5d9a86df1c5e413bdd3725c25a16ff854c Mon Sep 17 00:00:00 2001
The EXT4_STATE_DELALLOC_RESERVED flag was originally implemented
Since then, we have mb_flags passed down through most of the code
This commit plumbs in the last of what is required, and then adds a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

From 2e81a4eeedcaa66e35f58b81e0755b87057ce392 Mon Sep 17 00:00:00 2001
When we need to move xattrs into external xattr block, we call
Protect from recursion using EXT4_STATE_NO_EXPAND inode flag and move
CC: stable@vger.kernel.org # 4.4.x |
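The third patch description mentions protecting against recursion with the EXT4_STATE_NO_EXPAND inode flag. The following is a minimal sketch of that idea as a toy Python model, not kernel code. The `Inode` class and the functions `mark_inode_dirty`, `expand_extra_isize` and `xattr_block_set` are hypothetical stand-ins named after frames in the stack trace recorded in the comments, and the assumption that moving xattrs to an external block dirties the inode again is mine; the truncated patch text only says "Protect from recursion".

```python
from dataclasses import dataclass, field

NO_EXPAND = "NO_EXPAND"  # stand-in for the EXT4_STATE_NO_EXPAND state bit


@dataclass
class Inode:
    """Toy inode: a set of state flags plus a call log for inspection."""
    state: set = field(default_factory=set)
    calls: list = field(default_factory=list)


def xattr_block_set(inode: Inode) -> None:
    """Assumed behaviour: moving xattrs to an external block dirties the inode."""
    inode.calls.append("xattr_block_set")
    mark_inode_dirty(inode)


def expand_extra_isize(inode: Inode) -> None:
    """Hold the guard flag for the duration of the expansion, then clear it."""
    inode.calls.append("expand_extra_isize")
    inode.state.add(NO_EXPAND)
    try:
        xattr_block_set(inode)
    finally:
        inode.state.discard(NO_EXPAND)


def mark_inode_dirty(inode: Inode) -> None:
    """Try to expand the inode unless an expansion is already in progress."""
    inode.calls.append("mark_inode_dirty")
    if NO_EXPAND in inode.state:
        return  # nested call: skip the expansion instead of recursing
    expand_extra_isize(inode)


if __name__ == "__main__":
    inode = Inode()
    mark_inode_dirty(inode)
    print(inode.calls)
```

In this model the nested `mark_inode_dirty` call returns early because the flag is set, so the expansion path is entered only once per outer call. Without the check, the same chain would re-enter itself while the outer call is still in progress.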
| Comments |
| Comment by Gerrit Updater [ 23/Feb/17 ] |
|
Yang Sheng (yang.sheng@intel.com) uploaded a new patch: https://review.whamcloud.com/25595 |
| Comment by Gerrit Updater [ 09/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25595/ |
| Comment by Peter Jones [ 09/Mar/17 ] |
|
Landed for 2.10 |
| Comment by Oleg Drokin [ 09/Mar/17 ] |
|
We can do it, but since we always patch ldiskfs ourselves anyway, it has no impact on our patchlessness. |
| Comment by Yang Sheng [ 12/May/17 ] |
|
Just for the record.

Feb 22 12:41:41 gio12 kernel: Lustre: Skipped 686202 previous similar messages
Feb 22 12:41:50 gio12 kernel: Lustre: dtemp-OST001b: recovery is timed out, evict stale exports
Feb 22 12:41:50 gio12 kernel: Lustre: dtemp-OST001b: disconnecting 1 stale clients
Feb 22 12:41:50 gio12 kernel: Lustre: dtemp-OST001b: Client a83807d9-ca3b-9fd3-3cbc-1d2b648b12d1 (at 172.22.160.62@o2ib6) reconnecting
Feb 22 12:41:50 gio12 kernel: Lustre: Skipped 894 previous similar messages
Feb 22 12:41:52 gio12 kernel: Lustre: dtemp-OST001b: Recovery over after 14:56, of 1435 clients 1434 recovered and 1 was evicted.
Feb 22 12:41:52 gio12 kernel: Lustre: Skipped 1 previous similar message
Feb 22 12:41:52 gio12 kernel: Lustre: dtemp-OST001b: deleting orphan objects from 0x0:7139820 to 0x0:7139873
Feb 22 12:44:16 gio12 kernel: INFO: task ll_ost_io01_002:14056 blocked for more than 120 seconds.
Feb 22 12:44:16 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:44:16 gio12 kernel: ll_ost_io01_002 D 0000000000000000 0 14056 2 0x00000080
Feb 22 12:44:16 gio12 kernel: ffff881020b9b898 0000000000000046 ffff88102151b980 ffff881020b9bfd8
Feb 22 12:44:16 gio12 kernel: ffff881020b9bfd8 ffff881020b9bfd8 ffff88102151b980 ffff88102151b980
Feb 22 12:44:16 gio12 kernel: ffff8807a92bda90 fffffffeffffffff ffff8807a92bda98 0000000000000000
Feb 22 12:44:16 gio12 kernel: Call Trace:
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163d8d5>] rwsem_down_read_failed+0xf5/0x170
Feb 22 12:44:16 gio12 kernel: [<ffffffff81301e54>] call_rwsem_down_read_failed+0x14/0x30
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163b130>] ? down_read+0x20/0x30
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0def84b>] ldiskfs_xattr_block_set+0x62b/0xa80 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0df09d4>] ldiskfs_expand_extra_isize_ea+0x404/0x810 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0df6d9f>] ldiskfs_mark_inode_dirty+0x1af/0x210 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0de0884>] ldiskfs_ext_truncate+0x24/0xe0 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0df83b7>] ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0e92e08>] osd_punch+0x138/0x5e0 [osd_ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0c84346>] ofd_object_punch+0x6e6/0xc30 [ofd]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0c715e6>] ofd_punch_hdl+0x466/0x720 [ofd]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa109bc9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa103ea3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0815cf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa103bb08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163e05b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
Feb 22 12:44:16 gio12 kernel: [<ffffffffa1042360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa1041760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:16 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:16 gio12 kernel: INFO: task jbd2/dm-11-8:15759 blocked for more than 120 seconds.
Feb 22 12:44:16 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:44:16 gio12 kernel: jbd2/dm-11-8 D ffff880036446800 0 15759 2 0x00000080
Feb 22 12:44:16 gio12 kernel: ffff880fdd21bc88 0000000000000046 ffff88104f747300 ffff880fdd21bfd8
Feb 22 12:44:16 gio12 kernel: ffff880fdd21bfd8 ffff880fdd21bfd8 ffff88104f747300 ffff880fdd21bda0
Feb 22 12:44:16 gio12 kernel: ffff881016e128c0 ffff88104f747300 ffff880fdd21bd88 ffff880036446800
Feb 22 12:44:16 gio12 kernel: Call Trace:
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:44:16 gio12 kernel: [<ffffffffa018d138>] jbd2_journal_commit_transaction+0x248/0x19e0 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffff810c15fc>] ? update_curr+0xcc/0x150
Feb 22 12:44:16 gio12 kernel: [<ffffffff810c1ac6>] ? dequeue_entity+0x106/0x520
Feb 22 12:44:16 gio12 kernel: [<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:44:16 gio12 kernel: [<ffffffff8108d7be>] ? try_to_del_timer_sync+0x5e/0x90
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0192e99>] kjournald2+0xc9/0x260 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0192dd0>] ? commit_timeout+0x10/0x10 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:16 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:16 gio12 kernel: INFO: task kworker/u33:2:28976 blocked for more than 120 seconds.
Feb 22 12:44:16 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:44:16 gio12 kernel: kworker/u33:2 D ffff880850ea8030 0 28976 2 0x00000080
Feb 22 12:44:16 gio12 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:11)
Feb 22 12:44:16 gio12 kernel: ffff880462d2f8e8 0000000000000046 ffff88084f0a8b80 ffff880462d2ffd8
Feb 22 12:44:16 gio12 kernel: ffff880462d2ffd8 ffff880462d2ffd8 ffff88084f0a8b80 ffff881016e12800
Feb 22 12:44:16 gio12 kernel: ffff881016e12878 000000000d83b523 ffff880036446800 ffff880850ea8030
Feb 22 12:44:16 gio12 kernel: Call Trace:
Feb 22 12:44:16 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:44:16 gio12 kernel: [<ffffffffa018a085>] wait_transaction_locked+0x85/0xd0 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:44:16 gio12 kernel: [<ffffffffa018a400>] start_this_handle+0x2b0/0x5d0 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffff811c176a>] ? kmem_cache_alloc+0x1ba/0x1d0
Feb 22 12:44:16 gio12 kernel: [<ffffffffa018a933>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0df7254>] ? ldiskfs_writepages+0x454/0xd80 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0dd6829>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffffa0df7254>] ldiskfs_writepages+0x454/0xd80 [ldiskfs]
Feb 22 12:44:16 gio12 kernel: [<ffffffff81174d08>] ? generic_writepages+0x58/0x80
Feb 22 12:44:17 gio12 kernel: [<ffffffff81175dae>] do_writepages+0x1e/0x40
Feb 22 12:44:17 gio12 kernel: [<ffffffff81208c90>] __writeback_single_inode+0x40/0x220
Feb 22 12:44:17 gio12 kernel: [<ffffffff812096fe>] writeback_sb_inodes+0x25e/0x420
Feb 22 12:44:17 gio12 kernel: [<ffffffff8120995f>] __writeback_inodes_wb+0x9f/0xd0
Feb 22 12:44:17 gio12 kernel: [<ffffffff8120a1a3>] wb_writeback+0x263/0x2f0
Feb 22 12:44:17 gio12 kernel: [<ffffffff811f8fac>] ? get_nr_inodes+0x4c/0x70
Feb 22 12:44:17 gio12 kernel: [<ffffffff8120c42b>] bdi_writeback_workfn+0x2cb/0x460
Feb 22 12:44:17 gio12 kernel: [<ffffffff8109d6bb>] process_one_work+0x17b/0x470
Feb 22 12:44:17 gio12 kernel: [<ffffffff8109e48b>] worker_thread+0x11b/0x400
Feb 22 12:44:17 gio12 kernel: [<ffffffff8109e370>] ? rescuer_thread+0x400/0x400
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:17 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:17 gio12 kernel: INFO: task ll_ost_io03_004:32100 blocked for more than 120 seconds.
Feb 22 12:44:17 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:44:17 gio12 kernel: ll_ost_io03_004 D ffff881053a24060 0 32100 2 0x00000080
Feb 22 12:44:17 gio12 kernel: ffff88031eb6f9f0 0000000000000046 ffff8808a62c2280 ffff88031eb6ffd8
Feb 22 12:44:17 gio12 kernel: ffff88031eb6ffd8 ffff88031eb6ffd8 ffff8808a62c2280 ffff881016e12800
Feb 22 12:44:17 gio12 kernel: ffff881016e12878 000000000d83b523 ffff880036446800 ffff881053a24060
Feb 22 12:44:17 gio12 kernel: Call Trace:
Feb 22 12:44:17 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:44:17 gio12 kernel: [<ffffffffa018a085>] wait_transaction_locked+0x85/0xd0 [jbd2]
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:44:17 gio12 kernel: [<ffffffffa018a400>] start_this_handle+0x2b0/0x5d0 [jbd2]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0e713d4>] ? osd_declare_xattr_set+0xe4/0x2e0 [osd_ldiskfs]
Feb 22 12:44:17 gio12 kernel: [<ffffffff811c176a>] ? kmem_cache_alloc+0x1ba/0x1d0
Feb 22 12:44:17 gio12 kernel: [<ffffffffa018a933>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0e78534>] ? osd_trans_start+0x174/0x410 [osd_ldiskfs]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0dd6829>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0e78534>] osd_trans_start+0x174/0x410 [osd_ldiskfs]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0c80d7b>] ofd_trans_start+0x6b/0xe0 [ofd]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0c8428a>] ofd_object_punch+0x62a/0xc30 [ofd]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0c715e6>] ofd_punch_hdl+0x466/0x720 [ofd]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa109bc9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa103ea3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa0815cf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa103bb08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 12:44:17 gio12 kernel: [<ffffffff810af0e8>] ? __wake_up_common+0x58/0x90
Feb 22 12:44:17 gio12 kernel: [<ffffffffa1042360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 12:44:17 gio12 kernel: [<ffffffffa1041760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:17 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:44:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:44:39 gio12 kernel: LustreError: 137-5: dtemp-OST001c_UUID: not available for connect from 172.22.166.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Feb 22 12:44:39 gio12 kernel: LustreError: Skipped 1679 previous similar messages
...
Feb 22 12:45:42 gio12 kernel: LustreError: 137-5: dtemp-OST001c_UUID: not available for connect from 172.22.166.11@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Feb 22 12:46:17 gio12 kernel: INFO: task ll_ost_io01_002:14056 blocked for more than 120 seconds.
Feb 22 12:46:17 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:46:17 gio12 kernel: ll_ost_io01_002 D 0000000000000000 0 14056 2 0x00000080
Feb 22 12:46:17 gio12 kernel: ffff881020b9b898 0000000000000046 ffff88102151b980 ffff881020b9bfd8
Feb 22 12:46:17 gio12 kernel: ffff881020b9bfd8 ffff881020b9bfd8 ffff88102151b980 ffff88102151b980
Feb 22 12:46:17 gio12 kernel: ffff8807a92bda90 fffffffeffffffff ffff8807a92bda98 0000000000000000
Feb 22 12:46:17 gio12 kernel: Call Trace:
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163d8d5>] rwsem_down_read_failed+0xf5/0x170
Feb 22 12:46:17 gio12 kernel: [<ffffffff81301e54>] call_rwsem_down_read_failed+0x14/0x30
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163b130>] ? down_read+0x20/0x30
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0def84b>] ldiskfs_xattr_block_set+0x62b/0xa80 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0df09d4>] ldiskfs_expand_extra_isize_ea+0x404/0x810 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0df6d9f>] ldiskfs_mark_inode_dirty+0x1af/0x210 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0de0884>] ldiskfs_ext_truncate+0x24/0xe0 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0df83b7>] ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0e92e08>] osd_punch+0x138/0x5e0 [osd_ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0c84346>] ofd_object_punch+0x6e6/0xc30 [ofd]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0c715e6>] ofd_punch_hdl+0x466/0x720 [ofd]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa109bc9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa103ea3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0815cf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa103bb08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163e05b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
Feb 22 12:46:17 gio12 kernel: [<ffffffffa1042360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa1041760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: INFO: task jbd2/dm-11-8:15759 blocked for more than 120 seconds.
Feb 22 12:46:17 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:46:17 gio12 kernel: jbd2/dm-11-8 D ffff880036446800 0 15759 2 0x00000080
Feb 22 12:46:17 gio12 kernel: ffff880fdd21bc88 0000000000000046 ffff88104f747300 ffff880fdd21bfd8
Feb 22 12:46:17 gio12 kernel: ffff880fdd21bfd8 ffff880fdd21bfd8 ffff88104f747300 ffff880fdd21bda0
Feb 22 12:46:17 gio12 kernel: ffff881016e128c0 ffff88104f747300 ffff880fdd21bd88 ffff880036446800
Feb 22 12:46:17 gio12 kernel: Call Trace:
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:46:17 gio12 kernel: [<ffffffffa018d138>] jbd2_journal_commit_transaction+0x248/0x19e0 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffff810c15fc>] ? update_curr+0xcc/0x150
Feb 22 12:46:17 gio12 kernel: [<ffffffff810c1ac6>] ? dequeue_entity+0x106/0x520
Feb 22 12:46:17 gio12 kernel: [<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:46:17 gio12 kernel: [<ffffffff8108d7be>] ? try_to_del_timer_sync+0x5e/0x90
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0192e99>] kjournald2+0xc9/0x260 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0192dd0>] ? commit_timeout+0x10/0x10 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: INFO: task kworker/u33:2:28976 blocked for more than 120 seconds.
Feb 22 12:46:17 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:46:17 gio12 kernel: kworker/u33:2 D ffff880850ea8030 0 28976 2 0x00000080
Feb 22 12:46:17 gio12 kernel: Workqueue: writeback bdi_writeback_workfn (flush-253:11)
Feb 22 12:46:17 gio12 kernel: ffff880462d2f8e8 0000000000000046 ffff88084f0a8b80 ffff880462d2ffd8
Feb 22 12:46:17 gio12 kernel: ffff880462d2ffd8 ffff880462d2ffd8 ffff88084f0a8b80 ffff881016e12800
Feb 22 12:46:17 gio12 kernel: ffff881016e12878 000000000d83b523 ffff880036446800 ffff880850ea8030
Feb 22 12:46:17 gio12 kernel: Call Trace:
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:46:17 gio12 kernel: [<ffffffffa018a085>] wait_transaction_locked+0x85/0xd0 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:46:17 gio12 kernel: [<ffffffffa018a400>] start_this_handle+0x2b0/0x5d0 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffff811c176a>] ? kmem_cache_alloc+0x1ba/0x1d0
Feb 22 12:46:17 gio12 kernel: [<ffffffffa018a933>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0df7254>] ? ldiskfs_writepages+0x454/0xd80 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0dd6829>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffffa0df7254>] ldiskfs_writepages+0x454/0xd80 [ldiskfs]
Feb 22 12:46:17 gio12 kernel: [<ffffffff81174d08>] ? generic_writepages+0x58/0x80
Feb 22 12:46:17 gio12 kernel: [<ffffffff81175dae>] do_writepages+0x1e/0x40
Feb 22 12:46:17 gio12 kernel: [<ffffffff81208c90>] __writeback_single_inode+0x40/0x220
Feb 22 12:46:17 gio12 kernel: [<ffffffff812096fe>] writeback_sb_inodes+0x25e/0x420
Feb 22 12:46:17 gio12 kernel: [<ffffffff8120995f>] __writeback_inodes_wb+0x9f/0xd0
Feb 22 12:46:17 gio12 kernel: [<ffffffff8120a1a3>] wb_writeback+0x263/0x2f0
Feb 22 12:46:17 gio12 kernel: [<ffffffff811f8fac>] ? get_nr_inodes+0x4c/0x70
Feb 22 12:46:17 gio12 kernel: [<ffffffff8120c42b>] bdi_writeback_workfn+0x2cb/0x460
Feb 22 12:46:17 gio12 kernel: [<ffffffff8109d6bb>] process_one_work+0x17b/0x470
Feb 22 12:46:17 gio12 kernel: [<ffffffff8109e48b>] worker_thread+0x11b/0x400
Feb 22 12:46:17 gio12 kernel: [<ffffffff8109e370>] ? rescuer_thread+0x400/0x400
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:46:17 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:17 gio12 kernel: INFO: task ll_ost_io03_004:32100 blocked for more than 120 seconds.
Feb 22 12:46:17 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:46:17 gio12 kernel: ll_ost_io03_004 D ffff881053a24060 0 32100 2 0x00000080
Feb 22 12:46:17 gio12 kernel: ffff88031eb6f9f0 0000000000000046 ffff8808a62c2280 ffff88031eb6ffd8
Feb 22 12:46:17 gio12 kernel: ffff88031eb6ffd8 ffff88031eb6ffd8 ffff8808a62c2280 ffff881016e12800
Feb 22 12:46:17 gio12 kernel: ffff881016e12878 000000000d83b523 ffff880036446800 ffff881053a24060
Feb 22 12:46:17 gio12 kernel: Call Trace:
Feb 22 12:46:17 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:46:17 gio12 kernel: [<ffffffffa018a085>] wait_transaction_locked+0x85/0xd0 [jbd2]
Feb 22 12:46:18 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:46:18 gio12 kernel: [<ffffffffa018a400>] start_this_handle+0x2b0/0x5d0 [jbd2]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0e713d4>] ? osd_declare_xattr_set+0xe4/0x2e0 [osd_ldiskfs]
Feb 22 12:46:18 gio12 kernel: [<ffffffff811c176a>] ? kmem_cache_alloc+0x1ba/0x1d0
Feb 22 12:46:18 gio12 kernel: [<ffffffffa018a933>] jbd2__journal_start+0xf3/0x1e0 [jbd2]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0e78534>] ? osd_trans_start+0x174/0x410 [osd_ldiskfs]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0dd6829>] __ldiskfs_journal_start_sb+0x69/0xe0 [ldiskfs]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0e78534>] osd_trans_start+0x174/0x410 [osd_ldiskfs]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0c80d7b>] ofd_trans_start+0x6b/0xe0 [ofd]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0c8428a>] ofd_object_punch+0x62a/0xc30 [ofd]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0c715e6>] ofd_punch_hdl+0x466/0x720 [ofd]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa109bc9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa103ea3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa0815cf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa103bb08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 12:46:18 gio12 kernel: [<ffffffff810af0e8>] ? __wake_up_common+0x58/0x90
Feb 22 12:46:18 gio12 kernel: [<ffffffffa1042360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 12:46:18 gio12 kernel: [<ffffffffa1041760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 12:46:18 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:46:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:46:18 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:46:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:47:09 gio12 kernel: LustreError: 137-5: dtemp-OST001c_UUID: not available for connect from 172.22.166.12@o2ib6 (no target). If you are running an HA pair check that the target is mounted on the other server.
Feb 22 12:48:18 gio12 kernel: INFO: task ll_ost_io01_002:14056 blocked for more than 120 seconds.
Feb 22 12:48:18 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:48:18 gio12 kernel: ll_ost_io01_002 D 0000000000000000 0 14056 2 0x00000080
Feb 22 12:48:18 gio12 kernel: ffff881020b9b898 0000000000000046 ffff88102151b980 ffff881020b9bfd8
Feb 22 12:48:18 gio12 kernel: ffff881020b9bfd8 ffff881020b9bfd8 ffff88102151b980 ffff88102151b980
Feb 22 12:48:18 gio12 kernel: ffff8807a92bda90 fffffffeffffffff ffff8807a92bda98 0000000000000000
Feb 22 12:48:18 gio12 kernel: Call Trace:
Feb 22 12:48:18 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:48:18 gio12 kernel: [<ffffffff8163d8d5>] rwsem_down_read_failed+0xf5/0x170
Feb 22 12:48:18 gio12 kernel: [<ffffffff81301e54>] call_rwsem_down_read_failed+0x14/0x30
Feb 22 12:48:18 gio12 kernel: [<ffffffff8163b130>] ? down_read+0x20/0x30
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0def84b>] ldiskfs_xattr_block_set+0x62b/0xa80 [ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0df09d4>] ldiskfs_expand_extra_isize_ea+0x404/0x810 [ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0df6d9f>] ldiskfs_mark_inode_dirty+0x1af/0x210 [ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0de0884>] ldiskfs_ext_truncate+0x24/0xe0 [ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0df83b7>] ldiskfs_truncate+0x3b7/0x3f0 [ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0e92e08>] osd_punch+0x138/0x5e0 [osd_ldiskfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0c84346>] ofd_object_punch+0x6e6/0xc30 [ofd]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0c715e6>] ofd_punch_hdl+0x466/0x720 [ofd]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa109bc9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa103ea3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0815cf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa103bb08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 12:48:18 gio12 kernel: [<ffffffff8163e05b>] ? _raw_spin_unlock_irqrestore+0x1b/0x40
Feb 22 12:48:18 gio12 kernel: [<ffffffffa1042360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 12:48:18 gio12 kernel: [<ffffffffa1041760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:48:18 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:48:18 gio12 kernel: INFO: task jbd2/dm-11-8:15759 blocked for more than 120 seconds.
Feb 22 12:48:18 gio12 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 12:48:18 gio12 kernel: jbd2/dm-11-8 D ffff880036446800 0 15759 2 0x00000080
Feb 22 12:48:18 gio12 kernel: ffff880fdd21bc88 0000000000000046 ffff88104f747300 ffff880fdd21bfd8
Feb 22 12:48:18 gio12 kernel: ffff880fdd21bfd8 ffff880fdd21bfd8 ffff88104f747300 ffff880fdd21bda0
Feb 22 12:48:18 gio12 kernel: ffff881016e128c0 ffff88104f747300 ffff880fdd21bd88 ffff880036446800
Feb 22 12:48:18 gio12 kernel: Call Trace:
Feb 22 12:48:18 gio12 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 12:48:18 gio12 kernel: [<ffffffffa018d138>] jbd2_journal_commit_transaction+0x248/0x19e0 [jbd2]
Feb 22 12:48:18 gio12 kernel: [<ffffffff810c15fc>] ? update_curr+0xcc/0x150
Feb 22 12:48:18 gio12 kernel: [<ffffffff810c1ac6>] ? dequeue_entity+0x106/0x520
Feb 22 12:48:18 gio12 kernel: [<ffffffff81013588>] ? __switch_to+0xf8/0x4b0
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:48:18 gio12 kernel: [<ffffffff8108d7be>] ? try_to_del_timer_sync+0x5e/0x90
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0192e99>] kjournald2+0xc9/0x260 [jbd2]
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a6ba0>] ? wake_up_atomic_t+0x30/0x30
Feb 22 12:48:18 gio12 kernel: [<ffffffffa0192dd0>] ? commit_timeout+0x10/0x10 [jbd2]
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 12:48:18 gio12 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 12:48:18 gio12 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140

The MDS also logged watchdog stack traces for several mdt tasks. The MDT watchdog traces were triggered only once, whereas the OSS watchdog traces repeated.

Example MDT stack trace:

Feb 22 03:52:03 gio0 kernel: INFO: task mdt01_003:9154 blocked for more than 120 seconds.
Feb 22 03:52:03 gio0 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 22 03:52:03 gio0 kernel: mdt01_003 D ffff880838cab1d8 0 9154 2 0x00000080
Feb 22 03:52:03 gio0 kernel: ffff8810371ab4c8 0000000000000046 ffff8810507eb980 ffff8810371abfd8
Feb 22 03:52:03 gio0 kernel: ffff8810371abfd8 ffff8810371abfd8 ffff8810507eb980 ffff8810507eb980
Feb 22 03:52:03 gio0 kernel: ffff880838cab1c8 ffff880838cab1d0 ffffffff00000000 ffff880838cab1d8
Feb 22 03:52:03 gio0 kernel: Call Trace:
Feb 22 03:52:03 gio0 kernel: [<ffffffff8163bf19>] schedule+0x29/0x70
Feb 22 03:52:03 gio0 kernel: [<ffffffff8163d6d5>] rwsem_down_write_failed+0x115/0x220
Feb 22 03:52:03 gio0 kernel: [<ffffffff812134dc>] ? __find_get_block+0xbc/0x120
Feb 22 03:52:03 gio0 kernel: [<ffffffff81301e83>] call_rwsem_down_write_failed+0x13/0x20
Feb 22 03:52:03 gio0 kernel: [<ffffffff8163b16d>] ? down_write+0x2d/0x30
Feb 22 03:52:03 gio0 kernel: [<ffffffffa142bba7>] lod_alloc_qos.constprop.15+0x187/0x1400 [lod]
Feb 22 03:52:03 gio0 kernel: [<ffffffff8121292d>] ? __brelse+0x3d/0x50
Feb 22 03:52:03 gio0 kernel: [<ffffffffa10ed65f>] ? ldiskfs_xattr_ibody_get+0xef/0x1a0 [ldiskfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa10ec6af>] ? ldiskfs_xattr_find_entry+0x9f/0x130 [ldiskfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa142e7fd>] lod_qos_prep_create+0x10cd/0x1fbc [lod]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa11aaac9>] ? osd_declare_qid+0x279/0x4b0 [osd_ldiskfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa11aaeae>] ? osd_declare_inode_qid+0x1ae/0x290 [osd_ldiskfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1427fdd>] lod_declare_striped_object+0x1fd/0x810 [lod]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1171e23>] ? osd_declare_object_create+0x113/0x2b0 [osd_ldiskfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1429661>] lod_declare_object_create+0x231/0x4b0 [lod]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa14836af>] mdd_declare_object_create_internal+0xdf/0x2f0 [mdd]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1478038>] mdd_declare_create+0x48/0xef0 [mdd]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1479669>] mdd_create+0x789/0x12a0 [mdd]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa134ed52>] mdt_reint_open+0x1f92/0x2e00 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa08aa1a9>] ? upcall_cache_get_entry+0x3e9/0x8e0 [libcfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffff812fc212>] ? strlcpy+0x42/0x60
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1341e30>] mdt_reint_rec+0x80/0x210 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1322921>] mdt_reint_internal+0x5e1/0xb30 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa1322fd2>] mdt_intent_reint+0x162/0x420 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0c95797>] ? lustre_msg_buf+0x17/0x60 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa13268b5>] mdt_intent_opc+0x215/0xa30 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0c99e30>] ? lustre_swab_ldlm_policy_data+0x30/0x30 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa132e478>] mdt_intent_policy+0x138/0x320 [mdt]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0c491d7>] ldlm_lock_enqueue+0x357/0x9c0 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0c6f7b2>] ldlm_handle_enqueue0+0x4f2/0x16f0 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0c99eb0>] ? lustre_swab_ldlm_lock_desc+0x30/0x30 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0cfcc32>] tgt_enqueue+0x62/0x210 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0d01c9b>] tgt_request_handle+0x8fb/0x11f0 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0ca4a3b>] ptlrpc_server_handle_request+0x21b/0xa90 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa089ecf8>] ? lc_watchdog_touch+0x68/0x180 [libcfs]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0ca1b08>] ? ptlrpc_wait_event+0x98/0x330 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffff810af0e8>] ? __wake_up_common+0x58/0x90
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0ca8360>] ptlrpc_main+0xc00/0x1f60 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffffa0ca7760>] ? ptlrpc_register_service+0x1070/0x1070 [ptlrpc]
Feb 22 03:52:03 gio0 kernel: [<ffffffff810a5baf>] kthread+0xcf/0xe0
Feb 22 03:52:03 gio0 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140
Feb 22 03:52:03 gio0 kernel: [<ffffffff81646e58>] ret_from_fork+0x58/0x90
Feb 22 03:52:03 gio0 kernel: [<ffffffff810a5ae0>] ? kthread_create_on_node+0x140/0x140 |
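The OSS log above reports the same tasks as blocked at 12:44, 12:46 and 12:48. As a rough aid for checking which tasks repeat in a saved syslog excerpt, here is a small Python sketch. The function name `count_hung_tasks` is hypothetical, and the regular expression is based only on the `INFO: task NAME:PID blocked for more than N seconds.` lines visible in this ticket, so other kernel versions may need a different pattern.

```python
import re
from collections import Counter

# Matches the hung-task watchdog header as it appears in this ticket's logs.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def count_hung_tasks(log_text: str) -> Counter:
    """Return how many times each (task name, pid) pair was reported blocked."""
    counts = Counter()
    for match in HUNG_TASK_RE.finditer(log_text):
        counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Feb 22 12:44:16 gio12 kernel: INFO: task ll_ost_io01_002:14056 "
        "blocked for more than 120 seconds.\n"
        "Feb 22 12:44:16 gio12 kernel: INFO: task kworker/u33:2:28976 "
        "blocked for more than 120 seconds.\n"
        "Feb 22 12:46:17 gio12 kernel: INFO: task ll_ost_io01_002:14056 "
        "blocked for more than 120 seconds.\n"
    )
    for (name, pid), n in count_hung_tasks(sample).most_common():
        print(name, pid, n)
```

The pattern does not anchor on line starts, so it also works on log text whose line breaks were lost. Task names that themselves contain a colon, such as `kworker/u33:2`, are handled because the PID is taken from the last colon-separated number before `blocked`.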