[LU-706] 1.8<->2.1 interop: recovery-small test 51 hung
Created: 22/Sep/11 | Updated: 28/May/17 | Resolved: 28/May/17
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Jian Yu | Assignee: | WC Triage |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None |
| Environment: |
Old Lustre versions: 1.8.5 and 1.8.6-wc1
New Lustre version: 2.1.0 RC2
Clean upgrade (Lustre servers and clients were upgraded all at once) from Lustre 1.8.5 and 1.8.6-wc1 to Lustre 2.1.0 RC2 under the following configuration.
Test nodes:
| Severity: | 3 |
| Rank (Obsolete): | 10720 |
| Description |
After the clean upgrade, recovery-small test 51 hung. dmesg on the client node client-12 showed:

INFO: task writemany:32634 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
writemany D 0000000000000002 0 32634 32632 0x00000080
 ffff8802dc4f9d38 0000000000000082 0000000000000000 02001bcd48000001
 ffff880202001bcd ffff8802d7a84238 ffff8802000041ed 00000001002e2106
 ffff880326b8b038 ffff8802dc4f9fd8 000000000000f598 ffff880326b8b038
Call Trace:
 [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff8115b102>] ? kmem_cache_alloc+0x182/0x190
 [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
 [<ffffffff81182650>] do_filp_open+0x2c0/0xd90
 [<ffffffff811af00e>] ? __fsnotify_inode_delete+0xe/0x10
 [<ffffffff8118c6aa>] ? __destroy_inode+0x2a/0xa0
 [<ffffffff8117f005>] ? putname+0x35/0x50
 [<ffffffff8118f0d2>] ? alloc_fd+0x92/0x160
 [<ffffffff8116f879>] do_sys_open+0x69/0x140
 [<ffffffff8100c635>] ? math_state_restore+0x45/0x60
 [<ffffffff8116f990>] sys_open+0x20/0x30
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task writemany:32635 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
writemany D 0000000000000001 0 32635 32632 0x00000080
 ffff8802d6e67e08 0000000000000086 0000000000000000 ffffffff812061ff
 ffff8802d6e67de8 ffffffff8118087d 0000000000000000 00000001002e211b
 ffff8802dd459ab8 ffff8802d6e67fd8 000000000000f598 ffff8802dd459ab8
Call Trace:
 [<ffffffff812061ff>] ? security_inode_permission+0x1f/0x30
 [<ffffffff8118087d>] ? __link_path_walk+0x11d/0x820
 [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
 [<ffffffff81183456>] do_unlinkat+0x96/0x1c0
 [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81183596>] sys_unlink+0x16/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
INFO: task writemany:32637 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
writemany D 0000000000000002 0 32637 32632 0x00000080
 ffff8802dc6e3e08 0000000000000086 0000000000000000 ffffffff812061ff
 ffff8802dc6e3de8 ffffffff8118087d 0000000000000000 00000001002e20c1
 ffff8802d0799af8 ffff8802dc6e3fd8 000000000000f598 ffff8802d0799af8
Call Trace:
 [<ffffffff812061ff>] ? security_inode_permission+0x1f/0x30
 [<ffffffff8118087d>] ? __link_path_walk+0x11d/0x820
 [<ffffffff814dc13e>] __mutex_lock_slowpath+0x13e/0x180
 [<ffffffff814dbfdb>] mutex_lock+0x2b/0x50
 [<ffffffff81183456>] do_unlinkat+0x96/0x1c0
 [<ffffffff810d1ad2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81183596>] sys_unlink+0x16/0x20
 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b
LustreError: 32638:0:(client.c:1899:ptlrpc_interrupted_set()) INTERRUPTED SET ffff8802e30e2b40
LustreError: 32638:0:(file.c:158:ll_close_inode_openhandle()) inode 144146779976237058 mdc close failed: rc = -4
INFO: task ptlrpcd-rcv:6723 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ptlrpcd-rcv D 0000000000000001 0 6723 2 0x00000080
 ffff8803222cd7f0 0000000000000046 0000000000000000 0000000000000000
 ffff8802cdaf92b8 ffff88031b326400 ffff8802cdaf9180 00000001002f14f2
 ffff8803222c9078 ffff8803222cdfd8 000000000000f598 ffff8803222c9078
Call Trace:
 [<ffffffffa02f96fe>] cfs_waitq_wait+0xe/0x10 [libcfs]
 [<ffffffffa065d140>] seq_client_flush+0x80/0x110 [fid]
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffffa066bb7c>] mdc_fid_delete+0x1c/0x20 [mdc]
 [<ffffffffa07b1c4f>] ll_delete_inode+0x12f/0x650 [lustre]
 [<ffffffffa07b1b20>] ? ll_delete_inode+0x0/0x650 [lustre]
 [<ffffffff8118d15e>] generic_delete_inode+0xde/0x1d0
 [<ffffffff8118d2b5>] generic_drop_inode+0x65/0x80
 [<ffffffff8118c122>] iput+0x62/0x70
 [<ffffffffa07ce640>] ll_md_blocking_ast+0x270/0x600 [lustre]
 [<ffffffffa04b5ba3>] ldlm_cancel_callback+0x63/0xf0 [ptlrpc]
 [<ffffffffa04b5c89>] ldlm_lock_cancel+0x59/0x190 [ptlrpc]
 [<ffffffffa04d1e91>] ? ldlm_prepare_lru_list+0x281/0x460 [ptlrpc]
 [<ffffffffa04d2df8>] ldlm_cli_cancel_list_local+0x78/0x230 [ptlrpc]
 [<ffffffffa0303691>] ? libcfs_debug_vmsg2+0x4d1/0xb50 [libcfs]
 [<ffffffffa04cf9b0>] ? ldlm_cancel_no_wait_policy+0x0/0xf0 [ptlrpc]
 [<ffffffffa04d3265>] ldlm_cancel_lru_local+0x35/0x40 [ptlrpc]
 [<ffffffffa04d3395>] ldlm_replay_locks+0x125/0x7d0 [ptlrpc]
 [<ffffffffa0520f5a>] ptlrpc_import_recovery_state_machine+0x8ca/0xd20 [ptlrpc]
 [<ffffffffa04fe19c>] ? lustre_msg_get_last_committed+0x7c/0xe0 [ptlrpc]
 [<ffffffffa0521ba8>] ptlrpc_connect_interpret+0x5d8/0x1fe0 [ptlrpc]
 [<ffffffffa04f17ea>] ptlrpc_check_set+0x47a/0x1b20 [ptlrpc]
 [<ffffffff8107a11b>] ? try_to_del_timer_sync+0x7b/0xe0
 [<ffffffff8107a1a2>] ? del_timer_sync+0x22/0x30
 [<ffffffffa0525428>] ptlrpcd_check+0x1c8/0x250 [ptlrpc]
 [<ffffffffa05257cb>] ptlrpcd+0x31b/0x390 [ptlrpc]
 [<ffffffff8105dc20>] ? default_wake_function+0x0/0x20
 [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffff8100c1ca>] child_rip+0xa/0x20
 [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffffa05254b0>] ? ptlrpcd+0x0/0x390 [ptlrpc]
 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20

Maloo report: https://maloo.whamcloud.com/test_sets/d10f8468-e515-11e0-9909-52540025f9af

This test passed when run after a clean upgrade from Lustre 1.8.5 and 1.8.6-wc1 to Lustre 2.1.0 RC1:
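For reference, a minimal sketch of how one might rerun just the affected subtest through the Lustre test framework; this assumes an already-configured test cluster, and the ONLY subtest filter and the lustre/tests path follow the framework's usual conventions rather than anything stated in this ticket:

 # Run only subtest 51 of recovery-small against the locally configured cluster.
 cd lustre/tests
 ONLY=51 sh recovery-small.sh

 # The "blocked for more than 120 seconds" messages above come from the kernel
 # hung-task watchdog; as the dmesg output itself notes, the warnings (not the
 # underlying hang) can be disabled on the client with:
 echo 0 > /proc/sys/kernel/hung_task_timeout_secs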
| Comments |
| Comment by Andreas Dilger [ 28/May/17 ] |
Close old issue.