Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Lustre 2.4.0
-
3
-
6293
Description
It seems a deadlock in rename was introduced this Friday 25th.
My racer runs are hittign it reliably.
[77051.148056] LNet: Service thread pid 6227 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [77051.148872] Pid: 6227, comm: mdt01_008 [77051.149108] [77051.149108] Call Trace: [77051.149529] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 [77051.149810] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 [77051.150071] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [77051.150820] [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] [77051.151116] [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] [77051.151398] [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc] [77051.152764] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77051.153306] [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] [77051.153618] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.153917] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.154214] [<ffffffffa080984a>] mdt_reint_rename+0x7ba/0x1a70 [mdt] [77051.154580] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.154895] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.155220] [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass] [77051.156752] [<ffffffffa07e7ff5>] ? mdt_ucred+0x15/0x20 [mdt] [77051.157068] [<ffffffffa080229c>] ? mdt_root_squash+0x2c/0x410 [mdt] [77051.157515] [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] [77051.157839] [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass] [77051.158785] [<ffffffffa0806a51>] mdt_reint_rec+0x41/0xe0 [mdt] [77051.159066] [<ffffffffa08000f3>] mdt_reint_internal+0x4f3/0x7e0 [mdt] [77051.159358] [<ffffffffa0800424>] mdt_reint+0x44/0xe0 [mdt] [77051.159609] [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt] [77051.163421] [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt] [77051.163763] [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] [77051.164304] [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [77051.164635] [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [77051.164923] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [77051.165209] [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] [77051.165478] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.166098] [<ffffffff8100c14a>] child_rip+0xa/0x20 [77051.166279] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.166470] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.166643] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [77051.166802] [77051.166922] LustreError: dumping log to /tmp/lustre-log.1359282032.6227 [77051.270779] LNet: Service thread pid 30749 was inactive for 62.12s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [77051.271595] Pid: 30749, comm: mdt00_004 [77051.271845] [77051.271846] Call Trace: [77051.272326] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 [77051.272606] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 [77051.272926] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [77051.273228] [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] [77051.273570] [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] [77051.273900] [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc] [77051.274404] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77051.274723] [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] [77051.275057] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.275395] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.275709] [<ffffffffa080984a>] mdt_reint_rename+0x7ba/0x1a70 [mdt] [77051.276044] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.276376] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.276723] [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass] [77051.277061] [<ffffffffa07e7ff5>] ? mdt_ucred+0x15/0x20 [mdt] [77051.277363] [<ffffffffa080229c>] ? mdt_root_squash+0x2c/0x410 [mdt] [77051.277695] [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] [77051.278010] [<ffffffffa0e29700>] ? lu_ucred+0x20/0x30 [obdclass] [77051.278299] [<ffffffffa0806a51>] mdt_reint_rec+0x41/0xe0 [mdt] [77051.278606] [<ffffffffa08000f3>] mdt_reint_internal+0x4f3/0x7e0 [mdt] [77051.278916] [<ffffffffa0800424>] mdt_reint+0x44/0xe0 [mdt] [77051.279202] [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt] [77051.279526] [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt] [77051.279864] [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] [77051.280418] [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [77051.280734] [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [77051.281920] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77051.282248] [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] [77051.282606] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.282906] [<ffffffff8100c14a>] child_rip+0xa/0x20 [77051.283203] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.283529] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.283830] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [77051.284145] [77051.304078] Pid: 819, comm: mdt01_005 [77051.304351] [77051.304352] Call Trace: [77051.304821] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 [77051.305120] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 [77051.305424] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [77051.305728] [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] [77051.306062] [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] [77051.306400] [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc] [77051.306905] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77051.307224] [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] [77051.307562] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.307892] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.308233] [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt] [77051.308525] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.309085] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.309409] [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt] [77051.309714] [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt] [77051.310049] [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] [77051.310377] [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] [77051.310710] [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] [77051.311017] [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt] [77051.311280] [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt] [77051.311568] [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc] [77051.311843] [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc] [77051.312143] [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt] [77051.312420] [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt] [77051.312690] [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt] [77051.312986] [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] [77051.313421] [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [77051.313690] [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [77051.313952] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [77051.314203] [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] [77051.314477] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.314730] [<ffffffff8100c14a>] child_rip+0xa/0x20 [77051.314977] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.315274] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.315605] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [77051.315836] [77051.316029] LustreError: dumping log to /tmp/lustre-log.1359282032.819 [77051.372055] Pid: 29786, comm: mdt00_001 [77051.372306] [77051.372307] Call Trace: [77051.372715] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 [77051.372988] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 [77051.373257] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [77051.373537] [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] [77051.373850] [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] [77051.374153] [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc] [77051.374620] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77051.374910] [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] [77051.375217] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.375532] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.375816] [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt] [77051.376135] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77051.376447] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77051.376746] [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt] [77051.377025] [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt] [77051.377339] [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] [77051.377641] [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] [77051.377977] [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] [77051.378274] [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt] [77051.378564] [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt] [77051.378858] [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc] [77051.379163] [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc] [77051.379460] [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt] [77051.379735] [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt] [77051.380053] [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt] [77051.380358] [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] [77051.380835] [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [77051.381128] [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [77051.381421] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [77051.381696] [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] [77051.381992] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.382269] [<ffffffff8100c14a>] child_rip+0xa/0x20 [77051.382545] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.382839] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77051.383114] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [77051.383373] [77051.383565] LustreError: dumping log to /tmp/lustre-log.1359282032.29786 [77052.220055] LNet: Service thread pid 10930 was inactive for 62.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: [77052.220826] LNet: Skipped 2 previous similar messages [77052.221069] Pid: 10930, comm: mdt00_010 [77052.221281] [77052.221281] Call Trace: [77052.221656] [<ffffffff814fb054>] ? _spin_lock_irqsave+0x24/0x30 [77052.221907] [<ffffffff814f8ad1>] schedule_timeout+0x191/0x2e0 [77052.222159] [<ffffffff8107bcd0>] ? process_timeout+0x0/0x10 [77052.222416] [<ffffffffa0bbf771>] cfs_waitq_timedwait+0x11/0x20 [libcfs] [77052.222703] [<ffffffffa0f7ed8d>] ldlm_completion_ast+0x4dd/0x950 [ptlrpc] [77052.222984] [<ffffffffa0f7a6a0>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc] [77052.223414] [<ffffffff81057d60>] ? default_wake_function+0x0/0x20 [77052.223690] [<ffffffffa0f7e4d8>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc] [77052.223974] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77052.224247] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77052.224511] [<ffffffffa07eadc0>] mdt_object_lock0+0x360/0xb10 [mdt] [77052.224799] [<ffffffffa07e8830>] ? mdt_blocking_ast+0x0/0x2a0 [mdt] [77052.225083] [<ffffffffa0f7e8b0>] ? ldlm_completion_ast+0x0/0x950 [ptlrpc] [77052.225502] [<ffffffffa07eb614>] mdt_object_lock+0x14/0x20 [mdt] [77052.225781] [<ffffffffa07fd9c5>] mdt_getattr_name_lock+0xee5/0x1980 [mdt] [77052.226094] [<ffffffffa0fa6275>] ? lustre_msg_buf+0x55/0x60 [ptlrpc] [77052.226413] [<ffffffffa0fd16f6>] ? __req_capsule_get+0x166/0x700 [ptlrpc] [77052.226712] [<ffffffffa0fa8474>] ? lustre_msg_get_flags+0x34/0xb0 [ptlrpc] [77052.227012] [<ffffffffa07fe6fd>] mdt_intent_getattr+0x29d/0x490 [mdt] [77052.227298] [<ffffffffa07fc26e>] mdt_intent_policy+0x3ae/0x750 [mdt] [77052.227577] [<ffffffffa0f5f70a>] ldlm_lock_enqueue+0x2ea/0x870 [ptlrpc] [77052.227855] [<ffffffffa0f85f07>] ldlm_handle_enqueue0+0x4f7/0x1090 [ptlrpc] [77052.228229] [<ffffffffa07fc746>] mdt_enqueue+0x46/0x110 [mdt] [77052.228503] [<ffffffffa07f0732>] mdt_handle_common+0x8e2/0x1690 [mdt] [77052.228830] [<ffffffffa0825e95>] mds_regular_handle+0x15/0x20 [mdt] [77052.229154] [<ffffffffa0fb5af3>] ptlrpc_server_handle_request+0x463/0xe60 [ptlrpc] [77052.229758] [<ffffffffa0bbf65e>] ? cfs_timer_arm+0xe/0x10 [libcfs] [77052.230033] [<ffffffffa0fae871>] ? ptlrpc_wait_event+0xb1/0x2a0 [ptlrpc] [77052.230322] [<ffffffff81051f73>] ? __wake_up+0x53/0x70 [77052.230615] [<ffffffffa0fb85fd>] ptlrpc_main+0xb3d/0x18c0 [ptlrpc] [77052.230880] [<ffffffffa01979c0>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] [77052.231181] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77052.231489] [<ffffffff8100c14a>] child_rip+0xa/0x20 [77052.231737] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77052.232006] [<ffffffffa0fb7ac0>] ? ptlrpc_main+0x0/0x18c0 [ptlrpc] [77052.232260] [<ffffffff8100c140>] ? child_rip+0x0/0x20 [77052.232498] [77052.232673] LustreError: dumping log to /tmp/lustre-log.1359282033.10930 [77052.235465] LNet: Service thread pid 29790 was inactive for 62.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [77052.252077] LNet: Service thread pid 29788 was inactive for 62.00s. Watchdog stack traces are limited to 3 per 300 seconds, skipping this one. [77052.252623] LNet: Skipped 2 previous similar messages [77052.252883] LustreError: dumping log to /tmp/lustre-log.1359282033.29788 [77052.268049] LustreError: dumping log to /tmp/lustre-log.1359282033.29785 ... [77189.148054] Lustre: lock timed out (enqueued at 1359281970, 200s ago) [77189.148058] Lustre: lock timed out (enqueued at 1359281970, 200s ago) [77189.168469] LustreError: dumping log to /tmp/lustre-log.1359282170.30749 [77189.304162] Lustre: lock timed out (enqueued at 1359281970, 200s ago) [77190.220049] Lustre: lock timed out (enqueued at 1359281971, 200s ago) [77190.220048] Lustre: lock timed out (enqueued at 1359281971, 200s ago) [77190.220048] Lustre: lock timed out (enqueued at 1359281971, 200s ago) [77190.220048] Lustre: Skipped 1 previous similar message [77190.220051] Lustre: Skipped 1 previous similar message