Details
-
Bug
-
Resolution: Cannot Reproduce
-
Minor
-
None
-
Lustre 1.8.6
-
None
-
3
-
10343
Description
We saw the following call trace on the MDS, which hung afterwards.
Apr 23 15:58:34 ALPL505 kernel: Call Trace:
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88953a00>] ldlm_expired_completion_wait+0x0/0x250 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88955542>] ldlm_completion_ast+0x4c2/0x880 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8893a709>] ldlm_lock_enqueue+0x9d9/0xb20 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008e421>] default_wake_function+0x0/0xe
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88935b6a>] ldlm_lock_addref_internal_nolock+0x3a/0x90 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889540bb>] ldlm_cli_enqueue_local+0x46b/0x520 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88caa157>] enqueue_ordered_locks+0x387/0x4d0 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889519a0>] ldlm_blocking_ast+0x0/0x2a0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88955080>] ldlm_completion_ast+0x0/0x880 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88caa8e9>] mds_get_parent_child_locked+0x649/0x960 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88c9b652>] mds_getattr_lock+0x632/0xc90 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88c96dda>] fixup_handle_for_resent_req+0x5a/0x2c0 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88ca1d83>] mds_intent_policy+0x623/0xc20 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8893c270>] ldlm_resource_putref_internal+0x230/0x460 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88939eb6>] ldlm_lock_enqueue+0x186/0xb20 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889367fd>] ldlm_lock_create+0x9bd/0x9f0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8895e870>] ldlm_server_blocking_ast+0x0/0x83d [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8895bb39>] ldlm_handle_enqueue+0xc09/0x1210 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88ca0b30>] mds_handle+0x40e0/0x4d10 [mds]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff800774ed>] smp_send_reschedule+0x4e/0x53
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8897fd55>] lustre_msg_get_conn_cnt+0x35/0xf0 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff889896d9>] ptlrpc_server_handle_request+0x989/0xe00 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88989e35>] ptlrpc_wait_event+0x2e5/0x310 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8898adc6>] ptlrpc_main+0xf66/0x1120 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff88989e60>] ptlrpc_main+0x0/0x1120 [ptlrpc]
Apr 23 15:58:34 ALPL505 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
This might be related to LU-59, but please review.
Issue Links
- Trackbacks
-
Lustre 1.8.x known issues tracker While testing against Lustre b18 branch, we would hit known bugs which were already reported in Lustre Bugzilla https://bugzilla.lustre.org/. In order to move away from relying on Bugzilla, we would create a JIRA