[LU-1433] thread hangs on __d_lookup() Created: 23/May/12  Updated: 06/Feb/14  Resolved: 06/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 1.8.6
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Shuichi Ihara (Inactive) Assignee: Hongchao Zhang
Resolution: Cannot Reproduce Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 10155

 Description   

At our customer site, we saw a thread hang on the MDS during normal operation.

May 21 12:58:00 ALPL506 kernel: Call Trace:
May 21 12:58:00 ALPL506 kernel:  [<ffffffff80009852>] __d_lookup+0xb0/0xff
May 21 12:58:00 ALPL506 kernel:  [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
May 21 12:58:00 ALPL506 kernel:  [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88c9ffd7>] :mds:mds_get_md+0x47/0x1c0
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88ca03ee>] :mds:mds_pack_md+0x29e/0x370
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88ca06df>] :mds:mds_getattr_internal+0x21f/0x840
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88ca3ad5>] :mds:mds_getattr_lock+0xab5/0xc90
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88c9edda>] :mds:fixup_handle_for_resent_req+0x5a/0x2c0
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88ca9d83>] :mds:mds_intent_policy+0x623/0xc20
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88944270>] :ptlrpc:ldlm_resource_putref_internal+0x230/0x460
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88941eb6>] :ptlrpc:ldlm_lock_enqueue+0x186/0xb20
May 21 12:58:00 ALPL506 kernel:  [<ffffffff8893e7fd>] :ptlrpc:ldlm_lock_create+0x9bd/0x9f0
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88966870>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88963b39>] :ptlrpc:ldlm_handle_enqueue+0xc09/0x1210
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88ca8b30>] :mds:mds_handle+0x40e0/0x4d10
May 21 12:58:00 ALPL506 kernel:  [<ffffffff800774ed>] smp_send_reschedule+0x4e/0x53
May 21 12:58:00 ALPL506 kernel:  [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88987d55>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
May 21 12:58:00 ALPL506 kernel:  [<ffffffff889916d9>] :ptlrpc:ptlrpc_server_handle_request+0x989/0xe00
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88991e35>] :ptlrpc:ptlrpc_wait_event+0x2e5/0x310
May 21 12:58:00 ALPL506 kernel:  [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88992dc6>] :ptlrpc:ptlrpc_main+0xf66/0x1120
May 21 12:58:00 ALPL506 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
May 21 12:58:00 ALPL506 kernel:  [<ffffffff88991e60>] :ptlrpc:ptlrpc_main+0x0/0x1120
May 21 12:58:00 ALPL506 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11


 Comments   
Comment by Andreas Dilger [ 23/May/12 ]

Hi Ihara, which kernel, Oracle or WC 1.8.6?

Comment by Shuichi Ihara (Inactive) [ 23/May/12 ]

lustre-1.8.6-wc1, with SLES11 SP1 on the clients and CentOS 5.6 on the servers.

Comment by Peter Jones [ 24/May/12 ]

Hongchao

Can you please look into this one?

Thanks

Peter

Comment by Hongchao Zhang [ 25/May/12 ]

This thread is stuck while acquiring inode->i_mutex. Normally this means another thread holds that mutex and is itself trying to acquire a different lock that this thread already holds, which causes a deadlock.

Hi Ihara, did any other threads get stuck, and are the stack traces of all processes available? Thanks!
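The deadlock pattern described above (two threads taking the same two locks in opposite order, often called an ABBA deadlock) can be illustrated with a minimal user-space sketch in Python. This is not Lustre code: `inode_mutex` is a hypothetical stand-in for inode->i_mutex, `other_lock` is a hypothetical stand-in for whatever second lock the unknown holder thread wanted (it was never identified in this ticket), and a timed acquire replaces the kernel's unbounded wait so the hang becomes observable instead of permanent.

```python
import threading


def try_locks(opposite_order, timeout=0.2):
    """Run two threads that each take inode_mutex and other_lock.

    Returns {"A": bool, "B": bool}: True if that thread obtained its second
    lock, False if the timed acquire gave up (a plain blocking lock would
    have hung forever at that point).
    """
    inode_mutex = threading.Lock()  # hypothetical stand-in for inode->i_mutex
    other_lock = threading.Lock()   # hypothetical second lock

    # For the opposite-order case, force the bad interleaving: both threads
    # hold their first lock before either tries its second, and neither
    # releases until both have tried.
    hold_first = threading.Barrier(2) if opposite_order else None
    done_trying = threading.Barrier(2) if opposite_order else None
    results = {}

    def worker(name, first, second):
        with first:
            if hold_first is not None:
                hold_first.wait()
            got = second.acquire(timeout=timeout)
            results[name] = got
            if done_trying is not None:
                done_trying.wait()
            if got:
                second.release()

    # Thread A always takes inode_mutex first; thread B either reverses the
    # order (ABBA) or follows the same order as A.
    if opposite_order:
        order_b = (other_lock, inode_mutex)
    else:
        order_b = (inode_mutex, other_lock)

    threads = [
        threading.Thread(target=worker, args=("A", inode_mutex, other_lock)),
        threading.Thread(target=worker, args=("B",) + order_b),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(sorted(results.items()))


print(try_locks(opposite_order=True))   # {'A': False, 'B': False}
print(try_locks(opposite_order=False))  # {'A': True, 'B': True}
```

With opposite acquisition orders, each thread waits on a lock the other holds, so neither can make progress; with a single consistent order, the second thread simply waits for the first to finish. This is why the stack traces of all threads are needed: they show which thread holds i_mutex and which lock it is waiting for.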

Comment by Shuichi Ihara (Inactive) [ 25/May/12 ]

The MDS failed over before we could get the stack traces, so we don't have them.

Comment by Shuichi Ihara (Inactive) [ 06/Feb/14 ]

This has not been reproduced for a long while, and they have since upgraded to lustre-1.8.9. Please close this ticket; we will reopen it if the same problem happens again.

Comment by Peter Jones [ 06/Feb/14 ]

OK, thanks Ihara.

Generated at Sat Feb 10 01:16:35 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.