[LU-1433] thread hangs on __d_lookup() Created: 23/May/12 Updated: 06/Feb/14 Resolved: 06/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 1.8.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Shuichi Ihara (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 10155 |
| Description |
|
At our customer site, we saw a thread hang on the MDS during normal operation.
May 21 12:58:00 ALPL506 kernel: Call Trace:
May 21 12:58:00 ALPL506 kernel: [<ffffffff80009852>] __d_lookup+0xb0/0xff
May 21 12:58:00 ALPL506 kernel: [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
May 21 12:58:00 ALPL506 kernel: [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
May 21 12:58:00 ALPL506 kernel: [<ffffffff88c9ffd7>] :mds:mds_get_md+0x47/0x1c0
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca03ee>] :mds:mds_pack_md+0x29e/0x370
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca06df>] :mds:mds_getattr_internal+0x21f/0x840
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca3ad5>] :mds:mds_getattr_lock+0xab5/0xc90
May 21 12:58:00 ALPL506 kernel: [<ffffffff88c9edda>] :mds:fixup_handle_for_resent_req+0x5a/0x2c0
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca9d83>] :mds:mds_intent_policy+0x623/0xc20
May 21 12:58:00 ALPL506 kernel: [<ffffffff88944270>] :ptlrpc:ldlm_resource_putref_internal+0x230/0x460
May 21 12:58:00 ALPL506 kernel: [<ffffffff88941eb6>] :ptlrpc:ldlm_lock_enqueue+0x186/0xb20
May 21 12:58:00 ALPL506 kernel: [<ffffffff8893e7fd>] :ptlrpc:ldlm_lock_create+0x9bd/0x9f0
May 21 12:58:00 ALPL506 kernel: [<ffffffff88966870>] :ptlrpc:ldlm_server_blocking_ast+0x0/0x83d
May 21 12:58:00 ALPL506 kernel: [<ffffffff88963b39>] :ptlrpc:ldlm_handle_enqueue+0xc09/0x1210
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca8b30>] :mds:mds_handle+0x40e0/0x4d10
May 21 12:58:00 ALPL506 kernel: [<ffffffff800774ed>] smp_send_reschedule+0x4e/0x53
May 21 12:58:00 ALPL506 kernel: [<ffffffff8008ddcd>] enqueue_task+0x41/0x56
May 21 12:58:00 ALPL506 kernel: [<ffffffff88987d55>] :ptlrpc:lustre_msg_get_conn_cnt+0x35/0xf0
May 21 12:58:00 ALPL506 kernel: [<ffffffff889916d9>] :ptlrpc:ptlrpc_server_handle_request+0x989/0xe00
May 21 12:58:00 ALPL506 kernel: [<ffffffff88991e35>] :ptlrpc:ptlrpc_wait_event+0x2e5/0x310
May 21 12:58:00 ALPL506 kernel: [<ffffffff8008c85d>] __wake_up_common+0x3e/0x68
May 21 12:58:00 ALPL506 kernel: [<ffffffff88992dc6>] :ptlrpc:ptlrpc_main+0xf66/0x1120
May 21 12:58:00 ALPL506 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
May 21 12:58:00 ALPL506 kernel: [<ffffffff88991e60>] :ptlrpc:ptlrpc_main+0x0/0x1120
May 21 12:58:00 ALPL506 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11 |
| Comments |
| Comment by Andreas Dilger [ 23/May/12 ] |
|
Hi Ihara, which kernel, Oracle or WC 1.8.6? |
| Comment by Shuichi Ihara (Inactive) [ 23/May/12 ] |
|
lustre-1.8.6-wc1, and SLES11SP1 for the clients, CentOS5.6 for servers. |
| Comment by Peter Jones [ 24/May/12 ] |
|
Hongchao, can you please look into this one? Thanks, Peter |
| Comment by Hongchao Zhang [ 25/May/12 ] |
|
This thread is stuck trying to lock inode->i_mutex. Normally, there should be another thread holding this mutex, which somehow
Hi Ihara, did any other thread get stuck, and is the stack of all processes available? Thanks! |
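The reading of the trace in this comment (a thread waiting on a mutex) can be checked mechanically. The following is only a minimal sketch, not part of the original ticket: the names `FRAME_RE`, `parse_frames`, and `mutex_waiter` are hypothetical helpers, the sample is a shortened copy of the trace from the description, and the "first non-mutex frame after `__mutex_lock_slowpath`" rule is a heuristic assumption about how to find the code that requested the lock.

```python
import re

# One frame looks like "[<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b"
# or, for module code, "[<ffffffff88c9ffd7>] :mds:mds_get_md+0x47/0x1c0".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?::(?P<module>\w+):)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Shortened copy of the trace from the ticket description.
SAMPLE = """
May 21 12:58:00 ALPL506 kernel: [<ffffffff80009852>] __d_lookup+0xb0/0xff
May 21 12:58:00 ALPL506 kernel: [<ffffffff80063c4f>] __mutex_lock_slowpath+0x60/0x9b
May 21 12:58:00 ALPL506 kernel: [<ffffffff80063c99>] .text.lock.mutex+0xf/0x14
May 21 12:58:00 ALPL506 kernel: [<ffffffff88c9ffd7>] :mds:mds_get_md+0x47/0x1c0
May 21 12:58:00 ALPL506 kernel: [<ffffffff88ca03ee>] :mds:mds_pack_md+0x29e/0x370
"""


def parse_frames(log_text):
    """Return (module, symbol) pairs in trace order; core kernel frames get 'kernel'."""
    return [
        (m.group("module") or "kernel", m.group("symbol"))
        for m in FRAME_RE.finditer(log_text)
    ]


def mutex_waiter(frames):
    """Heuristic: return the first non-mutex frame listed after __mutex_lock_slowpath."""
    for i, (_, symbol) in enumerate(frames):
        if symbol == "__mutex_lock_slowpath":
            for module, caller in frames[i + 1:]:
                if "mutex" not in caller:
                    return module, caller
    return None


if __name__ == "__main__":
    frames = parse_frames(SAMPLE)
    print(frames)
    print(mutex_waiter(frames))
```

This only identifies the waiting side. Finding the thread that holds the mutex still needs the stacks of all processes, which is what the comment asks for.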
| Comment by Shuichi Ihara (Inactive) [ 25/May/12 ] |
|
The MDS failed over before we could get the stack trace, so we don't have it. |
| Comment by Shuichi Ihara (Inactive) [ 06/Feb/14 ] |
|
This has not been reproduced for a long while, and they have upgraded to lustre-1.8.9 anyway. Please close this ticket. We will reopen it if the same problem happens again. |
| Comment by Peter Jones [ 06/Feb/14 ] |
|
ok thanks Ihara |