[LU-5464] Hung ll_ost01 Created: 08/Aug/14  Updated: 16/Oct/15  Resolved: 16/Oct/15

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.3
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Mahmoud Hanafi Assignee: Zhenyu Xu
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

clients: 2.1.5/2.4.3
server: 2.4.3


Severity: 3
Rank (Obsolete): 15222

 Description   

The OSS is reporting several hung ll_ost service threads.

LNet: 2842:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 1 previous similar message
LNet: Service thread pid 11968 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
LNet: Skipped 1 previous similar message
Pid: 11968, comm: ll_ost01_089
 
Call Trace:
 [<ffffffff815404c2>] schedule_timeout+0x192/0x2e0
 [<ffffffff81080610>] ? process_timeout+0x0/0x10
 [<ffffffffa04886d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
 [<ffffffffa0744ffd>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
 [<ffffffffa0740790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
 [<ffffffff81063be0>] ? default_wake_function+0x0/0x20
 [<ffffffffa0744738>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
 [<ffffffffa0744b10>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
 [<ffffffffa07434b0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
 [<ffffffffa0e303a1>] ofd_destroy_by_fid+0x321/0x710 [ofd]
 [<ffffffffa07434b0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
 [<ffffffffa0744b10>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
 [<ffffffffa076d125>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
 [<ffffffffa0e34fd7>] ofd_destroy+0x1a7/0x8b0 [ofd]
 [<ffffffffa0771430>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
 [<ffffffffa0e078a9>] ost_handle+0x4349/0x48e0 [ost]
 [<ffffffffa0494124>] ? libcfs_id2str+0x74/0xb0 [libcfs]
 [<ffffffffa077e3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
 [<ffffffffa04885de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
 [<ffffffffa0499d6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
 [<ffffffffa0775719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
 [<ffffffff81055813>] ? __wake_up+0x53/0x70
 [<ffffffffa077f74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
 [<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c0ca>] child_rip+0xa/0x20
 [<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
 [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
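A hung-thread dump like the one above is easier to compare across threads once it is reduced to its reliable call path. The helper below is a minimal sketch that assumes the x86 `[<address>] symbol+offset/size [module]` line format shown in this trace. `Frame`, `parse_trace`, and `reliable_path` are hypothetical names and are not part of any Lustre tool. Entries printed with a leading `?` are addresses the kernel found on the stack but could not confirm as part of the call chain, so the sketch drops them.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one x86 kernel call-trace line, for example:
#  [<ffffffffa0744738>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
#  [<ffffffff81080610>] ? process_timeout+0x0/0x10
FRAME_RE = re.compile(
    r"^\s*\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


class Frame(NamedTuple):
    func: str
    module: Optional[str]  # None for core kernel symbols (no "[module]" suffix)
    reliable: bool         # False when the line carries a leading "?"


def parse_trace(text: str) -> List[Frame]:
    """Return every call-trace frame in text; other lines are ignored."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append(Frame(m["func"], m["module"], m["unreliable"] is None))
    return frames


def reliable_path(frames: List[Frame]) -> List[str]:
    """Function names of the confirmed frames, innermost call first."""
    return [f.func for f in frames if f.reliable]
```

Applied to the full trace in this report, `reliable_path` yields `schedule_timeout`, `cfs_waitq_timedwait`, `ldlm_completion_ast`, `ldlm_cli_enqueue_local`, `ofd_destroy_by_fid`, `ofd_destroy`, `ost_handle`, `ptlrpc_server_handle_request`, `ptlrpc_main`, `child_rip`. Read bottom-up, this suggests an OST destroy request that is blocked in `ldlm_completion_ast` waiting for a locally enqueued LDLM lock to be granted.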

We also see client hangs and OST disconnects from the MDS.



 Comments   
Comment by Peter Jones [ 08/Aug/14 ]

Bobijam

Could you please advise on this issue?

Thanks

Peter

Comment by Zhenyu Xu [ 11/Aug/14 ]

Do you have OST and client debug logs of this issue?

Comment by Mahmoud Hanafi [ 15/Oct/15 ]

This can be closed.

Comment by Peter Jones [ 16/Oct/15 ]

ok Mahmoud

Generated at Sat Feb 10 01:51:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.