Details
-
Bug
-
Resolution: Won't Fix
-
Critical
-
None
-
Lustre 2.4.3
-
None
-
clients: 2.1.5/2.4.3
server: 2.4.3
-
3
-
15222
Description
The OSS is reporting several hung ll_ost service threads.
LNet: 2842:0:(o2iblnd_cb.c:2348:kiblnd_passive_connect()) Skipped 1 previous similar message
LNet: Service thread pid 11968 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
LNet: Skipped 1 previous similar message
Pid: 11968, comm: ll_ost01_089
Call Trace:
[<ffffffff815404c2>] schedule_timeout+0x192/0x2e0
[<ffffffff81080610>] ? process_timeout+0x0/0x10
[<ffffffffa04886d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs]
[<ffffffffa0744ffd>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
[<ffffffffa0740790>] ? ldlm_expired_completion_wait+0x0/0x390 [ptlrpc]
[<ffffffff81063be0>] ? default_wake_function+0x0/0x20
[<ffffffffa0744738>] ldlm_cli_enqueue_local+0x1f8/0x5d0 [ptlrpc]
[<ffffffffa0744b10>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
[<ffffffffa07434b0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
[<ffffffffa0e303a1>] ofd_destroy_by_fid+0x321/0x710 [ofd]
[<ffffffffa07434b0>] ? ldlm_blocking_ast+0x0/0x180 [ptlrpc]
[<ffffffffa0744b10>] ? ldlm_completion_ast+0x0/0x960 [ptlrpc]
[<ffffffffa076d125>] ? lustre_msg_buf+0x55/0x60 [ptlrpc]
[<ffffffffa0e34fd7>] ofd_destroy+0x1a7/0x8b0 [ofd]
[<ffffffffa0771430>] ? lustre_swab_ost_body+0x0/0x10 [ptlrpc]
[<ffffffffa0e078a9>] ost_handle+0x4349/0x48e0 [ost]
[<ffffffffa0494124>] ? libcfs_id2str+0x74/0xb0 [libcfs]
[<ffffffffa077e3b8>] ptlrpc_server_handle_request+0x398/0xc60 [ptlrpc]
[<ffffffffa04885de>] ? cfs_timer_arm+0xe/0x10 [libcfs]
[<ffffffffa0499d6f>] ? lc_watchdog_touch+0x6f/0x170 [libcfs]
[<ffffffffa0775719>] ? ptlrpc_wait_event+0xa9/0x290 [ptlrpc]
[<ffffffff81055813>] ? __wake_up+0x53/0x70
[<ffffffffa077f74e>] ptlrpc_main+0xace/0x1700 [ptlrpc]
[<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffffa077ec80>] ? ptlrpc_main+0x0/0x1700 [ptlrpc]
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
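For triage, it can help to reduce the trace above to the frames the kernel unwinder reports without a "?" marker, since "?" entries are addresses found on the stack that may not be part of the real call chain. The sketch below is only an illustration: the function name `reliable_frames`, the `SAMPLE` excerpt variable, and the "kernel" label for frames with no module tag are all assumptions of this example, not part of any Lustre or kernel tooling.

```python
import re

# One stack frame looks like:
#   [<ffffffffa0744ffd>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]
# An optional "? " after the address marks a frame the unwinder is unsure of.
# The trailing "[module]" is absent for frames in the core kernel.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def reliable_frames(trace: str):
    """Return (function, module) pairs for frames not marked with '?'.

    Works on both one-frame-per-line and flattened single-line traces.
    Frames without a module tag are labelled "kernel" (a convention of
    this sketch only).
    """
    frames = []
    for match in FRAME_RE.finditer(trace):
        if match.group("unreliable"):
            continue  # skip frames the unwinder flagged with "?"
        frames.append((match.group("func"), match.group("module") or "kernel"))
    return frames


# Short excerpt copied from the trace in this ticket.
SAMPLE = (
    "Call Trace: [<ffffffff815404c2>] schedule_timeout+0x192/0x2e0 "
    "[<ffffffff81080610>] ? process_timeout+0x0/0x10 "
    "[<ffffffffa04886d1>] cfs_waitq_timedwait+0x11/0x20 [libcfs] "
    "[<ffffffffa0744ffd>] ldlm_completion_ast+0x4ed/0x960 [ptlrpc]"
)

if __name__ == "__main__":
    for func, module in reliable_frames(SAMPLE):
        print(f"{module}: {func}")
```

Applied to the full trace, the unmarked frames read (innermost first) schedule_timeout, cfs_waitq_timedwait, ldlm_completion_ast, ldlm_cli_enqueue_local, ofd_destroy_by_fid, ofd_destroy, ost_handle, which suggests the thread is handling an object destroy and waiting in ldlm_completion_ast for a local LDLM lock enqueue to complete.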
We also see client hangs and OST disconnects from the MDS.