Details
Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: None
Affects Version/s: Lustre 2.5.3
Labels: None
Severity: 4
Rank (Obsolete): 9223372036854775807
Description
Server threads are getting hung. Sometimes they clear by themselves, but not always.
<3>LustreError: 26311:0:(ldlm_lib.c:2715:target_bulk_io()) @@@ timeout on bulk PUT after 100+0s req@ffff881cd7d35800 x1526640544175816/t0(0) o3->d13c09b7-cb83-8238-fde5-d86e0a048f3a@10.151.63.50@o2ib:0/0 lens 488/432 e 1 to 0 dl 1456265702 ref 1 fl Interpret:/0/0 rc 0/0
<3>LustreError: 26311:0:(ldlm_lib.c:2715:target_bulk_io()) Skipped 62 previous similar messages
<6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Closing conn to 10.151.27.36@o2ib: error 0(waiting)
<6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Skipped 3 previous similar messages
<3>LustreError: 0:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock callback timer expired after 225s: evicting client at 10.151.55.224@o2ib ns: filter-nbp8-OST00e0_UUID lock: ffff880b6d477100/0xc9935e7dda65f272 lrc: 4/0,0 mode: PW/PW res: [0x1fff69f:0x0:0x0].0 rrc: 32 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000000020 nid: 10.151.55.224@o2ib remote: 0x18f2b8c9e47dad96 expref: 4 pid: 2516 timeout: 4704175646 lvb_type: 0
<3>LustreError: 2513:0:(ldlm_lockd.c:664:ldlm_handle_ast_error()) ### client (nid 10.151.55.224@o2ib) returned 0 from completion AST ns: filter-nbp8-OST00e0_UUID lock: ffff880b6d477100/0xc9935e7dda65f272 lrc: 6/0,0 mode: PW/PW res: [0x1fff69f:0x0:0x0].0 rrc: 33 type: EXT [0->18446744073709551615] (req 0->18446744073709551615) flags: 0x60000000000020 nid: 10.151.55.224@o2ib remote: 0x18f2b8c9e47dad96 expref: 4 pid: 2516 timeout: 4704175646 lvb_type: 0
<3>LustreError: 0:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock callback timer expired after 225s: evicting client at 10.151.17.216@o2ib ns: filter-nbp8-OST012e_UUID lock: ffff8805c4f78180/0xc9935e7dda65e30d lrc: 4/0,0 mode: PW/PW res: [0x1ff164b:0x0:0x0].0 rrc: 2 type: EXT [0->18446744073709551615] (req 0->4095) flags: 0x60000000010020 nid: 10.151.17.216@o2ib remote: 0x179b912da7369126 expref: 4 pid: 2472 timeout: 4704191266 lvb_type: 0
<6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Closing conn to 10.151.27.24@o2ib: error 0(waiting)
<6>LNet: 2454:0:(o2iblnd_cb.c:1937:kiblnd_close_conn_locked()) Skipped 5 previous similar messages
<4>LNet: Service thread pid 28275 was inactive for 300.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
<4>Pid: 28275, comm: ll_ost_io02_091
<4>
<4>Call Trace:
<4> [<ffffffff810867ec>] ? lock_timer_base+0x3c/0x70
<4> [<ffffffff81566572>] schedule_timeout+0x192/0x2e0
<4> [<ffffffff81086900>] ? process_timeout+0x0/0x10
<4> [<ffffffffa0798c58>] ptlrpc_abort_bulk+0x158/0x2e0 [ptlrpc]
<4> [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa0765694>] target_bulk_io+0x474/0x980 [ptlrpc]
<4> [<ffffffffa04ce200>] ? cfs_crypto_hash_final+0x30/0x80 [libcfs]
<4> [<ffffffff81064f90>] ? default_wake_function+0x0/0x20
<4> [<ffffffffa150287c>] ost_brw_read+0x103c/0x1350 [ost]
<4> [<ffffffff812b9076>] ? vsnprintf+0x336/0x5e0
<4> [<ffffffffa075e500>] ? target_bulk_timeout+0x0/0xc0 [ptlrpc]
<4> [<ffffffffa04d47c3>] ? libcfs_debug_vmsg2+0x5d3/0xbd0 [libcfs]
<4> [<ffffffffa079ff8c>] ? lustre_msg_get_version+0x8c/0x100 [ptlrpc]
<4> [<ffffffffa07a00e8>] ? lustre_msg_check_version+0xe8/0x100 [ptlrpc]
<4> [<ffffffffa15096d8>] ost_handle+0x24a8/0x44d0 [ost]
<4> [<ffffffffa04d0a44>] ? libcfs_id2str+0x74/0xb0 [libcfs]
<4> [<ffffffffa07ae0c5>] ptlrpc_server_handle_request+0x385/0xc00 [ptlrpc]
<4> [<ffffffffa04d68d5>] ? lc_watchdog_touch+0x65/0x170 [libcfs]
<4> [<ffffffffa07a6a69>] ? ptlrpc_wait_event+0xa9/0x2d0 [ptlrpc]
<4> [<ffffffffa07b089d>] ptlrpc_main+0xafd/0x1780 [ptlrpc]
<4> [<ffffffff8100c28a>] child_rip+0xa/0x20
<4> [<ffffffffa07afda0>] ? ptlrpc_main+0x0/0x1780 [ptlrpc]
<4> [<ffffffff8100c280>] ? child_rip+0x0/0x20
<4>
I am attaching the backtrace and a Lustre debug dump.
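For triage, it can help to pull the evicted client NIDs and the hung service threads out of a console log like the one above. The following is only a rough sketch, assuming the log is available as a single string; the function names (split_messages, evicted_clients, hung_threads) are hypothetical helpers, not part of Lustre, and the patterns are matched only against the message wording seen in this ticket.

```python
import re

# Kernel log-level prefix (<3>, <4>, <6>, ...) that starts each console message.
LEVEL_RE = re.compile(r"<(\d)>")
# Wording taken from the waiting_locks_callback() messages in this ticket.
EVICT_RE = re.compile(
    r"lock callback timer expired after (\d+)s: evicting client at (\S+)"
)
# Wording taken from the LNet watchdog message in this ticket.
HUNG_RE = re.compile(r"Service thread pid (\d+) was inactive for ([\d.]+)s")


def split_messages(log_text):
    """Split a flattened console log into (level, message) pairs."""
    # re.split with one capture group yields [head, level, msg, level, msg, ...]
    parts = LEVEL_RE.split(log_text)
    pairs = []
    for i in range(1, len(parts) - 1, 2):
        msg = parts[i + 1].strip()
        if msg:  # skip empty "<4>" spacer lines
            pairs.append((int(parts[i]), msg))
    return pairs


def evicted_clients(log_text):
    """Return (client NID, lock callback timeout in seconds) per eviction."""
    return [(m.group(2), int(m.group(1))) for m in EVICT_RE.finditer(log_text)]


def hung_threads(log_text):
    """Return (pid, inactive seconds) for each watchdog report."""
    return [(int(m.group(1)), float(m.group(2))) for m in HUNG_RE.finditer(log_text)]


if __name__ == "__main__":
    # Shortened excerpt of the log in this ticket, used only as sample input.
    sample = (
        "<3>LustreError: 0:0:(ldlm_lockd.c:346:waiting_locks_callback()) ### lock "
        "callback timer expired after 225s: evicting client at 10.151.55.224@o2ib "
        "ns: filter-nbp8-OST00e0_UUID "
        "<4>LNet: Service thread pid 28275 was inactive for 300.00s. "
        "The thread might be hung, or it might only be slow and will resume later."
    )
    for nid, secs in evicted_clients(sample):
        print(f"evicted {nid} after {secs}s")
    for pid, secs in hung_threads(sample):
        print(f"thread {pid} inactive for {secs}s")
```

Run against the full log, this would list both evicted clients (10.151.55.224@o2ib and 10.151.17.216@o2ib) alongside the inactive ll_ost_io thread, which makes it easier to correlate evictions with the bulk I/O timeouts.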