Details
- Bug
- Resolution: Fixed
- Major
- None
- Lustre 1.8.6
- None
- Lustre 1.8.3.0-5chaos
- 4
- 10250
Description
We had a client get stuck in ptlrpc_invalidate_import() after it was evicted. Info will be limited since it was on the secure network.
On the console, the client is printing this every ten minutes:
ptlrpc_invalidate_import()) ls3-OST01c4_UUID: rc = -110 waiting for callback (1 != 0)
ptlrpc_invalidate_import()) Skipped 5 previous similar messages
ptlrpc_invalidate_import()) @@@ still on sending list req@<hex> x<xid>/t0 o4->ls3-OST01c4_UUID@<ip>@tcp:6/4 len 448/608 e 5 to 1 dl <time> ref 2 fl Unregistering:ES/0/0 rc -4/0
ptlrpc_invalidate_import()) Skipped 5 previous similar messages
ptlrpc_invalidate_import()) ls3-OST01c4_UUID: RPCs in "Unregistering" phase found (1). Network is sluggish? Waiting them to error out.
ptlrpc_invalidate_import()) Skipped 5 previous similar messages
The ll_imp_inval thread is the one that appears to be looping indefinitely; it had been printing these messages for well over a month before I was alerted to the problem.
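For anyone reading the console messages above, the rc values are negated Linux errno codes: -110 is ETIMEDOUT and -4 is EINTR. A minimal sketch for decoding them from a log line follows. The helper name decode_rc and the small lookup table are illustrative only and are not part of Lustre; the table hardcodes just the two Linux errno values seen here.

```python
import re

# Linux errno values that appear in the console messages (illustrative subset).
LINUX_ERRNO = {4: "EINTR", 110: "ETIMEDOUT"}


def decode_rc(line):
    """Return symbolic names for each negative 'rc' value found in a log line.

    Hypothetical helper: matches both 'rc = -110' and 'rc -4/0' forms.
    """
    names = []
    for match in re.finditer(r"\brc\s*=?\s*-(\d+)", line):
        code = int(match.group(1))
        # Fall back to the raw number for codes not in the small table above.
        names.append(LINUX_ERRNO.get(code, "errno %d" % code))
    return names


if __name__ == "__main__":
    print(decode_rc("ls3-OST01c4_UUID: rc = -110 waiting for callback (1 != 0)"))
    print(decode_rc("Unregistering:ES/0/0 rc -4/0"))
```

Read this way, the invalidate thread is timing out while waiting for a request that was itself interrupted and is still in the "Unregistering" phase.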
The thread "ldlm_bl_11" was stuck in sync_page(), with the following backtrace:
schedule
io_schedule
sync_page
__wait_on_bit_lock
__lock_page
ll_page_removal_cb
cache_remove_lock
lock_handle_addref
class_handle2object
ldlm_cli_cancel_local
ldlm_cli_cancel
osc_extent_blocking_cb
ldlm_handle_bl_callback
ldlm_bl_thread_main
I do not know whether that is a symptom or the cause of the hung import invalidation.