Details
Bug
Resolution: Fixed
Blocker
Lustre 2.3.0
None
3
4594
Description
The patch set which added imperative recovery to 2.2 modified how ASTs are sent by servers.
AST requests used to be sent by the service thread itself; they are now sent by the ptlrpcd threads.
The drawback is that ptlrpcd threads can now end up doing disk I/O to update the LVB when a callback fails to be sent:
ldlm_cb_interpret
-> ldlm_handle_ast_error
-> ldlm_res_lvbo_update
Although we now have multiple ptlrpcd threads, it is still a bad idea to block a ptlrpcd thread for an unbounded amount of time.
I think we can restore the original logic (i.e. a single request set managed by the service thread) while still addressing the needs of imperative recovery, which wants to notify all client nodes ASAP rather than wait for every AST in the set to complete before sending the next wave of ASTs.
I'm going to attach a patch. It is also useful for quota, since we need to process glimpse ASTs like all other ASTs and we need to do I/O in the interpret function.
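
To make the "no waiting for the whole wave" idea concrete, here is a minimal, self-contained sketch of a windowed request set driven by the calling thread. It is only a simulation under stated assumptions, not the attached patch and not Lustre code: `struct ast_req`, `send_asts_windowed`, and the tick-based timing are all hypothetical names and mechanisms invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one AST RPC; this is NOT a Lustre structure. */
struct ast_req {
    int ticks_left; /* simulated time until the reply (or a timeout) arrives */
    int sent_at;    /* tick at which the request was sent, -1 if not yet sent */
    int done;       /* set once the reply has been interpreted */
};

/*
 * Send n ASTs from the calling (service) thread, keeping at most `window`
 * of them in flight. A new AST is sent as soon as any slot frees up,
 * instead of waiting for the whole previous wave to complete.
 * Returns the number of simulated ticks needed to complete all requests.
 */
int send_asts_windowed(struct ast_req *reqs, size_t n, size_t window)
{
    size_t next = 0, in_flight = 0, completed = 0;
    int now = 0;

    assert(window > 0);

    for (size_t i = 0; i < n; i++) {
        reqs[i].sent_at = -1;
        reqs[i].done = 0;
    }

    while (completed < n) {
        /* Refill the set: every free slot is reused immediately. */
        while (in_flight < window && next < n) {
            reqs[next].sent_at = now;
            next++;
            in_flight++;
        }

        /* Advance simulated time by one tick and reap finished requests. */
        now++;
        for (size_t i = 0; i < next; i++) {
            if (reqs[i].done)
                continue;
            if (--reqs[i].ticks_left <= 0) {
                /*
                 * "Interpret" step. In this model it runs in the calling
                 * thread, which is where blocking work (such as the LVB
                 * update on AST error) would be acceptable.
                 */
                reqs[i].done = 1;
                in_flight--;
                completed++;
            }
        }
    }
    return now;
}
```

For example, with four requests whose simulated reply times are 5, 1, 1 and 1 ticks and a window of 2, the third request goes out at tick 1 and the fourth at tick 2, while the slow first request is still pending. A strictly wave-based sender would have held both back until tick 5.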