When a target sends a lock callback RPC to a dead client, the service thread handling the RPC that triggered the callback keeps a CPU busy until the client is eventually evicted. The dead client should be evicted by the lock callback timeout, but depending on server activity this may not happen, and the client is then evicted by the ping_evictor instead, keeping the service thread busy 50% longer. It can also trigger messages like "Already past deadline, not sending early reply" or "Request took longer than estimated, client may timeout".
The more such callback RPCs are triggered, the more service threads are kept busy.
This issue was detected with Lustre 2.10 but was also reproduced on the 'master' branch. Due to the changes related to MR, some error message patterns are slightly different, but the excessive CPU usage is still there. Here are the error logs using master:
Note the high value of: `Skipped 8141422 previous similar messages`
Steps to reproduce
This issue can be reproduced using the test framework to start a filesystem and 2 clients.
- On client A, start writing to a file, then reboot the host without any cleanup to simulate a host crash.
- On client B, try to read the file just after client A crashed. The client B process will hang, waiting for lock completion.
- On the OST where the file is located, you should start seeing the request timeout error messages and the load increasing after tens of seconds (tested with ksocklnd).
clientA $ (sudo dd if=/dev/zero of=/mnt/lustre/callback bs=4k count=1M &); sleep 1; sudo reboot -nff
clientB $ dd if=/mnt/lustre/callback count=1 >/dev/null
(hang)
oss1 $ # see load increasing after ~100 sec and a service thread at 100%
Analysis
The thread is looping with the following stack trace:
ptlrpc_set_wait
ldlm_run_ast_work
ldlm_glimpse_locks
ofd_intent_policy
...
ptlrpc_set_wait() is calling ptlrpc_check_set() -> ptlrpc_expire_one_request().
ptlrpc_check_set() keeps trying to resend the callback, at first every few seconds, because the socket still exists. It retries after each request deadline until the socket timeout is eventually reached and the socket is closed. From that point on, re-sending the RPC fails right away (-ENOTCONN). l_wait_event() in ptlrpc_set_wait() keeps calling ptlrpc_check_set(), which fails to send the RPC, and the thread churns in that code path until the client is eventually evicted.
I'm proposing a patch which avoids re-sending an RPC if it was sent less than a second ago and the previous attempt failed due to a network error.