[LU-14027] Client recovery statemachine hangs in recovery disconnected during lock reply Created: 14/Oct/20 Updated: 25/Oct/23 Resolved: 19/Nov/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.12.6 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.7 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Oleg Drokin | Assignee: | Oleg Drokin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Description |
|
The direct consequence of that is that the recovery state machine never advances from the REPLAY to the REPLAY_LOCKS state while imp_replay_inflight is nonzero:
    if (imp->imp_state == LUSTRE_IMP_REPLAY) {
            CDEBUG(D_HA, "replay requested by %s\n",
                   obd2cli_tgt(imp->imp_obd));
            rc = ptlrpc_replay_next(imp, &inflight);
            if (inflight == 0 &&
                atomic_read(&imp->imp_replay_inflight) == 0) {
                    import_set_state(imp, LUSTRE_IMP_REPLAY_LOCKS);
                    rc = ldlm_replay_locks(imp);
                    if (rc)
                            GOTO(out, rc);
            }
            rc = 0;
    }
To break this cycle, we need either to check the import state in the replay-locks thread before attempting to send anything, or to make sure replay_one_lock() prepares resend requests in a state in which they can never get stuck. |
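The gate described above, and the first of the two proposed fixes, can be illustrated with a small standalone model. This is only a sketch and not Lustre code: the `model_*` names, the `import_model` struct, and the scenario are hypothetical. It assumes that a lock-replay request queued while the import is not in REPLAY_LOCKS never completes, so its in-flight count is never released.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the import states named in the ticket. */
enum model_imp_state {
	MODEL_IMP_REPLAY,
	MODEL_IMP_REPLAY_LOCKS,
};

/* Hypothetical model of an import: only the fields the ticket discusses. */
struct import_model {
	enum model_imp_state state;
	int replay_inflight;	/* models imp_replay_inflight */
};

/*
 * Models the quoted gate: advance REPLAY -> REPLAY_LOCKS only when no
 * replay request is in flight. Returns true if the state advanced.
 */
static bool model_try_advance(struct import_model *imp, int inflight)
{
	assert(imp->replay_inflight >= 0);

	if (imp->state != MODEL_IMP_REPLAY)
		return false;
	if (inflight != 0 || imp->replay_inflight != 0)
		return false;

	imp->state = MODEL_IMP_REPLAY_LOCKS;
	return true;
}

/*
 * Models the replay-locks sender. With check_state set, it refuses to
 * queue a request unless the import is in REPLAY_LOCKS (the first option
 * proposed in the ticket). A queued request is counted as in flight and,
 * by assumption in this model, is never completed.
 */
static bool model_send_lock_replay(struct import_model *imp, bool check_state)
{
	if (check_state && imp->state != MODEL_IMP_REPLAY_LOCKS)
		return false;

	imp->replay_inflight++;
	return true;
}

/*
 * Assumed scenario: the import has fallen back to REPLAY after a
 * reconnect while the replay-locks thread still tries to send one
 * request. Returns true if the state machine reaches REPLAY_LOCKS.
 */
static bool model_reconnect_scenario(bool check_state)
{
	struct import_model imp = { MODEL_IMP_REPLAY, 0 };

	model_send_lock_replay(&imp, check_state);
	return model_try_advance(&imp, 0);
}
```

In this model, `model_reconnect_scenario(false)` leaves the counter at 1 and the import stays in REPLAY, while `model_reconnect_scenario(true)` skips the stale send and the import advances. The second option from the ticket (preparing resend requests so they cannot get stuck) would correspond to releasing the counter when such a request is dropped, which this sketch does not model.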
| Comments |
| Comment by Gerrit Updater [ 14/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40238 |
| Comment by Gerrit Updater [ 16/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/40272 |
| Comment by Gerrit Updater [ 19/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40272/ |
| Comment by Gerrit Updater [ 19/Nov/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/40238/ |
| Comment by Peter Jones [ 19/Nov/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 14/Jan/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/41223 |
| Comment by Gerrit Updater [ 14/Jan/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/41224 |
| Comment by Etienne Aujames [ 14/Jan/21 ] |
|
The patch above fixes https://review.whamcloud.com/39111/ ("
| Comment by Gerrit Updater [ 14/Jan/21 ] |
|
Etienne AUJAMES (eaujames@ddn.com) uploaded a new patch: https://review.whamcloud.com/41227 |
| Comment by Gerrit Updater [ 04/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41223/ |
| Comment by Gerrit Updater [ 04/Mar/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41224/ |
| Comment by Gerrit Updater [ 25/Oct/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/41227/ |