[LU-13474] Lustre failover fails when SRPC enabled Created: 22/Apr/20 Updated: 16/Dec/20 Resolved: 03/Dec/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.14.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Sebastien Buisson | Assignee: | Sebastien Buisson |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | gss, patch | ||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
When SRPC rules are enabled, for either Kerberos or SSK, Lustre HA failover is broken: clients might not be able to reconnect to targets that have been stopped and then restarted, whether on the initial node or on its failover partner. |
| Comments |
| Comment by Sebastien Buisson [ 22/Apr/20 ] |
|
The problem is that clients are not able to try other service nodes for authentication requests. A server that receives an authentication request for a target that is not available for connection simply drops the request instead of returning an error to the client. |
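To illustrate the behaviour described in this comment, here is a toy model in Python. It is not Lustre code: `handle_auth_request`, `client_next_step`, the target name and the use of `ENOTCONN` as the error code are all assumptions made for the sketch. It only shows why a silently dropped request leaves the client waiting, whereas an explicit error would let it move on to another service node.

```python
from typing import Optional, Set

ENOTCONN = 107  # illustrative error code only; the real code path may use another


def handle_auth_request(target: str, running_targets: Set[str],
                        reply_on_missing: bool) -> Optional[int]:
    """Hypothetical server side: 0 on success, a negative error code,
    or None when the request is silently dropped."""
    if target in running_targets:
        return 0
    # Target not available on this node: either answer with an error or drop.
    return -ENOTCONN if reply_on_missing else None


def client_next_step(reply: Optional[int]) -> str:
    """Hypothetical client side: what the client can do with each outcome."""
    if reply is None:
        # Nothing came back, so the client can only wait for the RPC timeout.
        return "wait for timeout"
    if reply < 0:
        # An explicit error tells the client this node cannot serve the target.
        return "try next service node"
    return "connected"
```

With `reply_on_missing=False` the model reproduces the reported symptom: the client learns nothing from the node that no longer hosts the target.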
| Comment by Gerrit Updater [ 22/Apr/20 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/38310 |
| Comment by Sebastien Buisson [ 27/Apr/20 ] |
|
The first version of the patch did not work in all situations, for instance when a server has no running target: in that case the server is not able to return an error to the client. The second version of the patch implements a different solution: the client is now prevented from restarting GSS negotiation immediately if the RPC to the server timed out, which lets the HA failover mechanism try different service nodes. |
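The idea behind the second version of the patch can be sketched as follows. This is a simplified Python model, not the patch itself: `connect_with_failover`, the `negotiate` callback, its `"ok"`/`"timeout"` return values and the `max_attempts` limit are hypothetical names chosen for illustration. The point it shows is that a timed-out negotiation is not retried against the same node straight away, so the failover logic gets a chance to pick another service node.

```python
from typing import Callable, List, Optional


def connect_with_failover(nodes: List[str],
                          negotiate: Callable[[str], str],
                          max_attempts: int = 6) -> Optional[str]:
    """Return the node that accepted the negotiation, or None if none did.

    negotiate(node) is a hypothetical callback returning "ok" or "timeout".
    """
    if not nodes:
        return None
    idx = 0
    for _ in range(max_attempts):
        node = nodes[idx]
        if negotiate(node) == "ok":
            return node
        # Timed out: do not restart negotiation with the same node right away;
        # rotate to the next service node in the failover list instead.
        idx = (idx + 1) % len(nodes)
    return None
```

For example, with two service nodes where only the second one currently hosts the target, the sketch contacts the first node once, times out, and then succeeds on the second.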
| Comment by Gerrit Updater [ 03/Dec/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38310/ |
| Comment by Peter Jones [ 03/Dec/20 ] |
|
Landed for 2.14 |
| Comment by Gerrit Updater [ 16/Dec/20 ] |
|
Sebastien Buisson (sbuisson@ddn.com) uploaded a new patch: https://review.whamcloud.com/40995 |