[LU-11472] LNet Health: Decrement health value on response timeout Created: 04/Oct/18 Updated: 02/Nov/18 Resolved: 02/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | Lustre 2.12.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Amir Shehata (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | lnet-health |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When a response times out, we want to decrement the health of the immediate next-hop peer NI so that we don't use that interface if others are available. |
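
A minimal sketch of the behavior described above, not the LNet implementation. The names `PeerNI`, `on_response_timeout`, and `select_peer_ni`, the constants `MAX_HEALTH` and `SENSITIVITY`, and the example NIDs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed values for illustration only; not taken from the LNet source.
MAX_HEALTH = 1000
SENSITIVITY = 100


@dataclass
class PeerNI:
    """Hypothetical stand-in for a next-hop peer network interface."""
    nid: str
    health: int = MAX_HEALTH


def on_response_timeout(next_hop: PeerNI) -> None:
    """Decrement the health of the immediate next hop, never below zero."""
    next_hop.health = max(0, next_hop.health - SENSITIVITY)


def select_peer_ni(candidates: list[PeerNI]) -> PeerNI:
    """Prefer the healthiest interface; ties go to the first candidate."""
    return max(candidates, key=lambda ni: ni.health)


if __name__ == "__main__":
    a = PeerNI("10.0.0.1@tcp")
    b = PeerNI("10.0.0.2@tcp")
    on_response_timeout(a)              # a response via `a` timed out
    print(select_peer_ni([a, b]).nid)   # `b` is now preferred
    print(select_peer_ni([a]).nid)      # `a` is still used if it is the only option
```

In this sketch a degraded interface is only deprioritized, not excluded, so it remains usable when no other interface is available.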
| Comments |
| Comment by Amir Shehata (Inactive) [ 04/Oct/18 ] |
|
I have a patch which I'll commit shortly. It will work for directly connected peers, but the behavior might be an issue for routing. If a router is servicing multiple peers through the same route and one of the final destinations has a problem and doesn't respond, the router interface will be dinged. It'll be put on the recovery queue and will recover, but during that period the route will be down. |
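
The routing concern can be sketched as follows. This is a simplified model under stated assumptions, not LNet code: `RouterNI`, `response_timed_out`, `route_is_up`, `recover`, the constants, and the NIDs are hypothetical, and recovery is modeled as a single step that restores full health.

```python
# Assumed values for illustration only; not taken from the LNet source.
MAX_HEALTH = 1000
SENSITIVITY = 100


class RouterNI:
    """Hypothetical router interface acting as next hop for several peers."""

    def __init__(self, nid: str) -> None:
        self.nid = nid
        self.health = MAX_HEALTH


def response_timed_out(next_hop: RouterNI, recovery_queue: list, final_dest: str) -> None:
    """Charge the timeout to the next hop, whichever final destination caused it."""
    # `final_dest` is deliberately unused: the penalty lands on the router interface.
    next_hop.health = max(0, next_hop.health - SENSITIVITY)
    if not any(entry is next_hop for entry in recovery_queue):
        recovery_queue.append(next_hop)


def route_is_up(next_hop: RouterNI, recovery_queue: list) -> bool:
    """Assumption: the route counts as down while its interface awaits recovery."""
    return not any(entry is next_hop for entry in recovery_queue)


def recover(next_hop: RouterNI, recovery_queue: list) -> None:
    """Simplified recovery: restore full health and leave the queue."""
    next_hop.health = MAX_HEALTH
    recovery_queue[:] = [entry for entry in recovery_queue if entry is not next_hop]


if __name__ == "__main__":
    queue: list = []
    router = RouterNI("10.0.0.254@tcp")
    # Peer B behind the router does not respond; the router interface is dinged.
    response_timed_out(router, queue, final_dest="peer-B")
    # Healthy peer A shares the route and is affected until recovery completes.
    print(route_is_up(router, queue))
    recover(router, queue)
    print(route_is_up(router, queue))
```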
| Comment by Gerrit Updater [ 05/Oct/18 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33295 |
| Comment by Gerrit Updater [ 05/Oct/18 ] |
|
Amir Shehata (ashehata@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33308 |
| Comment by Gerrit Updater [ 02/Nov/18 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33308/ |
| Comment by Peter Jones [ 02/Nov/18 ] |
|
Landed for 2.12 |