[LU-9238] Enhancement for route failure detection Created: 21/Mar/17 Updated: 01/May/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
I've been thinking about ways to enhance route failure detection, since asymmetric route failure detection doesn't do much for multi-hop configurations. My idea is to extend the LNet ping info to include route up/down status. Peers could then learn the route status of their next hop and use it to select an appropriate next hop for future sends. Furthermore, in multi-hop configurations, any bad hop on a route should eventually percolate to all peers that use that route. This isn't an ideal solution because it requires a wire protocol change, but I wanted to open this ticket for discussion; maybe we can come up with another option. |
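To make the proposal above concrete, here is a minimal sketch in C of how a peer might use per-route status carried in a ping reply. It is not LNet code: every name in it (`struct route_status`, `struct gateway`, `gateway_reaches`, `select_next_hop`, `advertise_route_up`, `MAX_ROUTES`) is hypothetical, and the real ping info layout and wire format are not modeled. It assumes each gateway reports, per remote network, whether its own route onward is usable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 8 /* hypothetical cap on routes reported per gateway */

/* Hypothetical: one route entry a gateway would append to its ping reply. */
struct route_status {
    uint32_t rs_net; /* remote network reached through this gateway */
    bool     rs_up;  /* gateway's own view: is its route to rs_net usable? */
};

/* Hypothetical: what a peer remembers about one candidate next hop. */
struct gateway {
    uint64_t            gw_nid;    /* identifier of the gateway */
    bool                gw_alive;  /* did the gateway answer the last ping? */
    size_t              gw_nroutes;
    struct route_status gw_routes[MAX_ROUTES];
};

/* True if the gateway is alive and reports a usable route to 'net'.
 * A network the gateway does not list is treated as unreachable. */
bool gateway_reaches(const struct gateway *gw, uint32_t net)
{
    if (!gw->gw_alive)
        return false;
    for (size_t i = 0; i < gw->gw_nroutes; i++) {
        if (gw->gw_routes[i].rs_net == net)
            return gw->gw_routes[i].rs_up;
    }
    return false;
}

/* Return the first candidate next hop that reports a usable route to
 * 'net', or NULL if none does. */
const struct gateway *select_next_hop(const struct gateway *gws, size_t n,
                                      uint32_t net)
{
    for (size_t i = 0; i < n; i++) {
        if (gateway_reaches(&gws[i], net))
            return &gws[i];
    }
    return NULL;
}

/* Status a router would itself advertise for 'net': up only if at least
 * one of its own next hops reports the route as up. Applying this rule at
 * every hop is how a failure would percolate back toward all users. */
bool advertise_route_up(const struct gateway *next_hops, size_t n,
                        uint32_t net)
{
    return select_next_hop(next_hops, n, net) != NULL;
}
```

In this sketch a peer calls `select_next_hop()` before each send, while a router calls `advertise_route_up()` when building its own ping reply. A bad hop several networks away would therefore reach the original sender only after each intermediate router has refreshed its view, which matches the "eventually percolate" behavior described above.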
| Comments |
| Comment by Andreas Dilger [ 23/Mar/17 ] |
|
This may overlap with the LNet Multi-Rail and/or Dynamic Discovery work, as well as the proposed LNet Resiliency project. Amir, could you please comment when you have a chance? |