[LU-4473] Disable LNET routes without disrupting ongoing filesystem operations Created: 10/Jan/14 Updated: 13/Jan/14 Resolved: 13/Jan/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Chris Horn | Assignee: | WC Triage |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | patch | ||
| Rank (Obsolete): | 12253 |
| Description |
|
It is desirable to be able to gracefully take an LNET router out of service without disrupting ongoing filesystem operations. Since not all RPCs are re-sent, we need a way to prevent routes from being used for new traffic while existing buffered messages continue to drain. I have a patch implementing one approach to achieving this behavior. The patch creates a pair of lctl commands, down_interfaces and up_interfaces. The down_interfaces command, when executed on an LNET router, sets the ni->ni_status->ns_status of each lnet_ni_t in the global LND instance list (except for LOLND) to a new status introduced by this patch, LNET_NI_STATUS_ADMINDOWN. An admin would use this command to remove an LNET router node from service in the following way:
The up_interfaces command simply sets the ni->ni_status->ns_status of each lnet_ni_t in the global LND instance list (except for LOLND) to LNET_NI_STATUS_UP. |
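To make the intended behavior of the two commands concrete, here is a minimal sketch in Python. It is a toy model, not the patch: the `NiStatus` enum, the `NetworkInterface` class, and the string status values are hypothetical stand-ins for `lnet_ni_t` and `ns_status`, and the only behavior taken from the description is that every NI except the loopback one is switched to "admindown" or back to "up".

```python
from dataclasses import dataclass
from enum import Enum


class NiStatus(Enum):
    # Placeholder values; the real LNET_NI_STATUS_* constants are integers.
    UP = "up"
    DOWN = "down"
    ADMINDOWN = "admindown"  # the new status proposed in this ticket


@dataclass
class NetworkInterface:
    name: str          # hypothetical label, e.g. "o2ib0"
    is_loopback: bool  # stands in for the "except for LOLND" check
    status: NiStatus = NiStatus.UP


def down_interfaces(nis):
    """Mark every non-loopback NI as administratively down."""
    for ni in nis:
        if not ni.is_loopback:
            ni.status = NiStatus.ADMINDOWN


def up_interfaces(nis):
    """Return every non-loopback NI to the up state."""
    for ni in nis:
        if not ni.is_loopback:
            ni.status = NiStatus.UP
```

In this model the loopback NI is never touched, so local traffic on the router keeps working while the router is being drained.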
| Comments |
| Comment by Chris Horn [ 10/Jan/14 ] |
|
For your consideration: |
| Comment by Chris Horn [ 10/Jan/14 ] |
|
One thing I forgot to mention is that the patch also modifies lnet_update_ni_status_locked() so that the router_checker will not move an NI from "admindown" to "down". This prevents a situation where the router_checker marks the NI as "down" (which is fine in itself, since this also prevents new traffic) but later gets a response and marks the NI "up", which would defeat the purpose of the admindown status. |
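The guard described in this comment can be sketched as a small state-transition function. This is an illustrative Python model under stated assumptions, not the body of lnet_update_ni_status_locked(): the function name `update_ni_status`, the `peer_responded` flag, and the enum values are all hypothetical.

```python
from enum import Enum


class NiStatus(Enum):
    # Placeholder values, not the real LNET_NI_STATUS_* constants.
    UP = "up"
    DOWN = "down"
    ADMINDOWN = "admindown"


def update_ni_status(current, peer_responded):
    """Return the status a periodic checker should record for one NI.

    An administratively downed NI is sticky: the checker never changes it,
    so a late response cannot flip it back to UP.
    """
    if current is NiStatus.ADMINDOWN:
        return NiStatus.ADMINDOWN
    return NiStatus.UP if peer_responded else NiStatus.DOWN
```

The early return is the whole point: without it, the sequence "admindown, then down, then up" described above would silently put the router back into service.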
| Comment by Amir Shehata (Inactive) [ 13/Jan/14 ] |
|
This functionality is being added as part of the Dynamic LNet Configuration (DLC) Project. The feature you're requesting is being implemented in a slightly different way: instead of bringing the interfaces up and down, routing is turned on and off. When routing is turned on, all routing buffers are allocated. When routing is turned off, the unused buffers are freed, and the in-use buffers are drained and then freed once they are no longer in use. When clients ping a node that has routing turned off, the node responds with a flag stating that routing is turned off, and the client then skips routes that use this router as a next hop. This implies that both clients and servers must be running the DLC build. However, in your description, you have: I'm not sure how that is done. Can you please elaborate? Below are the DLC patches. |
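The client-side effect of the DLC approach, skipping routes whose next hop reports that routing is off, can be illustrated with a short Python sketch. Everything here is hypothetical: the `Route` class, the `usable_routes` helper, and the dictionary of per-gateway flags are invented for illustration and do not reflect the DLC data structures. The sketch also assumes that a gateway with no ping reply on record is treated as unusable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    remote_net: str  # network reached through this route, e.g. "o2ib1"
    gateway: str     # next-hop router, identified by a hypothetical NID string


def usable_routes(routes, routing_enabled_by_gateway):
    """Keep only routes whose gateway reported routing as enabled.

    routing_enabled_by_gateway maps a gateway NID to the flag taken from its
    most recent ping reply. Gateways with no entry are skipped (assumption).
    """
    return [
        route
        for route in routes
        if routing_enabled_by_gateway.get(route.gateway, False)
    ]
```

Because the decision is made from the ping reply, the router itself only has to flip one flag, and existing in-flight messages are unaffected while new traffic moves to other gateways.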
| Comment by Chris Horn [ 13/Jan/14 ] |
|
Ah, this is good to know. I will abandon my patchset, and this ticket can be closed. "However, in your description, you have:" I just meant that an admin could look at, for example, /proc/sys/lnet/buffers to see when all the credits are free. |
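As a rough illustration of that manual check, the following Python sketch decides whether a router looks drained from the text of /proc/sys/lnet/buffers. It assumes the file has a header row containing "count" and "credits" columns followed by one row per buffer pool, and it treats a pool as idle when its credits equal its count. Both the column layout and that interpretation are assumptions to verify against the running Lustre version; the helper name `buffers_drained` is hypothetical.

```python
def buffers_drained(text):
    """Return True if every buffer pool row has credits equal to count.

    text is the full contents of the buffers file, passed in as a string so
    the function can be exercised without a live LNET router.
    """
    lines = [line.split() for line in text.strip().splitlines()]
    if not lines:
        raise ValueError("empty buffers listing")
    header, rows = lines[0], lines[1:]
    count_col = header.index("count")
    credits_col = header.index("credits")
    return all(int(row[credits_col]) == int(row[count_col]) for row in rows)
```

On a router, an admin could call it as `buffers_drained(open("/proc/sys/lnet/buffers").read())` and poll until it returns True before taking the node down.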
| Comment by Peter Jones [ 13/Jan/14 ] |
|
ok - thanks Chris! |