
LU-5703: Quiesce client mountpoints from the server

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • 15964

    Description

      In order to minimize user disruptions, NASA performs some system maintenance "live". Typical maintenance includes activities such as adding new compute nodes or reconfiguring the IB fabric. During such times user jobs are suspended via PBS. Although suspending user jobs does minimize Lustre usage, it does not stop all Lustre client/server activity. Therefore NASA requires:
      1. A mechanism to halt and block all Lustre client I/O.
      2. Halting the client/server keepalive ping and all other network traffic.
      3. Clients should be able to recover after the quiesce without eviction.

    Attachments

    Issue Links

    Activity


    adilger Andreas Dilger added a comment - Related to Patrick's recent comment, I've attached Dynamic Congestion Control - Qian.docx, a paper that Yingjin worked on long ago to have the servers dynamically manage the client max_rpcs_in_flight, in a manner similar to grants (i.e. a constant flow of "RPC credits" out to clients that need them, withdrawing them from other clients that are not active). In addition to "stop all client RPCs", this could also be used to temporarily boost performance for busy clients.
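    As a rough sketch of the credit idea described in the comment above (an illustration of the concept only, not the algorithm from the attached paper), a server-side loop might periodically re-divide a fixed pool of RPC credits in proportion to recent client activity; all names and numbers below are hypothetical:

        # Hypothetical sketch of the "RPC credit" idea: a fixed pool of credits is
        # periodically re-divided among clients in proportion to their recent
        # activity, with idle clients dropping back toward a minimum.
        # Nothing here is Lustre code; it only illustrates the concept.

        TOTAL_CREDITS = 256   # global pool of max_rpcs_in_flight credits
        MIN_CREDITS = 1       # floor so an idle client can still send one RPC

        def redistribute(activity):
            """activity: dict mapping client id -> RPCs issued in the last interval."""
            total_activity = sum(activity.values())
            spare = TOTAL_CREDITS - MIN_CREDITS * len(activity)
            grants = {}
            for client, rpcs in activity.items():
                share = (rpcs / total_activity) if total_activity else 0
                grants[client] = MIN_CREDITS + int(spare * share)
            return grants

        # Example: the busy client is granted most of the pool, idle clients
        # are scaled back toward the minimum.
        print(redistribute({"client-a": 900, "client-b": 100, "client-c": 0}))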

    paf0186 Patrick Farrell added a comment - I think this could probably be done fairly easily with some tweaks to the RPC code to allow the server to temporarily set max_rpcs_in_flight for data and metadata to zero. It wouldn't guarantee total silence - that would require unmounting - but it might be useful if folks didn't mind taking an additional performance stoppage. Interesting.
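    For reference, a minimal sketch of how a client can already be throttled toward (but not to) zero with the existing tunables, assuming lctl is in PATH and that the osc/mdc max_rpcs_in_flight parameters exist on the installed Lustre version; accepting the value 0, and letting the server drive it, is the RPC-code change being discussed:

        # Sketch (not part of any patch): throttling a client with existing
        # tunables, run on each client node. Current clients do not accept 0
        # for these parameters, and parameter names may vary between versions.
        import subprocess

        def throttle_client(value=1):
            # Data (OSC) and metadata (MDC) RPC concurrency per target.
            for param in ("osc.*.max_rpcs_in_flight", "mdc.*.max_rpcs_in_flight"):
                subprocess.run(["lctl", "set_param", f"{param}={value}"], check=True)

        if __name__ == "__main__":
            throttle_client(1)   # minimum allowed today; raise it again afterwards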

    jeremyenos Jeremy Enos (Inactive) added a comment - I think it's about client and configuration changes first... if server changes were possible too, that'd be outstanding. It is not about changing the config or versions on clients that are in use by jobs while holding files open. The idea is to idle the clients in use by jobs so that others in a test pool could run regression tests after a config or client change, or regression tests on a periodic basis.

    Getting consistent, and therefore meaningful, regression tests without a dedicated system is impossible otherwise.
    spitzcor Cory Spitz added a comment - Is this request for maintenance capabilities about server changes only? That is, the request isn't about keeping files open while the Lustre client is changed, correct?

    jeremyenos Jeremy Enos (Inactive) added a comment - At NCSA, there is a similar need, although possibly lighter weight than the Simplified Interoperability feature described, and possibly just items 1 & 3 described by Mahmoud.

    The specific application I have in mind at the moment is confirmation benchmarking after an online configuration tuning. Ideally in this case, all clients would remain actively mounted (and pinging) with existing open files, but would suspend operations beyond that. A /proc control on the client would tell it whether or not to "suspend", which leaves the capability to keep some clients active (presumably the ones used to execute the regression test).

    A search for this capability landed on this ticket - perhaps it's different enough that I should open a separate RFE?
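    A minimal sketch of how such a per-client control might be driven, assuming a hypothetical tunable (llite.*.io_suspend does not exist in current Lustre; the name and semantics are placeholders for whatever knob this RFE would add):

        # Hypothetical admin helper for the per-client "suspend" control proposed
        # above. The tunable llite.*.io_suspend is a placeholder, not a real
        # Lustre parameter.
        import subprocess

        def set_suspend(suspend: bool):
            value = 1 if suspend else 0
            subprocess.run(["lctl", "set_param", f"llite.*.io_suspend={value}"],
                           check=True)

        # Usage on a client node: set_suspend(True) before the regression run on
        # the test pool, set_suspend(False) afterwards; clients left untouched
        # keep running normally.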

    adilger Andreas Dilger added a comment - Cliff, I expect there may be problems relating to open files on the mounted filesystem that cannot be closed without killing the running application.

    This request sounds a lot like a requirement we had for a feature called "Simplified Interoperability", which would flush all uncommitted RPCs from the clients and quiesce them in advance of a server shutdown for a Lustre software upgrade, so that we didn't have to manage recovery/replay of Lustre RPCs across different versions. This requires work on both the clients (to be able to "detach" themselves from their open files and "reattach" them once the server is restarted), and on the servers to notify the clients of the impending shutdown and to allow the clients to reconnect without evicting them.
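    The sequence described above could be outlined roughly as follows; this is only a restatement of the comment as stub code, with every class and method name hypothetical (nothing here corresponds to an existing Lustre API):

        # Conceptual outline of the "Simplified Interoperability" sequence.
        class Client:
            def __init__(self, name):
                self.name = name
                self.saved_handles = []

            def flush_uncommitted_rpcs(self):
                print(f"{self.name}: flush uncommitted RPCs (nothing left to replay)")

            def detach_open_files(self):
                # Remember open handles without keeping them pinned on the server.
                self.saved_handles = ["placeholder handles"]
                print(f"{self.name}: detach open files and quiesce")

            def reconnect_and_reattach(self):
                print(f"{self.name}: reconnect without eviction, reattach open files")

        def quiesce_for_server_upgrade(clients):
            # Server notifies clients of the impending shutdown, clients wind down,
            # the server is upgraded, then clients reconnect without being evicted.
            for c in clients:
                c.flush_uncommitted_rpcs()
                c.detach_open_files()
            print("server: shut down, upgrade, restart")
            for c in clients:
                c.reconnect_and_reattach()

        quiesce_for_server_upgrade([Client("client-0"), Client("client-1")])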

    cliffw Cliff White (Inactive) added a comment - The simplest way to do this would be to unmount the clients, then remount after your maintenance. That leaves the client in a good state, and you should be able to restart without issue.

    If you are adding new compute nodes (not servers), that should be completely transparent to all other clients, and should never require any quiescing of Lustre. Clients are very independent of one another.

    If you are changing IB values, it would be best to unmount all Lustre, unload the LNet modules and restart. That way you are certain your IB changes will propagate.

    In most cases of Lustre 'live' maintenance, any live but idle Lustre machines should cause you no issues. There is no need for a 'quiesce'. If you need to completely eliminate all Lustre traffic from your network, the quickest and safest way to do this is to simply stop Lustre on the affected nodes.
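    A minimal sketch of that stop/start cycle as run on a client node, assuming a site-specific mount point, MGS NID, and filesystem name (all placeholders below):

        # Unmount / unload / remount cycle as described above, per client node.
        import subprocess

        MOUNT_POINT = "/mnt/lustre"        # placeholder: adjust per site
        MOUNT_SOURCE = "mgs@o2ib:/lustre"  # placeholder: MGS NID and fsname

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        def stop_lustre_client():
            run(["umount", MOUNT_POINT])   # stops all client traffic for this mount
            run(["lustre_rmmod"])          # unloads the Lustre and LNet modules

        def start_lustre_client():
            run(["mount", "-t", "lustre", MOUNT_SOURCE, MOUNT_POINT])

        # Run stop_lustre_client() before the IB/fabric maintenance and
        # start_lustre_client() once it is complete.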

    People

      Assignee: Peter Jones (pjones)
      Reporter: Mahmoud Hanafi (mhanafi)
      Votes: 1
      Watchers: 16
