
LU-5703: Quiesce client mountpoints from the server

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3
    • 15964

    Description

      In order to minimize user disruptions, NASA performs some system maintenance "live". Typical maintenance includes activities such as adding new compute nodes or reconfiguring the IB fabric. During such times user jobs are suspended via PBS. Although suspending user jobs does minimize Lustre usage, it does not stop all Lustre client/server activity. Therefore NASA requires:
      1. A mechanism to halt and block all Lustre client I/O.
      2. Halting the client/server keepalive ping and all other network traffic.
      3. Clients should be able to recover after the quiesce without eviction.

    Attachments

    Issue Links

    Activity


    adilger Andreas Dilger added a comment - Related to Patrick's recent comment, I've attached Dynamic Congestion Control - Qian.docx, a paper that Yingjin worked on long ago to have the servers dynamically manage the client max_rpcs_in_flight, in a manner similar to grants (i.e. a constant flow of "RPC credits" out to clients that need them, withdrawing them from other clients that are not active). In addition to "stop all client RPCs", this could also be used to temporarily boost performance for busy clients.
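    As a rough sketch of the credit idea described in the comment above (an illustration of the concept only, not the algorithm from the attached paper), a server-side loop might periodically re-divide a fixed pool of RPC credits in proportion to recent client activity; all names and numbers below are hypothetical:

        # Hypothetical sketch of the "RPC credit" idea: a fixed pool of credits is
        # periodically re-divided among clients in proportion to their recent
        # activity, with idle clients dropping back toward a minimum.
        # Nothing here is Lustre code; it only illustrates the concept.

        TOTAL_CREDITS = 256   # global pool of max_rpcs_in_flight credits
        MIN_CREDITS = 1       # floor so an idle client can still send one RPC

        def redistribute(activity):
            """activity: dict mapping client id -> RPCs issued in the last interval."""
            total_activity = sum(activity.values())
            spare = TOTAL_CREDITS - MIN_CREDITS * len(activity)
            grants = {}
            for client, rpcs in activity.items():
                share = (rpcs / total_activity) if total_activity else 0
                grants[client] = MIN_CREDITS + int(spare * share)
            return grants

        # Example: the busy client is granted most of the pool, idle clients
        # are scaled back toward the minimum.
        print(redistribute({"client-a": 900, "client-b": 100, "client-c": 0}))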

    paf0186 Patrick Farrell added a comment - I think this could probably be done fairly easily with some tweaks to the RPC code to allow the server to temporarily set max_rpcs_in_flight for data and metadata to zero. It wouldn't guarantee total silence - that would require unmounting - but it might be useful if folks didn't mind taking an additional performance stoppage. Interesting.
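    For reference, a minimal sketch of how a client can already be throttled toward (but not to) zero with the existing tunables, assuming lctl is in PATH and that the osc/mdc max_rpcs_in_flight parameters exist on the installed Lustre version; accepting the value 0, and letting the server drive it, is the RPC-code change being discussed:

        # Sketch (not part of any patch): throttling a client with existing
        # tunables, run on each client node. Current clients do not accept 0
        # for these parameters, and parameter names may vary between versions.
        import subprocess

        def throttle_client(value=1):
            # Data (OSC) and metadata (MDC) RPC concurrency per target.
            for param in ("osc.*.max_rpcs_in_flight", "mdc.*.max_rpcs_in_flight"):
                subprocess.run(["lctl", "set_param", f"{param}={value}"], check=True)

        if __name__ == "__main__":
            throttle_client(1)   # minimum allowed today; raise it again afterwards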

    jeremyenos Jeremy Enos (Inactive) added a comment - I think it's about client and configuration changes first... if server changes were possible too, that'd be outstanding. It is not about changing the config or versions on clients that are in use by jobs while holding files open. The idea is to idle the clients in use by jobs so that others in a test pool could run regression tests after a config or client change, or regression tests on a periodic basis.

    Getting consistent, and therefore meaningful, regression tests without a dedicated system is impossible otherwise.
    spitzcor Cory Spitz added a comment - Is this request for maintenance capabilities about server changes only? That is, the request isn't about keeping files open while the Lustre client is changed, correct?

    jeremyenos Jeremy Enos (Inactive) added a comment - At NCSA, there is a similar need, although possibly lighter weight than the Simplified Interoperability feature described, and possibly just items 1 & 3 described by Mahmoud.

    The specific application I have in mind at the moment is confirmation benchmarking after an online configuration tuning. Ideally in this case, all clients would remain actively mounted (and pinging) with existing open files, but would suspend operations beyond that. A /proc control on the client would tell it whether or not to "suspend", which leaves the capability to keep some clients active (presumably the ones used to execute the regression test).

    A search for this capability landed on this ticket - perhaps it's different enough that I should open a separate RFE?
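    A minimal sketch of how such a per-client control might be driven, assuming a hypothetical tunable (llite.*.io_suspend does not exist in current Lustre; the name and semantics are placeholders for whatever knob this RFE would add):

        # Hypothetical admin helper for the per-client "suspend" control proposed
        # above. The tunable llite.*.io_suspend is a placeholder, not a real
        # Lustre parameter.
        import subprocess

        def set_suspend(suspend: bool):
            value = 1 if suspend else 0
            subprocess.run(["lctl", "set_param", f"llite.*.io_suspend={value}"],
                           check=True)

        # Usage on a client node: set_suspend(True) before the regression run on
        # the test pool, set_suspend(False) afterwards; clients left untouched
        # keep running normally.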

    adilger Andreas Dilger added a comment - Cliff, I expect there may be problems relating to open files on the mounted filesystem that cannot be closed without killing the running application.

    This request sounds a lot like a requirement we had for a feature called "Simplified Interoperability", which would flush all uncommitted RPCs from the clients and quiesce them in advance of a server shutdown for a Lustre software upgrade, so that we didn't have to manage recovery/replay of Lustre RPCs across different versions. This requires work on both the clients (to be able to "detach" themselves from their open files and "reattach" them once the server is restarted), and on the servers to notify the clients of the impending shutdown and to allow the clients to reconnect without evicting them.
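    The sequence described above could be outlined roughly as follows; this is only a restatement of the comment as stub code, with every class and method name hypothetical (nothing here corresponds to an existing Lustre API):

        # Conceptual outline of the "Simplified Interoperability" sequence.
        class Client:
            def __init__(self, name):
                self.name = name
                self.saved_handles = []

            def flush_uncommitted_rpcs(self):
                print(f"{self.name}: flush uncommitted RPCs (nothing left to replay)")

            def detach_open_files(self):
                # Remember open handles without keeping them pinned on the server.
                self.saved_handles = ["placeholder handles"]
                print(f"{self.name}: detach open files and quiesce")

            def reconnect_and_reattach(self):
                print(f"{self.name}: reconnect without eviction, reattach open files")

        def quiesce_for_server_upgrade(clients):
            # Server notifies clients of the impending shutdown, clients wind down,
            # the server is upgraded, then clients reconnect without being evicted.
            for c in clients:
                c.flush_uncommitted_rpcs()
                c.detach_open_files()
            print("server: shut down, upgrade, restart")
            for c in clients:
                c.reconnect_and_reattach()

        quiesce_for_server_upgrade([Client("client-0"), Client("client-1")])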

    cliffw Cliff White (Inactive) added a comment - The simplest way to do this would be to unmount the clients, then remount after your maintenance. That leaves the client in a good state, and you should be able to restart without issue.

    If you are adding new compute nodes (not servers), that should be completely transparent to all other clients, and should never require any quiescing of Lustre. Clients are very independent of one another.

    If you are changing IB values, it would be best to unmount all Lustre, unload the LNet modules and restart. That way you are certain your IB changes will propagate.

    In most cases of Lustre 'live' maintenance, any live but idle Lustre machines should cause you no issues. There is no need for a 'quiesce'. If you need to completely eliminate all Lustre traffic from your network, the quickest and safest way to do this is to simply stop Lustre on the affected nodes.
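    A minimal sketch of that stop/start cycle as run on a client node, assuming a site-specific mount point, MGS NID, and filesystem name (all placeholders below):

        # Unmount / unload / remount cycle as described above, per client node.
        import subprocess

        MOUNT_POINT = "/mnt/lustre"        # placeholder: adjust per site
        MOUNT_SOURCE = "mgs@o2ib:/lustre"  # placeholder: MGS NID and fsname

        def run(cmd):
            print("+", " ".join(cmd))
            subprocess.run(cmd, check=True)

        def stop_lustre_client():
            run(["umount", MOUNT_POINT])   # stops all client traffic for this mount
            run(["lustre_rmmod"])          # unloads the Lustre and LNet modules

        def start_lustre_client():
            run(["mount", "-t", "lustre", MOUNT_SOURCE, MOUNT_POINT])

        # Run stop_lustre_client() before the IB/fabric maintenance and
        # start_lustre_client() once it is complete.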

    People

      Assignee: Peter Jones (pjones)
      Reporter: Mahmoud Hanafi (mhanafi)
      Votes: 1
      Watchers: 16
