Allow server to control/deny client connections

Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      Servers should be able to selectively allow client connections based on policies defined by Lustre admins. Policies could be defined on a wide range of client properties, depending on what proves to be most useful.

          Activity

            [LU-17217] Allow server to control/deny client connections
            timday Tim Day added a comment -

            By automatically adding clients to a "client_deny" nodemap, we can end up caching wrong information. For example, say we have a filesystem with version 2.17. We want to prevent 2.10 clients from connecting. Client 1.2.3.4@tcp with 2.10 attempts to connect to the filesystem and is automatically added to the "client_deny" nodemap. Client 1.2.3.4@tcp is then upgraded to 2.17 (either via a new package or a new root volume). This client should be allowed to connect to the filesystem, but will be blocked until it is manually removed from the "client_deny" nodemap. A client node can attempt to connect and fail, then immediately attempt to connect again after changing its OS, kernel, or Lustre version. This case has to be handled without admin intervention.
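
A minimal Python sketch of the scenario above, contrasting a deny list cached by NID with a check that is re-evaluated on every connect attempt. All names here (`CachedDenyList`, `version_allowed`, `MIN_VERSION`) are hypothetical and are not Lustre APIs; the version string stands in for whatever the client reports during the mount.

```python
MIN_VERSION = (2, 17)  # hypothetical policy: oldest client version allowed


def parse_version(text):
    """Turn a string such as '2.10' into a comparable tuple (2, 10)."""
    return tuple(int(part) for part in text.split("."))


def version_allowed(client_version):
    """Stateless check, evaluated on every connect attempt."""
    return parse_version(client_version) >= MIN_VERSION


class CachedDenyList:
    """Models auto-adding a NID to a deny list on its first failed connect."""

    def __init__(self):
        self.denied_nids = set()

    def connect(self, nid, client_version):
        if nid in self.denied_nids:
            # Stale entry: the newly reported version is never re-checked.
            return False
        if not version_allowed(client_version):
            self.denied_nids.add(nid)
            return False
        return True


cached = CachedDenyList()
nid = "1.2.3.4@tcp"
first_attempt = cached.connect(nid, "2.10")        # denied, NID gets cached
after_upgrade = cached.connect(nid, "2.17")        # still denied by the cache
stateless_after_upgrade = version_allowed("2.17")  # allowed when re-evaluated
```

In this toy model the upgraded client stays blocked only because the decision was cached against the NID; evaluating the policy against what the client reports on each attempt avoids the manual cleanup step.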

            adilger Andreas Dilger added a comment -

            I'm not sure what you consider unsuitable about my proposal for the "client_deny" nodemap? It would certainly be possible to add clients to that nodemap automatically, but it would also be a convenient holding place for clients blocked manually by an admin. The important part of the idea is that this nodemap would always be created, like "normal", so that it is easy to use...

            timday Tim Day added a comment -

            I don't think that'd work for my use case. We don't know anything about the clients ahead of time, except what they tell us during the mount. I like the idea of being able to automatically block unhealthy clients. The ticket you linked for triggering umount from the server also seems useful.

             

            I still plan on reworking the patches I had earlier to work with nodemap. I'm targeting 2.17, but I don't think I'll have cycles until early next year.

            adilger Andreas Dilger added a comment -

            One option to handle this with minimal effort for the administrator would be to always create a nodemap like "banned_clients" or "client_blocked" or similar, and then allow the admin to easily add and remove clients from the "doghouse" nodemap. This could potentially be done automatically if some client has a flapping network link and keeps getting evicted from the server due to unresponsiveness (e.g. by a monitoring script, ptlrpc client eviction code, etc.).
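
The automatic case described above could be driven by something like the following Python sketch of a monitoring script's decision logic: it counts evictions per NID inside a sliding time window and reports when a client crosses a threshold. The class name, the threshold, and the window length are assumptions made for illustration, and how eviction events are collected from the server is deliberately left out.

```python
from collections import defaultdict, deque


class EvictionMonitor:
    """Flags a NID once it has been evicted too often within a time window."""

    def __init__(self, max_evictions=3, window_seconds=600):
        self.max_evictions = max_evictions    # hypothetical threshold
        self.window_seconds = window_seconds  # hypothetical window length
        self._events = defaultdict(deque)     # NID -> recent eviction times

    def record_eviction(self, nid, timestamp):
        """Record one eviction; return True if the NID should be blocked.

        Timestamps are assumed to arrive in non-decreasing order per NID.
        """
        events = self._events[nid]
        events.append(timestamp)
        # Drop evictions that have fallen out of the sliding window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_evictions


flapping = EvictionMonitor(max_evictions=3, window_seconds=600)
flapping_flags = [flapping.record_eviction("10.0.0.5@tcp", t) for t in (0, 100, 200)]

occasional = EvictionMonitor(max_evictions=3, window_seconds=600)
occasional_flags = [occasional.record_eviction("10.0.0.6@tcp", t) for t in (0, 1000, 2000)]
```

A client evicted three times within ten minutes is flagged on the third eviction, while one evicted at widely spaced intervals is not. What happens to a flagged NID (for example, adding it to the always-present blocking nodemap) is not shown, since the thread does not settle on that mechanism.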

            sebastien Sebastien Buisson added a comment -

            I have started working on LU-17431. It seems feasible to add nodemaps dynamically on MDS/OSS, the trickiest part being the hierarchical organization of these nodemaps.

            timday Tim Day added a comment -

            The dynamic nodemap extension seems like a reasonable compromise. It doesn't seem too complex, implementation-wise - although I haven't looked into it deeply.

            sebastien Sebastien Buisson added a comment -

            Good point, Andreas. There is certainly a need for a more flexible or lightweight mechanism than what nodemap can offer today. And I think the dynamic nodemap extension as discussed in LU-17431 would be a good fit for this control/deny client connections use case. To me it looks like both features could be implemented in parallel. As the objective of dynamic nodemaps is to provide an alternative interface for setting nodemaps, all enhancements made to nodemaps would be available for dynamic nodemaps.

            adilger Andreas Dilger added a comment -

            Sebastien, I was thinking about this a bit, and one reason not to use nodemaps for this is that nodemaps are relatively heavyweight to set up, and (IMHO) it would be useful to have a simple mechanism to allow blocking clients from accessing the servers. For example, if the servers are going into maintenance, or experiencing a problem due to some broken client application workload, it should be possible to quickly set a "deny all/some mounts" policy and evict clients, without having this be part of the persistent configuration that later has to be removed.

            If there were a way to configure a nodemap temporarily like "lctl set_param ..." (vs. "lctl set_param -P") then that would be useful for this and likely other reasons we have discussed previously in LU-17431. We likely also need to have some way to configure (at least) a wildcard match for IPv6 addresses in a nodemap so that it could apply to all IPv6 nodes, which I've added to LU-14288.

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52793/
            Subject: LU-17217 obd: reserve server-side connection policy bits
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 55a65ed8853ba73f1f9de297e9b6e15a6f37743a

            timday Tim Day added a comment -

            If a policy is set to SOFT_BLOCK, the client would be able to override it by passing a mount flag (or something like that). The intended experience is: client attempts to connect and is blocked with a reason code -> user of the client decides whether to proceed with the mount given the new information -> client attempts the connection again.

             

            If the policy is set to HARD_BLOCK, perhaps we could hide the reason code. The client wouldn't be able to do anything. However, it'd still be nice to have an advisory message. Perhaps the advisory reason code could just be off by default.
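
The two behaviours described above can be modelled with a short Python sketch. Only the names SOFT_BLOCK and HARD_BLOCK come from this comment; the `Policy` enum, the `evaluate_connect` function, and its `override` and `expose_hard_reason` parameters are hypothetical stand-ins for the mount flag and the server-side advisory setting, not an existing interface.

```python
from enum import Enum


class Policy(Enum):
    ALLOW = "allow"
    SOFT_BLOCK = "soft_block"
    HARD_BLOCK = "hard_block"


def evaluate_connect(policy, reason, override=False, expose_hard_reason=False):
    """Return (allowed, reason_shown_to_client) for one connect attempt.

    override           -- stands in for the client passing an override mount flag
    expose_hard_reason -- stands in for a server setting that is off by default
    """
    if policy is Policy.ALLOW:
        return True, None
    if policy is Policy.SOFT_BLOCK:
        if override:
            return True, None
        return False, reason
    # HARD_BLOCK: the override is ignored and the reason is advisory only.
    shown = reason if expose_hard_reason else None
    return False, shown


reason = "client version too old"
soft_first = evaluate_connect(Policy.SOFT_BLOCK, reason)
soft_retry = evaluate_connect(Policy.SOFT_BLOCK, reason, override=True)
hard_default = evaluate_connect(Policy.HARD_BLOCK, reason, override=True)
hard_advisory = evaluate_connect(Policy.HARD_BLOCK, reason, override=True,
                                 expose_hard_reason=True)
```

The soft-block path mirrors the intended flow: the first attempt is refused with a reason, and the retry with the override succeeds. Under a hard block the override has no effect, and the reason is only returned when the advisory setting is switched on.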

            People

              timday Tim Day
              timday Tim Day
              Votes: 0
              Watchers: 8
