[LU-17217] Allow server to control/deny client connections - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
None

Severity:
3

Description

Servers should be able to selectively allow client connections based on policies defined by Lustre admins. Policies could be defined on a wide range of client properties, depending on what proves to be most useful.

Attachments

Issue Links

is related to

LU-15177 cs_update live batch update hung waiting for MDT recovery to complete

Open

LU-13078 mgs trigger umount of clients

Open

LU-17435 improved reliability in the face of intermittent network errors

Open

LU-14288 Enhance nodemap ranges to work better with IPv6

Resolved

LU-12515 Provide an interface to set OST/client into readonly mode

Resolved

is related to

LU-17431 dynamically configurable nodemap

Open

Activity

Sebastien Buisson added a comment - 20/Mar/24 7:08 AM

I have started working on LU-17431. It seems feasible to add nodemaps dynamically on MDS/OSS, the trickiest part being the hierarchical organization of these nodemaps.

Tim Day added a comment - 20/Mar/24 1:10 AM

The dynamic nodemap extension seems like a reasonable compromise. It doesn't seem too complex, implementation-wise - although I haven't looked into it deeply.

Sebastien Buisson added a comment - 06/Mar/24 7:59 AM

Good point Andreas. There is certainly a need for a more flexible or lightweight mechanism than what nodemap can offer today. And I think the dynamic nodemap extension as discussed in LU-17431 would be a good fit for this control/deny client connections use case. To me it looks like both features could be implemented in parallel. As the objective of dynamic nodemaps is to provide an alternative interface for setting nodemaps, all enhancements made to nodemaps would be available for dynamic nodemaps.

Andreas Dilger added a comment - 06/Mar/24 1:45 AM

Sebastien, I was thinking about this a bit, and one reason to not use nodemaps for this is because nodemaps are relatively heavyweight to set up, and (IMHO) it would be useful to have a simple mechanism to allow blocking clients from accessing the servers. For example, if the servers are going into maintenance, or experiencing a problem due to some broken client application workload, it should be possible to quickly set a "deny all/some mounts" policy and evict clients, without having this be part of the persistent configuration that later has to be removed.

If there was a way to configure a nodemap temporarily like "lctl set_param ..." (vs. "lctl set_param -P") then that would be useful for this and likely other reasons we have discussed previously in LU-17431. We likely also need to have some way to configure (at least) a wildcard match for IPv6 addresses in a nodemap so that it could apply to all IPv6 nodes, which I've added to LU-14288.

Gerrit Updater added a comment - 04/Mar/24 8:02 PM

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/52793/
Subject: LU-17217 obd: reserve server-side connection policy bits
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 55a65ed8853ba73f1f9de297e9b6e15a6f37743a

Tim Day added a comment - 19/Feb/24 4:20 PM

If a policy is set to SOFT_BLOCK, the client would be able to override by passing a mount flag (or something like that). The intended experience is like: client attempts to connect and is blocked with reason code -> user of client decides whether to proceed with mount with new information -> attempt connection again.

If the policy is set to HARD_BLOCK, perhaps we could hide the reason code. The client wouldn't be able to do anything. However, it'd still be nice to have an advisory message. Perhaps the advisory reason code could just be off by default.
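To make the intended experience concrete, here is a minimal sketch of the connect decision described above. It is not the Lustre implementation: `Action`, `ConnectReply`, `handle_connect`, and the `client_override` and `show_hard_reason` parameters are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Action(IntEnum):
    """Hypothetical policy outcomes; names mirror the discussion only."""
    ALLOW = 0
    SOFT_BLOCK = 1
    HARD_BLOCK = 2


@dataclass
class ConnectReply:
    accepted: bool
    reason: Optional[str]  # advisory reason code; None when hidden


def handle_connect(action: Action, reason: str,
                   client_override: bool = False,
                   show_hard_reason: bool = False) -> ConnectReply:
    """Decide whether a connection attempt is accepted.

    client_override stands in for the mount flag a user would pass
    after seeing the reason code from a first, rejected attempt.
    show_hard_reason models "advisory reason code off by default".
    """
    if action == Action.ALLOW:
        return ConnectReply(True, None)
    if action == Action.SOFT_BLOCK:
        # The reason is always returned so the user can make a decision.
        return ConnectReply(client_override, reason)
    # HARD_BLOCK: never accepted, and the reason is hidden unless enabled.
    return ConnectReply(False, reason if show_hard_reason else None)


# First attempt is rejected with a reason; the retry with the override succeeds.
first = handle_connect(Action.SOFT_BLOCK, "CLIENT_TOO_OLD")
retry = handle_connect(Action.SOFT_BLOCK, "CLIENT_TOO_OLD", client_override=True)
```

The reason string `CLIENT_TOO_OLD` is made up; the thread only says that reason codes would be a small preset list.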

Sebastien Buisson added a comment - 19/Feb/24 4:04 PM

The requirement about the ability for servers to provide a reason for blocking back to the client is interesting. Usually from a security perspective it is better to avoid explaining to clients the real reasons for a rejected access from the server. Not that server side is considered a black box, but because there is not really anything that the client can do. It is not like a badly formed IO that the application can fix for instance.

Tim Day added a comment - 19/Feb/24 3:47 PM - edited

On the client-side, I think the important features (to me) are:

SOFT_BLOCK functionality described above
Ability for server to provide a reason for blocking back to the client

I think these would require some protocol change to work (specifically, those highlighted in the patch). In the case of a block, the reason code can be put in an unused part of the reply. So we should only need a few bit flags.

On the server side, we need to be able to set policies without knowing the NID (or much else about the client) ahead of time. In the PoC I have, this is done using custom commands and glob patterns. I could refactor it to either function on the default nodemap or add NID wildcards to nodemap. Then we could use nodemap modify commands with rbac or something.

I'm fine with refactoring all of the server-side stuff to fit more into nodemap. But I think the wire changes are needed regardless.
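As an illustration of setting policies without knowing the NID ahead of time, the sketch below matches NID strings against glob patterns. It assumes shell-style globbing via Python's `fnmatch`; the policy table, the `lookup_policy` helper, and the example NIDs are hypothetical and do not reflect the PoC's actual commands or nodemap syntax.

```python
from fnmatch import fnmatchcase

# Hypothetical policy table: (NID glob pattern, action). First match wins.
POLICIES = [
    ("10.0.5.*@o2ib", "HARD_BLOCK"),
    ("*@tcp", "SOFT_BLOCK"),
]


def lookup_policy(nid: str, policies=POLICIES) -> str:
    """Return the action of the first pattern matching the client NID."""
    for pattern, action in policies:
        if fnmatchcase(nid, pattern):
            return action
    return "ALLOW"  # assumption: clients matching no pattern are allowed


print(lookup_policy("10.0.5.17@o2ib"))   # HARD_BLOCK
print(lookup_policy("192.168.1.4@tcp"))  # SOFT_BLOCK
print(lookup_policy("10.0.6.1@o2ib"))    # ALLOW
```

A first-match rule is used here only to keep the sketch short; a real design could equally pick the strictest matching entry.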

Sebastien Buisson added a comment - 19/Feb/24 8:18 AM

To me, this looks redundant in many aspects with what nodemap could offer, with the necessary extensions. The ability to identify clients based on their NIDs, the possibility to assign different properties to these clients, and then adapt server behavior based on these properties.
I really think we should not develop a whole new thing that is an equivalent of nodemap, but rather extend the nodemap capabilities to fit your needs. For instance, the 'rbac' property of nodemap looks well suited for what you want; you would probably just have to introduce new values for this property.

Andreas Dilger added a comment - 18/Jan/24 6:38 PM

Isn't the UUID decided by the client at mount time? How would the server know ahead of time what the UUID is? Perhaps I'm misunderstanding how that works.

Yes, but it is constant across the lifetime of that client mount. So if a client mountpoint is suffering heartburn for some reason (LBUG while holding a DLM lock or mutex, whatever) then that UUID can be banned until it remounts. We don't necessarily need to ban the NID, since a reboot will fix the problem.
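A small sketch of why a UUID ban only lasts until remount: the ban is keyed on the UUID the client picked at mount time, so a fresh mount presents a new UUID that is not on the list. `BanList` and `new_mount_uuid` are hypothetical helpers, and `uuid.uuid4()` merely stands in for however the client actually generates its UUID.

```python
import uuid


class BanList:
    """Hypothetical server-side set of banned client mount UUIDs."""

    def __init__(self) -> None:
        self._banned = set()

    def ban(self, client_uuid: str) -> None:
        self._banned.add(client_uuid)

    def allows(self, client_uuid: str) -> bool:
        return client_uuid not in self._banned


def new_mount_uuid() -> str:
    # Stand-in for the UUID a client chooses at mount time.
    return str(uuid.uuid4())


bans = BanList()
stuck_mount = new_mount_uuid()
bans.ban(stuck_mount)               # ban the misbehaving mount instance
after_remount = new_mount_uuid()    # same node, new mount, new UUID

print(bans.allows(stuck_mount))     # False
print(bans.allows(after_remount))   # True
```

Because the NID is never consulted, other mountpoints on the same client are unaffected.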

Tim Day added a comment - 18/Jan/24 5:57 PM

I'm not sure there could be any "free form" message unless a string was put in the server reply (in some place we don't care about because the connection was denied after all).

I don't think having a free-form message is too important, personally. I think we should do only preset messages/codes unless someone has a strong argument for a wildcard/custom reason code.

For the LU-17435 case I think the server could use the same reason code but decide on the severity of the issue (e.g. 2-4 reconnects/resends per hour = WARN, 5-9 reconnects/resends per hour = SOFT_BLOCK, >= 10/hour = HARD_BLOCK), but that should be decided in the context of that ticket.

Are you suggesting a dynamic policy based on number of connection attempts? I could see that being useful. However, it would make defining the policies more complex. We'd probably have to use YAML rather than simple one-liners.
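The example thresholds quoted above can be written as a simple mapping from reconnect rate to severity. This is only a sketch of that one example: `severity_for_rate` is a hypothetical name, and treating 0-1 reconnects per hour as `ALLOW` is an assumption, since the thread does not say what happens below 2.

```python
def severity_for_rate(reconnects_per_hour: int) -> str:
    """Map reconnects/resends per hour to a severity, per the quoted example."""
    if reconnects_per_hour >= 10:
        return "HARD_BLOCK"
    if reconnects_per_hour >= 5:
        return "SOFT_BLOCK"
    if reconnects_per_hour >= 2:
        return "WARN"
    return "ALLOW"  # assumption: below the lowest quoted threshold


for rate in (1, 3, 7, 12):
    print(rate, severity_for_rate(rate))
```

Expressing the thresholds as a table in a policy file is what would push the configuration toward something like YAML rather than one-liners.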

This makes me wonder if the SOFT_BLOCK mechanism should be on a "per reason code" basis, so that if the client admin says "sure I understand my client version is too old" and continues anyway, that doesn't give them free rein to connect if a new reason is added.

Agreed. When a client connects, we should scan all policies and apply the most strict one that matches. We can require each policy to be associated with a reason code. That way, an admin can have multiple policies for each client property (if they want).
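A rough sketch of both ideas, scanning all policies for the strictest match and acknowledging SOFT_BLOCKs per reason code, might look like the following. Everything here is hypothetical: the `Policy` shape, the `Severity` ordering, the reason codes, and the client property names are assumptions made for illustration, not part of any patch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, FrozenSet, Iterable, Optional


class Severity(IntEnum):
    WARN = 1
    SOFT_BLOCK = 2
    HARD_BLOCK = 3


@dataclass(frozen=True)
class Policy:
    reason: str                       # every policy carries a reason code
    severity: Severity
    matches: Callable[[dict], bool]   # predicate over client properties


def strictest_match(client: dict, policies: Iterable[Policy]) -> Optional[Policy]:
    """Scan all policies and return the most strict one that matches."""
    hits = [p for p in policies if p.matches(client)]
    return max(hits, key=lambda p: p.severity, default=None)


def may_connect(client: dict, policies: Iterable[Policy],
                acked_reasons: FrozenSet[str] = frozenset()) -> bool:
    """A SOFT_BLOCK is waived only if its own reason was acknowledged."""
    for p in policies:
        if not p.matches(client):
            continue
        if p.severity == Severity.HARD_BLOCK:
            return False
        if p.severity == Severity.SOFT_BLOCK and p.reason not in acked_reasons:
            return False
    return True


policies = [
    Policy("CLIENT_TOO_OLD", Severity.SOFT_BLOCK, lambda c: c["version"] < (2, 15)),
    Policy("FLAKY_NETWORK", Severity.SOFT_BLOCK, lambda c: c["reconnects"] >= 5),
]
client = {"version": (2, 12), "reconnects": 7}

# Acknowledging only the version warning does not waive the second reason.
print(may_connect(client, policies, frozenset({"CLIENT_TOO_OLD"})))
print(may_connect(client, policies, frozenset({"CLIENT_TOO_OLD", "FLAKY_NETWORK"})))
```

Keying the acknowledgement on the reason code is what prevents an earlier override from silently covering a policy added later.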

While the UUIDs would only be good for the duration of the mount instance, they would allow selectively denying one mountpoint on a client instead of all mountpoints.

Isn't the UUID decided by the client at mount time? How would the server know ahead of time what the UUID is? Perhaps I'm misunderstanding how that works.

I'm going to do a second revision of this PoC soon (I have other things I need to focus on first). I just want to make sure the design is more-or-less sketched out.

People

Assignee: Tim Day

Reporter: Tim Day

Votes: 0

Watchers: 8

Dates

Created: 22/Oct/23 9:05 PM

Updated: 4 days ago 12:55 AM