[LU-17217] Allow server to control/deny client connections - Whamcloud Community JIRA

Sebastien Buisson added a comment - 19/Feb/24 4:04 PM

The requirement that servers be able to send the client a reason for blocking is interesting. From a security perspective, it is usually better to avoid explaining to clients the real reason the server rejected access. This is not because the server side is considered a black box, but because there is not really anything the client can do about it. It is not like a badly formed IO that the application can fix, for instance.

Tim Day added a comment - 19/Feb/24 3:47 PM - edited

On the client-side, I think the important features (to me) are:

SOFT_BLOCK functionality described above
Ability for server to provide a reason for blocking back to the client

I think these would require some protocol change to work (specifically, those highlighted in the patch). In the case of a block, the reason code can be put in an unused part of the reply, so we should only need a few bit flags.

On the server side, we need to be able to set policies without knowing the NID (or much else about the client) ahead of time. In the PoC I have, this is done using custom commands and glob patterns. I could refactor it to either function on the default nodemap or add NID wildcards to nodemap. Then we could use nodemap modify commands with rbac or something.
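
As a rough illustration of the glob-pattern idea (not the PoC's actual code), a server-side rule table could be matched against a connecting client's NID as below. The rule format, the `match_policy` name, the first-match-wins ordering, and the sample NIDs are all assumptions made up for this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (NID glob pattern, policy name).
# Rules are checked in order and the first match wins.
RULES = [
    ("10.0.1.*@tcp", "HARD_BLOCK"),
    ("10.0.*@tcp", "WARN"),
    ("*@o2ib", "ALLOW"),
]

def match_policy(nid, rules=RULES, default="ALLOW"):
    """Return the policy of the first rule whose pattern matches the NID."""
    for pattern, policy in rules:
        if fnmatchcase(nid, pattern):
            return policy
    return default
```

Because the patterns are plain globs, no NID has to be known ahead of time; a client that matches nothing falls through to the default.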

I'm fine with refactoring all of the server-side stuff to fit more into nodemap. But I think the wire changes are needed regardless.

Sebastien Buisson added a comment - 19/Feb/24 8:18 AM

To me, this looks redundant in many aspects with what nodemap could offer, given the necessary extensions: the ability to identify clients based on their NIDs, the possibility to assign different properties to these clients, and then adapting server behavior based on these properties.
I really think we should not develop a whole new thing that is an equivalent of nodemap, but rather extend the nodemap capabilities to fit your needs. For instance, the 'rbac' property of nodemap looks well suited for what you want; you would probably just have to introduce new values for this property.

Andreas Dilger added a comment - 18/Jan/24 6:38 PM

Isn't the UUID decided by the client at mount time? How would the server know ahead of time what the UUID is? Perhaps I'm misunderstanding how that works.

Yes, but it is constant across the lifetime of that client mount. So if a client mountpoint is suffering heartburn for some reason (LBUG while holding a DLM lock or mutex, whatever) then that UUID can be banned until it remounts. We don't necessarily need to ban the NID, since a reboot will fix the problem.

Tim Day added a comment - 18/Jan/24 5:57 PM

I'm not sure there could be any "free form" message unless a string was put in the server reply (in some place we don't care about because the connection was denied after all).

I don't think having a free-form message is too important, personally. I think we should do only preset messages/codes unless someone has a strong argument for a wildcard/custom reason code.

For the LU-17435 case I think the server could use the same reason code but decide on the severity of the issue (e.g. 2-4 reconnects/resends per hour = WARN, 5-9 reconnects/resends per hour = SOFT_BLOCK, >= 10/hour = HARD_BLOCK), but that should be decided in the context of that ticket.

Are you suggesting a dynamic policy based on number of connection attempts? I could see that being useful. However, it would make defining the policies more complex. We'd probably have to use YAML rather than simple one-liners.

This makes me wonder if the SOFT_BLOCK mechanism should be on a "per reason code" basis, so that if the client admin says "sure I understand my client version is too old" and continues anyway, that doesn't give them free rein to connect if a new reason is added.

Agreed. When a client connects, we should scan all policies and apply the most strict one that matches. We can require each policy to be associated with a reason code. That way, an admin can have multiple policies for each client property (if they want).
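
A minimal sketch of that scan, assuming policies are stored as (predicate, policy, reason code) entries and that the client somehow reports which reason codes its admin has acknowledged (for example through a mount option; that mechanism is hypothetical here). The names `Policy`, `evaluate`, and the client fields are invented for illustration.

```python
from enum import IntEnum

class Policy(IntEnum):
    # Ordered so that a larger value means a stricter policy.
    ALLOW = 0
    WARN = 1
    SOFT_BLOCK = 2
    HARD_BLOCK = 3

def evaluate(client, policies, acked_reasons=frozenset()):
    """Scan every policy and return (policy, reason_code) for the strictest match.

    A SOFT_BLOCK whose reason code was acknowledged is downgraded to WARN,
    but only for that reason code; other reasons still block.
    """
    verdict = (Policy.ALLOW, 0)
    for predicate, policy, reason in policies:
        if not predicate(client):
            continue
        if policy == Policy.SOFT_BLOCK and reason in acked_reasons:
            policy = Policy.WARN
        if policy > verdict[0]:
            verdict = (policy, reason)
    return verdict
```

With two SOFT_BLOCK policies (say reason 1 for an old version and reason 14 for an unstable connection), acknowledging reason 1 alone still leaves the client soft-blocked for reason 14.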

While the UUIDs would only be good for the duration of the mount instance, they would allow selectively denying one mountpoint on a client instead of all mountpoints.

Isn't the UUID decided by the client at mount time? How would the server know ahead of time what the UUID is? Perhaps I'm misunderstanding how that works.

I'm going to do a second revision of this PoC soon (I have other things I need to focus on first). I just want to make sure the design is more-or-less sketched out.

Andreas Dilger added a comment - 17/Jan/24 6:25 AM

It would be very useful if this could be integrated with a policy like "deny clients by NID" independent of nodemaps, with a reason like "2:client NID administratively denied on server" or similar. Similar to "mdt.*.evict_nids" and "obdfilter.*.evict_nids" it should be possible to write client NIDs and maybe also UUIDs to the parameter to permanently block these connection attempts. While the UUIDs would only be good for the duration of the mount instance, they would allow selectively denying one mountpoint on a client instead of all mountpoints.
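
To make the NID versus UUID distinction concrete, here is a small in-memory sketch of such a deny list. It is only a stand-in for a server parameter: the `DenyList` class, its method names, and the UUID reason text are assumptions, and only the NID reason string comes from the suggestion above.

```python
class DenyList:
    """Hypothetical in-memory stand-in for a server-side deny parameter."""

    NID_REASON = "2:client NID administratively denied on server"
    # Assumed wording; the ticket only gives a reason string for NIDs.
    UUID_REASON = "2:client UUID administratively denied on server"

    def __init__(self):
        self.nids = set()   # blocks every mountpoint on that node
        self.uuids = set()  # blocks one mount instance only

    def check(self, nid, uuid):
        """Return a reason string if the connection is denied, else None."""
        if nid in self.nids:
            return self.NID_REASON
        if uuid in self.uuids:
            return self.UUID_REASON
        return None
```

A denied UUID stops matching as soon as the client remounts and presents a new UUID, whereas a denied NID keeps blocking every mount from that node.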

Andreas Dilger added a comment - 17/Jan/24 6:20 AM

I was just looking into this in the context of LU-17435, and being able to dynamically add client NIDs to the "deny list" if they have issues maintaining a stable connection to the server. This would be managed by LNet/ptlrpc/ldlm in the kernel, so having a specific "reason code" would make sense, like "14:client has unstable network interface" or whatever, and a simpler check like "1:client Lustre version is too old" would probably be the first one. I'm not sure there could be any "free form" message unless a string was put in the server reply (in some place we don't care about because the connection was denied after all).

This makes me wonder if the SOFT_BLOCK mechanism should be on a "per reason code" basis, so that if the client admin says "sure I understand my client version is too old" and continues anyway, that doesn't give them free rein to connect if a new reason is added. I think that would mitigate my concerns with the usefulness of SOFT_BLOCK. For the LU-17435 case I think the server could use the same reason code but decide on the severity of the issue (e.g. 2-4 reconnects/resends per hour = WARN, 5-9 reconnects/resends per hour = SOFT_BLOCK, >= 10/hour = HARD_BLOCK), but that should be decided in the context of that ticket.
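
The example thresholds can be written down as a small function. This is only a sketch of the mapping: the function name is made up, and treating fewer than 2 reconnects/resends per hour as ALLOW is an assumption, since the example does not say what happens below the WARN range.

```python
def severity(reconnects_per_hour):
    """Map a reconnect/resend rate to a policy name using the example thresholds."""
    if reconnects_per_hour >= 10:
        return "HARD_BLOCK"
    if reconnects_per_hour >= 5:
        return "SOFT_BLOCK"
    if reconnects_per_hour >= 2:
        return "WARN"
    # Assumption: anything below the WARN range is allowed silently.
    return "ALLOW"
```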

Tim Day added a comment - 24/Oct/23 2:30 AM

I'm thinking that the first time that SOFT_BLOCK is used (for whatever reason), the user would add the mount override option to the mount command line and then it would permanently ignore all future SOFT_BLOCK settings.

Definitely. But I think that's fine. By then, the SOFT_BLOCK would have served its social purpose. Plus, we can still have the client output some warning message. And the presence of the mount flag clearly indicates that the user has opted into an unusual (according to the admin) configuration.

Having a reason returned to the client would definitely be useful. If it was just the client version, then if the server returned a bogus higher version (e.g. 215.3.0), the client would print a message like "Server testfs-MDT0000 version (215.3.0.0) is much newer than client. Consider upgrading client (2.12.9.0)".

I think the easiest way to do this: have the server return a code (defined by an enum) that the client can map to a nice message. I think that would need another u8 in obd_connect_data. So the total reservation would be: OBD_CONNECT flag, u8 bitmap (for SOFT_BLOCK, WARN flags), u8 for 256 server verdicts.
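
A sketch of how the two u8 fields might be packed and decoded, purely to illustrate the reservation described above. This is not the real obd_connect_data layout: the bit positions, the `Verdict` values, the message table, and the function names are all assumptions.

```python
import struct
from enum import IntEnum

# Hypothetical bit positions inside the u8 bitmap.
FLAG_WARN = 0x01
FLAG_SOFT_BLOCK = 0x02

class Verdict(IntEnum):
    # Illustrative values only; a u8 leaves room for codes 0-255.
    NONE = 0
    VERSION_TOO_OLD = 1

# Client-side table mapping a verdict code to a readable message.
MESSAGES = {
    Verdict.VERSION_TOO_OLD: "client Lustre version is too old",
}

def pack_reply(flags, verdict):
    """Server side: pack the u8 bitmap and the u8 verdict into two bytes."""
    return struct.pack("BB", flags, verdict)

def describe_reply(raw):
    """Client side: unpack the two bytes and look up a message for the verdict."""
    flags, code = struct.unpack("BB", raw)
    text = MESSAGES.get(code, "unknown reason %d" % code)
    return flags, code, text
```

An older client that does not know a newer verdict code can still report the number, which is one argument for fixed codes over free-form strings.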

Andreas Dilger added a comment - 24/Oct/23 2:00 AM

I'm thinking that the first time that SOFT_BLOCK is used (for whatever reason), the user would add the mount override option to the mount command line and then it would permanently ignore all future SOFT_BLOCK settings.

Having a reason returned to the client would definitely be useful. If it was just the client version, then if the server returned a bogus higher version (e.g. 215.3.0), the client would print a message like "Server testfs-MDT0000 version (215.3.0.0) is much newer than client. Consider upgrading client (2.12.9.0)".

Tim Day added a comment - 24/Oct/23 12:06 AM

Are you familiar with nodemap? This seems to have some overlap with functionality there?

Yeah. I looked at extending nodemap, but I went with a simplified interface for the PoC. I should look more closely at nodemap. Initially, I'm just trying to figure out what the wire protocol is going to look like.

pre-discussion of implementation approaches

I don't mind putting together PoCs like this. It helps me organize my thinking (and learn more about Lustre internals).

The issue I have with "criterion decided by admins" and soft blocking is that the client has no idea what those criteria are, nor do I.

I would think that the clients will always want to try and connect anyway, so it isn't clear how this would be used in practice.

If this is only related to client/server versions, there is already a mechanism to handle that which could be made a tunable parameter instead of a compile-time constant.

The four policies I have envisioned so far are ALLOW, WARN, SOFT_BLOCK, and HARD_BLOCK. Two of them, specifically ALLOW and HARD_BLOCK, I think have a clear technical purpose - selectively blocking only some clients.

WARN and SOFT_BLOCK have more of a social purpose. WARN would cause the client to emit some kind of message like "the admin might block this client in the future / doesn't recommend using this client". Of course, warning messages are pretty easy to ignore. SOFT_BLOCK essentially forces the user mounting the file system to acknowledge "the admin really means it, this client might be blocked / don't use it". You can silence that by passing the mount flag. But at that point, it would be clear to the user that they are doing something the admin would prefer they didn't. It can avoid (in theory) forcing the admin to talk to everyone individually.
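
How a client might react to each of the four policies can be sketched as follows. The function name, the `override` parameter (standing in for the mount flag), and the message wording are assumptions for illustration; none of this is existing Lustre behavior.

```python
def handle_verdict(policy, override=False):
    """Return (proceed, message) for a client that received `policy`.

    `override` stands for a hypothetical mount option that acknowledges
    a SOFT_BLOCK; it has no effect on a HARD_BLOCK.
    """
    if policy == "ALLOW":
        return True, None
    if policy == "WARN":
        return True, "warning: the admin does not recommend using this client"
    if policy == "SOFT_BLOCK":
        if override:
            return True, "warning: mounting despite SOFT_BLOCK (override given)"
        return False, "mount refused: pass the override option to acknowledge"
    if policy == "HARD_BLOCK":
        return False, "mount refused by server"
    raise ValueError("unknown policy: %r" % (policy,))
```

Only SOFT_BLOCK changes outcome depending on the override, which matches its role as a forced acknowledgement rather than a real barrier.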

An analogy would be hotel WiFi: before your phone/laptop can connect, you have to go into the hotel's web portal and acknowledge you are going to use the WiFi properly. You can still connect - but the hotel can make you read stuff.

It might not be clear to the user why the client is being blocked. There might be a way for the server to pass back a 'verdict' and message. I haven't looked into that yet, but that would be an improvement. Then the client could say "Permission denied - Reason VERSION - Message 'Don't use 2.6 clients'". But I don't think that's strictly necessary for the first implementation.

Patrick Farrell added a comment - 23/Oct/23 3:55 PM

Tim,

Are you familiar with nodemap? This seems to have some overlap with functionality there?

By the way, also - Obviously I'm not sure about the implementation details yet, but we're generally happy to have some pre-discussion of implementation approaches for anyone planning to do a big chunk of work. You can open a JIRA suggesting something and we're happy to discuss before you do the implementation. (We may pull the LKML trick of "eh implement it and let's see", but we also often will give real feedback to early proposals.)
