[LU-17217] Allow server to control/deny client connections - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

Description

Servers should be able to selectively allow client connections based on policies defined by Lustre admins. Policies could be defined on a wide range of client properties, depending on what proves to be most useful.

Attachments

Issue Links

is related to

LU-15177 cs_update live batch update hung waiting for MDT recovery to complete (Open)
LU-13078 mgs trigger umount of clients (Open)
LU-17435 improved reliability in the face of intermittent network errors (Open)
LU-14288 Enhance nodemap ranges to work better with IPv6 (Reopened)
LU-12515 Provide an interface to set OST/client into readonly mode (Resolved)

is related to

LU-17431 dynamically configurable nodemap (Open)

Activity

Andreas Dilger added a comment - 17/Jan/24 6:20 AM

I was just looking into this in the context of LU-17435, and being able to dynamically add client NIDs to the "deny list" if they have issues maintaining a stable connection to the server. This would be managed by LNet/ptlrpc/ldlm in the kernel, so having a specific "reason code" would make sense, like "14:client has unstable network interface" or whatever, and a simpler check like "1:client Lustre version is too old" would probably be the first one. I'm not sure there could be any "free form" message unless a string was put in the server reply (in some place we don't care about because the connection was denied after all).

This makes me wonder if the SOFT_BLOCK mechanism should be on a "per reason code" basis, so that if the client admin says "sure I understand my client version is too old" and continues anyway, that doesn't give them free rein to connect if a new reason is added. I think that would mitigate my concerns with the usefulness of SOFT_BLOCK. For the LU-17435 case I think the server could use the same reason code but decide on the severity of the issue (e.g. 2-4 reconnects/resends per hour = WARN, 5-9 reconnects/resends per hour = SOFT_BLOCK, >= 10/hour = HARD_BLOCK), but that should be decided in the context of that ticket.
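To make the two ideas above concrete, here is a rough sketch of the rate-to-severity mapping and a per-reason SOFT_BLOCK acknowledgement. Every name in it (`conn_policy`, `conn_reason`, `conn_policy_from_rate()`, `conn_ack_reason()`, `conn_may_proceed()`) is hypothetical and none exists in Lustre. The thresholds are the example numbers from the comment, and treating 0-1 reconnects per hour as ALLOW is an assumption, since the comment does not say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in Lustre. */
enum conn_policy {
	CONN_ALLOW = 0,
	CONN_WARN,
	CONN_SOFT_BLOCK,
	CONN_HARD_BLOCK,
};

/* Reason codes 1 and 14 are the two examples given in the comment. */
enum conn_reason {
	CONN_REASON_NONE = 0,
	CONN_REASON_VERSION_TOO_OLD = 1,
	CONN_REASON_UNSTABLE_NETWORK = 14,
	CONN_REASON_MAX = 64,	/* assumed limit so acks fit in one 64-bit mask */
};

/* Map reconnects/resends per hour to a severity (example thresholds). */
static enum conn_policy conn_policy_from_rate(unsigned int per_hour)
{
	if (per_hour >= 10)
		return CONN_HARD_BLOCK;
	if (per_hour >= 5)
		return CONN_SOFT_BLOCK;
	if (per_hour >= 2)
		return CONN_WARN;
	return CONN_ALLOW;	/* assumption: below 2/hour is not an issue */
}

/* Record that the client admin acknowledged one specific reason. */
static uint64_t conn_ack_reason(uint64_t acked, enum conn_reason reason)
{
	assert(reason > CONN_REASON_NONE && reason < CONN_REASON_MAX);
	return acked | (UINT64_C(1) << reason);
}

/* A SOFT_BLOCK is overridden only if its own reason was acknowledged. */
static bool conn_may_proceed(enum conn_policy policy, enum conn_reason reason,
			     uint64_t acked)
{
	switch (policy) {
	case CONN_ALLOW:
	case CONN_WARN:
		return true;
	case CONN_SOFT_BLOCK:
		return reason > CONN_REASON_NONE && reason < CONN_REASON_MAX &&
		       (acked & (UINT64_C(1) << reason)) != 0;
	case CONN_HARD_BLOCK:
	default:
		return false;
	}
}
```

With this shape, a client that acknowledged only reason 1 would still be stopped by a SOFT_BLOCK for reason 14, which is the property being asked for.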


Tim Day added a comment - 24/Oct/23 2:30 AM

I'm thinking that the first time that SOFT_BLOCK is used (for whatever reason), the user would add the mount override option to the mount command line and it would then permanently ignore all future SOFT_BLOCK settings.

Definitely. But I think that's fine. By then, the SOFT_BLOCK would have served its social purpose. Plus, we can still have the client output some warning message. And the presence of the mount flag clearly indicates that the user has opted into an unusual (according to the admin) configuration.

Having a reason returned to the client would definitely be useful. If it was just the client version, the server could return a bogus higher version (e.g. 215.3.0), and the client would then print a message like "Server testfs-MDT0000 version (215.3.0.0) is much newer than client. Consider upgrading client (2.12.9.0)".

I think the easiest way to do this: have the server return a code (defined by an enum) that the client can map to a nice message. I think that would need another u8 in obd_connect_data. So the total reservation would be: OBD_CONNECT flag, u8 bitmap (for SOFT_BLOCK, WARN flags), u8 for 256 server verdicts.
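As a rough illustration of that reservation, the two bytes and the code-to-message mapping could look like the sketch below. The struct, field names, flag bits, and verdict values are all hypothetical; they are not fields of the real obd_connect_data, and the bit assignments are assumptions. The two messages reuse the reason-code examples from elsewhere in this ticket.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: these are NOT fields of the real obd_connect_data. */
struct conn_policy_reply {
	uint8_t cpr_flags;	/* bitmap of CPR_FL_* */
	uint8_t cpr_verdict;	/* one of up to 256 server verdicts */
};

static_assert(sizeof(struct conn_policy_reply) == 2,
	      "the sketch assumes exactly two bytes");

#define CPR_FL_WARN		0x01	/* assumed bit assignment */
#define CPR_FL_SOFT_BLOCK	0x02	/* assumed bit assignment */

/* Assumed verdict values; only the idea of an enum comes from the ticket. */
enum conn_verdict {
	CONN_VERDICT_OK = 0,
	CONN_VERDICT_VERSION = 1,
	CONN_VERDICT_NETWORK = 14,
};

/* Client-side mapping from the verdict byte to a readable message. */
static const char *conn_verdict_str(uint8_t verdict)
{
	switch (verdict) {
	case CONN_VERDICT_OK:
		return "no restriction";
	case CONN_VERDICT_VERSION:
		return "client Lustre version is too old";
	case CONN_VERDICT_NETWORK:
		return "client has unstable network interface";
	default:
		/* A newer server may send codes this client does not know. */
		return "unknown reason";
	}
}
```

The default case matters for compatibility: an old client receiving a verdict added later still prints something sensible instead of failing.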


Andreas Dilger added a comment - 24/Oct/23 2:00 AM

I'm thinking that the first time that SOFT_BLOCK is used (for whatever reason), the user would add the mount override option to the mount command line and it would then permanently ignore all future SOFT_BLOCK settings.

Having a reason returned to the client would definitely be useful. If it was just the client version, the server could return a bogus higher version (e.g. 215.3.0), and the client would then print a message like "Server testfs-MDT0000 version (215.3.0.0) is much newer than client. Consider upgrading client (2.12.9.0)".


Tim Day added a comment - 24/Oct/23 12:06 AM

Are you familiar with nodemap? This seems to have some overlap with functionality there?

Yeah. I looked at extending nodemap, but I went with a simplified interface for the PoC. I should look more closely at nodemap. Initially, I'm just trying to figure out what the wire protocol is going to look like.

pre-discussion of implementation approaches

I don't mind putting together PoCs like this. It helps me organize my thinking (and learn more about Lustre internals).

The issue I have with "criterion decided by admins" and soft blocking is that the client has no idea what those criteria are, nor do I.

I would think that the clients will always want to try and connect anyway, so it isn't clear how this would be used in practice.

If this is only related to client/server versions, there is already a mechanism to handle that which could be made a tunable parameter instead of a compile-time constant.

The four policies I have envisioned so far are ALLOW, WARN, SOFT_BLOCK, and HARD_BLOCK. Two of them, specifically ALLOW and HARD_BLOCK, I think have a clear technical purpose - selectively blocking only some clients.

WARN and SOFT_BLOCK have more of a social purpose. WARN would cause the client to emit some kind of message like "the admin might block this client in the future / doesn't recommend using this client". Of course, warning messages are pretty easy to ignore. SOFT_BLOCK essentially forces the user mounting the file system to acknowledge "the admin really means it, this client might be blocked / don't use it". You can silence that by passing the mount flag. But at that point, it would be clear to the user that they are doing something the admin would prefer they didn't. It can avoid (in theory) forcing the admin to talk to everyone individually.

An analogy would be hotel WiFi: before your phone/laptop can connect, you have to go into the hotel's web portal and acknowledge you are going to use the WiFi properly. You can still connect - but the hotel can make you read stuff.
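The client-side handling of the four policies described above might look roughly like this. It is only a sketch: `conn_policy`, `client_check_policy()`, and the `override` argument (standing in for the mount flag) are hypothetical names, and the message wording is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical names; the PoC patches may use different ones. */
enum conn_policy {
	CONN_ALLOW = 0,
	CONN_WARN,
	CONN_SOFT_BLOCK,
	CONN_HARD_BLOCK,
};

/*
 * Return 0 if the mount may continue, -EACCES otherwise.
 * `override` stands for the (hypothetical) mount override option.
 */
static int client_check_policy(enum conn_policy policy, bool override,
			       FILE *log)
{
	assert(log != NULL);

	switch (policy) {
	case CONN_ALLOW:
		return 0;
	case CONN_WARN:
		fprintf(log, "warning: the admin does not recommend using this client\n");
		return 0;
	case CONN_SOFT_BLOCK:
		if (!override) {
			fprintf(log, "error: soft-blocked by server; mount with the override option to acknowledge\n");
			return -EACCES;
		}
		/* Still say something, so the override stays visible. */
		fprintf(log, "warning: server soft block overridden by mount option\n");
		return 0;
	case CONN_HARD_BLOCK:
	default:
		/* No client-side option can bypass a hard block. */
		fprintf(log, "error: connection denied by server policy\n");
		return -EACCES;
	}
}
```

Note that the override only changes the SOFT_BLOCK outcome; HARD_BLOCK (and any policy value the client does not recognize) is always refused.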

It might not be clear to the user why the client is being blocked. There might be a way for the server to pass back a 'verdict' and message. I haven't looked into that yet, but that would be an improvement. Then the client could say "Permission denied - Reason VERSION - Message 'Don't use 2.6 clients'". But I don't think that's strictly necessary for the first implementation.
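For the message format in that example, a small formatter along these lines would be enough. This is a sketch under assumptions: `conn_deny_format()` is a hypothetical helper, the reason is passed as a short tag such as "VERSION", and the free-form message is optional because it is not yet clear whether the server reply can carry one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a denial message into buf (hypothetical helper).
 * `reason` is a short tag such as "VERSION"; `msg` is the optional
 * free-form text from the admin and may be NULL or empty.
 * Returns snprintf()'s result, i.e. the length the full message needs,
 * so a value >= len means the output was truncated.
 */
static int conn_deny_format(char *buf, size_t len, const char *reason,
			    const char *msg)
{
	assert(buf != NULL && reason != NULL);

	if (msg == NULL || strlen(msg) == 0)
		return snprintf(buf, len, "Permission denied - Reason %s",
				reason);

	return snprintf(buf, len,
			"Permission denied - Reason %s - Message '%s'",
			reason, msg);
}
```

Calling it with "VERSION" and "Don't use 2.6 clients" reproduces the example string from the comment; with a NULL message it degrades to the reason tag alone.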


Patrick Farrell added a comment - 23/Oct/23 3:55 PM

Tim,

Are you familiar with nodemap? This seems to have some overlap with functionality there?

By the way, also - Obviously I'm not sure about the implementation details yet, but we're generally happy to have some pre-discussion of implementation approaches for anyone planning to do a big chunk of work. You can open a JIRA suggesting something and we're happy to discuss before you do the implementation. (We may pull the LKML trick of "eh implement it and let's see", but we also often will give real feedback to early proposals.)


Andreas Dilger added a comment - 23/Oct/23 11:11 AM

I think it would be better to discuss the use cases and implementation of this feature here, rather than in Gerrit, which is terrible for post-facto review of discussion about a patch.


Gerrit Updater added a comment - 22/Oct/23 9:30 PM

"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52794
Subject: LU-17217 ptlrpc: service-side connection policy
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 33668a6d418a2970a8977f90c9b804c632920c91


Gerrit Updater added a comment - 22/Oct/23 9:30 PM

"Timothy Day <timday@amazon.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/52793
Subject: LU-17217 obd: reserve server-side connection policy bits
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 17f44114e60aaf0fffb99f7535402dbd69949d4f


People

Assignee: Tim Day

Reporter: Tim Day

Votes: 0

Watchers: 10

Dates

Created: 22/Oct/23 9:05 PM

Updated: 07/Jul/25 10:45 PM