Are you familiar with nodemap? This seems to have some overlap with functionality there?
Yeah. I looked at extending nodemap, but I went with a simplified interface for the PoC. I should look more closely at nodemap. Initially, I'm just trying to figure out what the wire protocol is going to look like.
pre-discussion of implementation approaches
I don't mind putting together PoCs like this. It helps me organize my thinking (and learn more about Lustre internals).
The issue I have with "criterion decided by admins" and soft blocking is that the client has no idea what those criteria are, nor do I.
I would think that the clients will always want to try and connect anyway, so it isn't clear how this would be used in practice.
If this is only related to client/server versions, there is already a mechanism to handle that which could be made a tunable parameter instead of a compile-time constant.
The four policies I have envisioned so far are ALLOW, WARN, SOFT_BLOCK, and HARD_BLOCK. Two of them, specifically ALLOW and HARD_BLOCK, I think have a clear technical purpose - selectively blocking only some clients.
WARN and SOFT_BLOCK have more of a social purpose. WARN would cause the client to emit some kind of message like "the admin might block this client in the future / doesn't recommend using this client". Of course, warning messages are pretty easy to ignore. SOFT_BLOCK essentially forces the user mounting the file system to acknowledge "the admin really means it, this client might be blocked / don't use it". You can silence that by passing the mount flag. But at that point, it would be clear to the user that they are doing something the admin would prefer they didn't. It can avoid (in theory) forcing the admin to talk to everyone individually.
An analogy would be hotel WiFi: before your phone/laptop can connect, you have to go into the hotel's web portal and acknowledge you are going to use the WiFi properly. You can still connect - but the hotel can make you read stuff.
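To make the four policies concrete, here is a rough Python model of how a client might react to each one. It is only a sketch of the behaviour described above, not Lustre code: the `Policy` enum, the `client_may_mount` function, the `acknowledged` parameter (standing in for the mount flag), and the message strings are all hypothetical.

```python
from enum import Enum


class Policy(Enum):
    """The four policies discussed above (names from the thread, values illustrative)."""
    ALLOW = "allow"
    WARN = "warn"
    SOFT_BLOCK = "soft_block"
    HARD_BLOCK = "hard_block"


def client_may_mount(policy, acknowledged=False):
    """Return (allowed, message) for a mount attempt.

    `acknowledged` models the user passing the (hypothetical) mount flag
    that silences a SOFT_BLOCK.
    """
    if policy is Policy.ALLOW:
        return True, None
    if policy is Policy.WARN:
        # The mount proceeds, but the client emits a message.
        return True, "warning: the admin does not recommend using this client"
    if policy is Policy.SOFT_BLOCK:
        if acknowledged:
            # The user explicitly accepted the admin's notice.
            return True, "warning: mounting despite the admin's soft block"
        return False, "soft-blocked: pass the acknowledgement mount flag to continue"
    # HARD_BLOCK: no client-side flag can override it.
    return False, "blocked by the server"
```

The point of the model is that the acknowledgement only changes the outcome for SOFT_BLOCK; a HARD_BLOCK stays a refusal whatever the client passes.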
It might not be clear to the user why the client is being blocked. There might be a way for the server to pass back a 'verdict' and message. I haven't looked into that yet, but that would be an improvement. Then the client could say "Permission denied - Reason VERSION - Message 'Don't use 2.6 clients'". But I don't think that's strictly necessary for the first implementation.
I was just looking into this in the context of LU-17435, and being able to dynamically add client NIDs to the "deny list" if they have issues maintaining a stable connection to the server. This would be managed by LNet/ptlrpc/ldlm in the kernel, so having a specific "reason code" would make sense, like "14:client has unstable network interface" or whatever, and a simpler check like "1:client Lustre version is too old" would probably be the first one. I'm not sure there could be any "free form" message unless a string was put in the server reply (in some place we don't care about because the connection was denied after all).
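As a sketch of how a numeric reason code plus an optional string could turn into the kind of client-side message mentioned above, assuming the two example codes from this comment (1 and 14, which are illustrative and not an agreed numbering). The `REASONS` table and `format_denial` function are hypothetical names.

```python
# Illustrative reason codes taken from the examples in this thread;
# the actual numbering has not been decided.
REASONS = {
    1: "client Lustre version is too old",
    14: "client has unstable network interface",
}


def format_denial(reason_code, message=None):
    """Build the text a client could print when a connection is denied.

    `message` models an optional free-form string in the server reply;
    whether the reply can carry one is still an open question.
    """
    reason = REASONS.get(reason_code, "unknown reason")
    text = f"Permission denied - Reason {reason_code}:{reason}"
    if message:
        text += f" - Message '{message}'"
    return text
```

With only a reason code the client can still print something useful, so the free-form string can be treated as optional in a first implementation.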
This makes me wonder if the SOFT_BLOCK mechanism should be on a "per reason code" basis, so that if the client admin says "sure I understand my client version is too old" and continues anyway, that doesn't give them free rein to connect if a new reason is added. I think that would mitigate my concerns with the usefulness of SOFT_BLOCK. For the LU-17435 case I think the server could use the same reason code but decide on the severity of the issue (e.g. 2-4 reconnects/resends per hour = WARN, 5-9 reconnects/resends per hour = SOFT_BLOCK, >= 10/hour = HARD_BLOCK), but that should be decided in the context of that ticket.
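A rough model of both ideas, for illustration only: severity derived from the reconnect/resend rate using the example thresholds above, and SOFT_BLOCK acknowledgement tracked per reason code. The function names are hypothetical, and treating fewer than 2 events per hour as ALLOW is my assumption, since the example does not cover that range.

```python
def severity_for_rate(events_per_hour):
    """Map reconnects/resends per hour to a policy, per the example thresholds."""
    if events_per_hour >= 10:
        return "HARD_BLOCK"
    if events_per_hour >= 5:
        return "SOFT_BLOCK"
    if events_per_hour >= 2:
        return "WARN"
    # Assumption: below the lowest threshold nothing is flagged.
    return "ALLOW"


def connection_allowed(verdicts, acknowledged_reasons):
    """Decide whether a client may connect.

    verdicts: dict mapping reason code -> policy currently applied to the client.
    acknowledged_reasons: set of reason codes the client admin has accepted.
    """
    for reason, policy in verdicts.items():
        if policy == "HARD_BLOCK":
            return False
        # An acknowledgement only covers the reason it was given for.
        if policy == "SOFT_BLOCK" and reason not in acknowledged_reasons:
            return False
    return True
```

For example, a client that acknowledged reason 1 would still be refused if reason 14 later becomes a SOFT_BLOCK: `connection_allowed({1: "SOFT_BLOCK", 14: "SOFT_BLOCK"}, {1})` returns `False`.
<test>
assert severity_for_rate(0) == "ALLOW"
assert severity_for_rate(1) == "ALLOW"
assert severity_for_rate(2) == "WARN"
assert severity_for_rate(4) == "WARN"
assert severity_for_rate(5) == "SOFT_BLOCK"
assert severity_for_rate(9) == "SOFT_BLOCK"
assert severity_for_rate(10) == "HARD_BLOCK"
assert connection_allowed({1: "SOFT_BLOCK"}, {1}) is True
assert connection_allowed({1: "SOFT_BLOCK", 14: "SOFT_BLOCK"}, {1}) is False
assert connection_allowed({14: "HARD_BLOCK"}, {14}) is False
assert connection_allowed({14: "WARN"}, set()) is True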