Details
Type: Story
Resolution: Fixed
Priority: Minor
None
None
11343
Description
This issue tracks the Lustre Manual updates for the IU (Indiana University) Shared Secret Key authentication and encryption feature.
Attachments
- Lustre_Sec_Howto.pdf (234 kB, uploaded by Nathan Lavender)
Activity
Joseph Gmitter (joseph.gmitter@intel.com) merged in patch https://review.whamcloud.com/23673/
Subject: LUDOC-197 sk: Shared-Secret Key Security
Project: doc/manual
Branch: master
Current Patch Set:
Commit: ea26a5ef83dd090d431f5c39b54bc928e9b0d822
Nathan Lavender (nblavend@iu.edu) uploaded a new patch: http://review.whamcloud.com/23673
Subject: LUDOC-197 sk: Shared-Secret Key Security
Project: doc/manual
Branch: master
Current Patch Set: 1
Commit: 433a40dc6bbfa196739401b70620f9de00cb6d6a
To build Lustre with the Shared Key patches, you must install the following packages on your build server: libgssglue, libgssglue-devel, krb5-libs, krb5-devel, and openssl-devel. These dependencies are in addition to those listed in the build document here:
https://wiki.hpdd.intel.com/pages/viewpage.action?pageId=8126821
Then pass the '--enable-gss' option when configuring the patched Lustre source:
- cd ~/lustre-release/
- ./configure --with-linux=/path/to/kernel/source/ --enable-gss [other options]
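A small wrapper can check the prerequisite packages before running configure. This is a sketch assuming an RPM-based build server; both function names and the PKG_QUERY override are hypothetical, the override existing only so the check can be exercised without RPM.

```shell
#!/usr/bin/env bash
# Prerequisites listed above for building with --enable-gss.
SK_BUILD_DEPS="libgssglue libgssglue-devel krb5-libs krb5-devel openssl-devel"

# Print every prerequisite that is not installed, one per line.
# PKG_QUERY (hypothetical) overrides the package check; default is rpm.
missing_sk_build_deps() {
    local query=${PKG_QUERY:-"rpm -q --quiet"}
    local pkg
    for pkg in $SK_BUILD_DEPS; do
        # $query is unquoted on purpose so "rpm -q --quiet" splits into words.
        $query "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Run configure with GSS support, refusing to start if anything is missing.
# $1: Lustre source tree, $2: kernel source; extra arguments go to configure.
configure_lustre_gss() {
    local src=$1 kernel=$2 missing
    shift 2
    missing=$(missing_sk_build_deps)
    if [ -n "$missing" ]; then
        echo "missing build dependencies:" $missing >&2
        return 1
    fi
    (cd "$src" && ./configure --with-linux="$kernel" --enable-gss "$@")
}
```

For example, `configure_lustre_gss ~/lustre-release /path/to/kernel/source` performs the two steps above, with any further configure options appended.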
lctl conf_param fsname.srpc.flavor.default=flavor
should be set_param -P I think, although I long ago gave up tracking this...
● <lnd:lnd>num
This NID section looks wrong: it's not really a NID anyhow, it's only a "network".
The /etc/request-key.d/lgssc.conf for Lustre should look like the following:
Can we just add this file to the Lustre RPMs?
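For reference, the request-key rule that hands Lustre "lgssc" key requests to lgss_keyring is a single line. The sketch below writes it into a directory (default /etc/request-key.d); the rule text is as I recall it from the Lustre manual's GSS setup, and the /usr/sbin/lgss_keyring path is an assumption to check against the installed packages. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Write the request-key rule that routes "lgssc" key requests to
# lgss_keyring. $1: target directory (default /etc/request-key.d).
write_lgssc_conf() {
    local dir=${1:-/etc/request-key.d}
    mkdir -p "$dir" || return 1
    # Quoted delimiter: the "*" and "%" tokens are written literally.
    cat > "$dir/lgssc.conf" <<'EOF'
create lgssc * * /usr/sbin/lgss_keyring %o %k %t %d %c %u %g %T %P %S
EOF
}
```

Shipping this file in the Lustre RPMs, as suggested here, would make this step unnecessary.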
Future versions of Lustre will support an init script to run this as a service
Is there a bug filed?
- In the "generating keys" section you introduce the term nodemap without defining or explaining it.
- There are two algorithms numbered "0". That can't be right. And you don't show how to choose them.
- Please add examples of the mount command with --skpath, and maybe add paths to the lgss_sk examples:
lgss_sk -w /etc/lustre_keys/tank.biology.key
- Please add a section on building Lustre with shared-key support. Prerequisites, build flags, etc.
The expiration here is actually for the sptlrpc context tied to a session. The intention is to force connections to generate new Diffie-Hellman session keys, which offer perfect forward secrecy (PFS) and minimize the amount of data protected by any particular key.
In that case, it would be useful to make this intent clear in the documentation that this timeout is for the session keys. It also would be better to use a default session key timeout that actually makes this functionality useful out of the box (e.g. 1 day or 1 week, rather than 60+ years). Is there any harm from expiring the session keys regularly?
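The forward-secrecy argument can be illustrated with a toy Diffie-Hellman exchange in shell arithmetic. This is not the code SK uses: the modulus 23 and generator 5 are textbook toy values, far too small for real use, and the private exponents are fixed so the output is checkable. Because each session secret comes from per-session private values that are then discarded, a fresh exchange (for example after a context expires) yields an unrelated secret, and later compromise of a long-term key does not reveal earlier session keys.

```shell
#!/usr/bin/env bash
# Modular exponentiation by repeated squaring: prints base^exp mod m.
modpow() {
    local base=$(( $1 % $3 )) exp=$2 m=$3 result=1
    while (( exp > 0 )); do
        if (( exp & 1 )); then
            result=$(( result * base % m ))
        fi
        base=$(( base * base % m ))
        exp=$(( exp >> 1 ))
    done
    echo "$result"
}

# One toy Diffie-Hellman exchange. $1: prime p, $2: generator g,
# $3: client private exponent a, $4: server private exponent b.
# Prints the shared secret, or fails if the two sides disagree.
dh_session_secret() {
    local p=$1 g=$2 a=$3 b=$4 A B client server
    A=$(modpow "$g" "$a" "$p")          # client public value, sent to server
    B=$(modpow "$g" "$b" "$p")          # server public value, sent to client
    client=$(modpow "$B" "$a" "$p")     # client computes B^a mod p
    server=$(modpow "$A" "$b" "$p")     # server computes A^b mod p
    [ "$client" -eq "$server" ] || return 1
    echo "$client"
}
```

`dh_session_secret 23 5 4 3` prints 18, while a new exchange with different private values, `dh_session_secret 23 5 6 15`, prints 2.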
the documents/manual section should start with at least a tiny summary of what this is about before diving directly into the details.
We'll have to work on this. A lot of this was notes about the Lustre security features that I put together while working on SK. Much of it I don't remember the specifics of, but I wanted to at least capture all the hidden/undocumented things the code supported. I'll need to review a lot of this for accuracy and test whether it works, but it is beyond the SK scope, so we could trim some of it out into a separate document until it can be reviewed. For now I've marked those places as "code review" below.
"To default all interfaces to a specific settings use the “default” word." is a bit unclear. I'm assuming that "default" can be used to specify the default security flavor, and then "NID" and "NID.sp2sp" can be used to override the broader security flavor? I would hope that one can specify the security type for a whole nodemap at once, without having to specify it for each NID explicitly? That would be an option for "NID" I think.
Updated the default wording. I need to dig deeper into the NID support in the code, but there is no nodemap tie-in today. This was sptlrpc in general, but it would be good to have as a future enhancement.
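The override behaviour the reviewer assumes, where a network-specific rule (optionally narrowed to one direction) overrides "default", can be written down as a lookup order. This is a hypothetical model of that reading, not the sptlrpc code: the precedence (network.direction, network, default.direction, default), the "last rule wins" tie-break, and the fallback to null when nothing matches are all assumptions to confirm against the implementation. Rules are given one per line as network[.direction]=flavor, i.e. the part after fsname.srpc.flavor. in the conf_param syntax.

```shell
#!/usr/bin/env bash
# Hypothetical model of sptlrpc flavor selection, for discussion only.
# Usage: resolve_flavor NETWORK DIRECTION < rules
# Each rule line is "network[.direction]=flavor", e.g. "tcp1=krb5p".
resolve_flavor() {
    local net=$1 dir=$2 rules key hit
    rules=$(cat)
    # Assumed precedence: most specific key first.
    for key in "$net.$dir" "$net" "default.$dir" "default"; do
        # If a key appears more than once, assume the last rule wins.
        hit=$(printf '%s\n' "$rules" |
              awk -F= -v k="$key" '$1 == k { v = $2 } END { if (v != "") print v }')
        if [ -n "$hit" ]; then
            echo "$hit"
            return 0
        fi
    done
    echo null    # assumed built-in default when no rule matches
}
```

With the rules default=null, tcp1=krb5p and o2ib2.cli2ost=skpi, this model gives krb5p for any tcp1 client connection, skpi for client-to-OST traffic on o2ib2, and null for everything else.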
it should be made clear that the conf_param commands are run on the MGS (e.g. by specifying the "mgs#" prompt before the command or in the description)
Updated
what happens if different security flavor for one peer connection is specified differently than flavor in the reverse direction? For example, if the cli2mds flavor is different than the mds2cli flavor? Is that an error, or is this possible (not sure it makes sense)?
code review
the "mdt" and "ost" keywords should really be "mds" and "oss" since the connection is between nodes and not specific targets I think?
The security contexts are actually target/import/obd based. I find this to be less than ideal, but it is how it works today; in the future we could probably do some work to change this.
the separation of "mgc" from "cli" implies that MGC-MGS connections are special, but this isn't explicitly stated. If the client needs to encrypt the connection to the MGS, how does it know what flavor to use if it hasn't connected to the MGS to get the config yet? IIRC, there is a separate sptlrpc config log, does the client fetch that first just to get the sptlrpc flavor and then fetch the rest of the config using the secure connection?
MGC is in fact a separate key. I just hit this last week when testing and found it was not working due to significant differences in the obd_name for MGC targets. There is an "-o mgssec=<flavor>" option to mount.lustre. I'm still open to changing how the shared key handles these keys, but the mgssec option has been in Lustre for years even though it's not documented in the man page or help options. There is a separate sptlrpc file in the CONFIGS directory, and it doesn't work correctly with "set_param -P", which I filed a bug for.
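An example of the client mount requested in the review, combining skpath (the key file or directory to load) with mgssec for the MGC-MGS connection. The helper only prints the command so it can be checked before running. It assumes skpath is passed as a mount -o option in the same way as mgssec; the function name, MGS NID, and mount point are illustrative, and the option spellings should be checked against mount.lustre(8) for the installed release.

```shell
#!/usr/bin/env bash
# Print a client mount command using shared-key options.
# $1: MGS NID, $2: fsname, $3: mount point, $4: key file or directory,
# $5: optional flavor for the MGC-MGS connection (passed as mgssec=).
sk_mount_cmd() {
    local mgs=$1 fs=$2 mnt=$3 keys=$4 mgssec=${5:-}
    local opts="skpath=$keys"
    if [ -n "$mgssec" ]; then
        opts="$opts,mgssec=$mgssec"
    fi
    echo "mount -t lustre -o $opts $mgs:/$fs $mnt"
}
```

For example, `sk_mount_cmd 192.168.1.10@tcp tank /mnt/tank /etc/lustre_keys skpi` prints `mount -t lustre -o skpath=/etc/lustre_keys,mgssec=skpi 192.168.1.10@tcp:/tank /mnt/tank`.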
the "skpi" flavor would be better described as "shared key privacy and integrity" to match the acronym
Updated
no example or description is given for "<lnd:lnd>num". Is that a formatting error, or what is that?
code review
is it possible to specify the flavor for a specific NID (i.e. one node) or only at a per-network level? If not, then "NID" should be replaced in the text with "net", since NID usually means a single node's network identifier
code review
the example uses flavor "krb" for tcp1, but that isn't a valid flavor according to your list?
updated
a brief explanation with the examples is always useful, like "set the default security flavor to null (no encryption), but the clients on the tcp1 (WAN) network use krb5p integrity and protection, while the client-to-OSS connections in the o2ib2 network are using the shared-key privacy and integrity checking
Nathan and I discussed creating some recommendations with examples. We will update as we get through more testing.
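The scenario described in the review (null by default, krb5p for clients on the tcp1 WAN network, skpi for client-to-OSS traffic on o2ib2) could be expressed as MGS commands printed by a helper that rejects unknown flavor names, which would have caught the "krb" typo. The flavor and direction lists are my reading of the sptlrpc documentation (including cli2ost as the client-to-OSS direction keyword) and should be verified against the manual; the function names are hypothetical. The printed commands are meant to be run on the MGS.

```shell
#!/usr/bin/env bash
# Flavors and directions accepted by this sketch (assumed lists).
SRPC_FLAVORS="null plain gssnull krb5n krb5a krb5i krb5p skn ska ski skpi"
SRPC_DIRECTIONS="cli2mdt cli2ost mdt2mdt mdt2ost"

# Succeed if word $1 appears in the space-separated list $2.
in_list() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the conf_param command for one rule.
# $1: fsname, $2: network or "default", $3: direction or "", $4: flavor.
srpc_rule() {
    local fs=$1 net=$2 dir=$3 flavor=$4 key
    if ! in_list "$flavor" "$SRPC_FLAVORS"; then
        echo "unknown flavor: $flavor" >&2
        return 1
    fi
    key="$fs.srpc.flavor.$net"
    if [ -n "$dir" ]; then
        if ! in_list "$dir" "$SRPC_DIRECTIONS"; then
            echo "unknown direction: $dir" >&2
            return 1
        fi
        key="$key.$dir"
    fi
    echo "lctl conf_param $key=$flavor"
}
```

For the scenario above, `srpc_rule tank default "" null`, `srpc_rule tank tcp1 "" krb5p` and `srpc_rule tank o2ib2 cli2ost skpi` print the three commands, while `srpc_rule tank tcp1 "" krb` prints an error instead.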
is the /etc/request-key.d/lgssc.conf file included and installed with the Lustre RPMs? It seems that would be OK, since it would never be used unless there is a security flavor for a connection. The fewer things that users need to configure, the better.
It's not, but I like the idea of including it. Not sure how much effort that is since I don't mess with the Lustre build scripts.
what drives the requirement for hostname reverse lookup working? Is it still possible to configure using purely IP-address NIDs without DNS as it is today?
I believe this was one of the things Sebastien fixed when resolving which interface the requests come from: http://review.whamcloud.com/#/c/14042/. I'm not 100% sure, but it is Kerberos-related, not SK.
s/linux/Linux/
updated
the asymmetry of commands between the client and server adds complexity to the configuration, IMHO.
I would agree it's not ideal, but it was the most reasonable way I could find to meet all my requirements, given that the necessary information is available at different places in the stack and the auth process.
is it possible to use the kernel keyring to handle server keys instead of the separate lsvcgssd using a different mechanism? I believe that was implemented before the kernel keyring was available. This isn't necessarily something that needs to be fixed for this project, but IMHO this would simplify usability.
Possibly, but the whole sunrpc caching piece would have to be thrown out, so it would be a significant amount of work.
what is the benefit of requiring individual components in lsvcgssd? Is that a performance win or for security? Does this break existing systems that don't have these options on the command line? I don't think it is relevant to the manual that this is done as part of the shared key feature.
From my use case: I'll never use Kerberos and there is no reason to allow gssnull, so to minimize the security perimeter you can disable the mechanisms you don't need. Just one more layer in the onion. I've removed the note about this being added as part of the shared key feature; we've tried to involve Kerberos users in the code reviews, so they should be aware of the change.
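An init script or service wrapper could build the lsvcgssd command line so that only the needed roles and mechanisms are enabled, in line with the defense-in-depth reasoning above. The flag letters below (-m/-o/-g for MDS/OSS/MGS, -s/-k/-z for shared key/Kerberos/gssnull) are my recollection of the lsvcgssd usage text and must be verified against the installed binary; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Build an lsvcgssd argument list enabling only the requested pieces.
# Flag letters are assumptions to verify against lsvcgssd's usage text:
#   roles: mds -> -m, oss -> -o, mgs -> -g
#   mechanisms: sk -> -s, krb -> -k, gssnull -> -z
lsvcgssd_args() {
    local item args=()
    for item in "$@"; do
        case $item in
            mds) args+=(-m) ;;
            oss) args+=(-o) ;;
            mgs) args+=(-g) ;;
            sk) args+=(-s) ;;
            krb) args+=(-k) ;;
            gssnull) args+=(-z) ;;
            *) echo "unknown role or mechanism: $item" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "${args[*]}"
}
```

On a combined MGS/MDS that only needs shared keys, `lsvcgssd $(lsvcgssd_args mds mgs sk)` would leave Kerberos and gssnull disabled.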
s/lustre/Lustre/
updated
for /dev/random blocking, it would be useful to mention that pressing keys on the keyboard will speed up entropy generation
updated
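Before generating keys, an administrator could check the kernel's entropy estimate and get the keyboard hint up front. This sketch reads the standard Linux /proc/sys/kernel/random/entropy_avail file; the 1024-bit threshold is an arbitrary assumption, and ENTROPY_FILE is a hypothetical override used only for testing. On Linux 5.6 and later, /dev/random no longer blocks once the pool is initialized, so this mostly matters on older kernels.

```shell
#!/usr/bin/env bash
# Warn before key generation if the kernel entropy estimate is low.
# ENTROPY_FILE (hypothetical, for testing) overrides the procfs path.
# $1: threshold in bits (default 1024, an arbitrary choice).
check_entropy() {
    local file=${ENTROPY_FILE:-/proc/sys/kernel/random/entropy_avail}
    local min=${1:-1024} avail
    read -r avail < "$file" || return 2
    case $avail in ''|*[!0-9]*) return 2 ;; esac
    if [ "$avail" -lt "$min" ]; then
        echo "entropy_avail=$avail < $min: reading /dev/random may block;" \
             "pressing keys on the keyboard will help refill the pool" >&2
        return 1
    fi
    echo "entropy_avail=$avail"
}
```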
is it useful to add obsolete HMAC algorithms like MD5 and SHA1 at this time? Since this is a new feature there is no compatibility requirement like Kerberos has, it would be best to only offer good security options.
That's probably reasonable, I'll try to update things.
having the key expiry be a relative time (i.e. 100000s from now) is IMHO less useful than if it were an absolute time (i.e. Nov 21 2015). Otherwise, the key expiry will depend on when the client and server both mounted the filesystem and loaded keys, and the expiry would be reset each time the server was rebooted, which I think would be confusing.
There is no real key expiration in the sense of key usability. It is the administrator's responsibility to remove keys that are no longer "valid" from the keyring. They will also have to purge their security contexts, which is not detailed yet in the document. The expiration here is actually for the sptlrpc context tied to a session. The intention is to force connections to generate new Diffie-Hellman session keys, which offer perfect forward secrecy (PFS) and minimize the amount of data protected by any particular key.
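Because the session lifetime is a relative number of seconds, an administrator who thinks in terms of "one week" or "until Nov 21 2015" has to convert. The helpers below do that conversion; they assume GNU date for the -d option, the function names are hypothetical, and the lgss_sk option that would receive the value is deliberately not shown.

```shell
#!/usr/bin/env bash
# Convert a duration such as 90s, 30m, 12h, 1d or 1w into seconds.
duration_to_seconds() {
    local n=${1%[smhdw]} unit=${1##*[0-9]}
    case $n in ''|*[!0-9]*) echo "bad duration: $1" >&2; return 1 ;; esac
    # 10# forces base 10 so values like "08" are not read as octal.
    case $unit in
        s|'') echo $(( 10#$n )) ;;
        m) echo $(( 10#$n * 60 )) ;;
        h) echo $(( 10#$n * 3600 )) ;;
        d) echo $(( 10#$n * 86400 )) ;;
        w) echo $(( 10#$n * 604800 )) ;;
        *) echo "bad unit in: $1" >&2; return 1 ;;
    esac
}

# Seconds from $2 (default: now) until the absolute time $1 (GNU date).
seconds_until() {
    local target now
    target=$(date -d "$1" +%s) || return 1
    now=$(date -d "${2:-now}" +%s) || return 1
    echo $(( target - now ))
}
```

`duration_to_seconds 1d` gives 86400 and `duration_to_seconds 1w` gives 604800, the two defaults suggested in the review.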
it doesn't describe how to update a key that is about to expire. My understanding is that it should be possible to load multiple keys on the client and server for the same keyring, so that the client can use the new key when the key for the current session is about to expire.
see above
IMHO loading new keys should override existing keys, otherwise if the key is updated and there are existing keys the admin may not notice that the keys were not updated, and requiring the keys to be unloaded could cause a service interruption.
What I envisioned for a use case:
1. You update your Lustre startup script to include the --skpath parameter pointing to your key directory, so it loads all your keys for each target you mount.
2. When you mount all of them at startup, you're repeatedly adding the same keys over and over. While I don't think there will be any measurable performance impact, it seemed a little ugly to me.
I'm open to a discussion though to determine how it should work. Maybe we can discuss during the concall.
the skpath should load all keys provided and return an error at the end
I'll fix this.
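The agreed behaviour, loading every key under skpath and only reporting failure at the end, amounts to the loop below. It is a sketch, not the mount.lustre code: SK_LOAD_CMD is a hypothetical hook standing in for whatever actually loads one key file, and treating every regular file in the directory as a key is an assumption.

```shell
#!/usr/bin/env bash
# Load every key file in directory $1, continuing past failures, and
# return non-zero at the end if any key failed to load.
# SK_LOAD_CMD (hypothetical) loads the single key file given as its
# last argument.
load_all_keys() {
    local dir=$1 key failed=0 loaded=0
    if [ -z "${SK_LOAD_CMD:-}" ]; then
        echo "SK_LOAD_CMD is not set" >&2
        return 2
    fi
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 2
    fi
    for key in "$dir"/*; do
        [ -f "$key" ] || continue    # skips subdirectories and an unmatched glob
        # Unquoted so a command with arguments splits into words.
        if $SK_LOAD_CMD "$key"; then
            loaded=$(( loaded + 1 ))
        else
            echo "failed to load key: $key" >&2
            failed=$(( failed + 1 ))
        fi
    done
    echo "loaded $loaded key(s), $failed failure(s)"
    [ "$failed" -eq 0 ]
}
```

With this shape, one bad file in the key directory no longer prevents the remaining keys from loading, and the caller still sees the failure in the exit status.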
Nathan, thanks for the documentation. It would be most convenient if this was uploaded to Gerrit and/or into the lustre.org wiki in a form that could be reviewed and updated directly. Some of my comments may be related to issues in the implementation and not necessarily gaps in the documentation, in which case the comments/questions are directed at Jeremy.
Some comments here to start:
- the documents/manual section should start with at least a tiny summary of what this is about before diving directly into the details.
- "To default all interfaces to a specific settings use the “default” word." is a bit unclear. I'm assuming that "default" can be used to specify the default security flavor, and then "NID" and "NID.sp2sp" can be used to override the broader security flavor? I would hope that one can specify the security type for a whole nodemap at once, without having to specify it for each NID explicitly? That would be an option for "NID" I think.
- it should be made clear that the conf_param commands are run on the MGS (e.g. by specifying the "mgs#" prompt before the command or in the description)
- what happens if different security flavor for one peer connection is specified differently than flavor in the reverse direction? For example, if the cli2mds flavor is different than the mds2cli flavor? Is that an error, or is this possible (not sure it makes sense)?
- the "mdt" and "ost" keywords should really be "mds" and "oss" since the connection is between nodes and not specific targets I think?
- the separation of "mgc" from "cli" implies that MGC-MGS connections are special, but this isn't explicitly stated. If the client needs to encrypt the connection to the MGS, how does it know what flavor to use if it hasn't connected to the MGS to get the config yet? IIRC, there is a separate sptlrpc config log, does the client fetch that first just to get the sptlrpc flavor and then fetch the rest of the config using the secure connection?
- the "skpi" flavor would be better described as "shared key privacy and integrity" to match the acronym
- no example or description is given for "<lnd:lnd>num". Is that a formatting error, or what is that?
- is it possible to specify the flavor for a specific NID (i.e. one node) or only at a per-network level? If not, then "NID" should be replaced in the text with "net", since NID usually means a single node's network identifier
- the example uses flavor "krb" for tcp1, but that isn't a valid flavor according to your list?
- a brief explanation with the examples is always useful, like "set the default security flavor to null (no encryption), but the clients on the tcp1 (WAN) network use krb5p integrity and protection, while the client-to-OSS connections in the o2ib2 network are using the shared-key privacy and integrity checking
- is the /etc/request-key.d/lgssc.conf file included and installed with the Lustre RPMs? It seems that would be OK, since it would never be used unless there is a security flavor for a connection. The fewer things that users need to configure, the better.
- what drives the requirement for hostname reverse lookup working? Is it still possible to configure using purely IP-address NIDs without DNS as it is today?
- s/linux/Linux/
- the asymmetry of commands between the client and server adds complexity to the configuration, IMHO.
- is it possible to use the kernel keyring to handle server keys instead of the separate lsvcgssd using a different mechanism? I believe that was implemented before the kernel keyring was available. This isn't necessarily something that needs to be fixed for this project, but IMHO this would simplify usability.
- what is the benefit of requiring individual components in lsvcgssd? Is that a performance win or for security? Does this break existing systems that don't have these options on the command line? I don't think it is relevant to the manual that this is done as part of the shared key feature.
- s/lustre/Lustre/
- for /dev/random blocking, it would be useful to mention that pressing keys on the keyboard will speed up entropy generation
- is it useful to add obsolete HMAC algorithms like MD5 and SHA1 at this time? Since this is a new feature there is no compatibility requirement like Kerberos has, it would be best to only offer good security options.
- having the key expiry be a relative time (i.e. 100000s from now) is IMHO less useful than if it were an absolute time (i.e. Nov 21 2015). Otherwise, the key expiry will depend on when the client and server both mounted the filesystem and loaded keys, and the expiry would be reset each time the server was rebooted, which I think would be confusing.
- it doesn't describe how to update a key that is about to expire. My understanding is that it should be possible to load multiple keys on the client and server for the same keyring, so that the client can use the new key when the key for the current session is about to expire.
- IMHO loading new keys should override existing keys, otherwise if the key is updated and there are existing keys the admin may not notice that the keys were not updated, and requiring the keys to be unloaded could cause a service interruption.
- the skpath should load all keys provided and return an error at the end.
Patch has landed