[LU-17431] dynamically configurable nodemap Created: 16/Jan/24 Updated: 05/Feb/24 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.14.0, Lustre 2.16.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor |
| Reporter: | Andreas Dilger | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
A number of sites using containers with Lustre have requested the ability to configure nodemaps dynamically, to handle environments where nodemaps are created and removed frequently (e.g. on a per-job basis) so that each contains only the nodes in the job for a specific runtime environment. Storing all of the nodemaps in the config llog with frequent updates would quickly consume all of the space in the config log, and would be slow to process during mounting. Instead, there should be some option to allow using "lctl set_param nodemap.NODEMAP.PARAM" or an option to "lctl nodemap" to directly create and manage nodemaps without using the config llog to distribute the parameters to each server. This would require some external orchestration to set the nodemap parameters consistently across servers (e.g. "clush -a lctl set_param ..."). It might be desirable to have these dynamic nodemaps inherit settings from an existing persistent nodemap that would otherwise catch their NID range. This would have a number of benefits:
Identifying which nodemap is used at mount time could be hierarchical. Find the top-level NID range as today and map it to a nodemap, and consider this the "parent" nodemap. Then check if there is a second-level NID range below the parent, re-scan for matching NIDs, and add extra restrictions from the child nodemap. Possibly repeat (max 10 levels for safety?). That avoids making the initial NID search complex (broad NID ranges can be used for the whole cluster), and the matching becomes more fine-grained only if needed. |
| Comments |
| Comment by Andreas Dilger [ 16/Jan/24 ] |
|
sebastien, bolausson, |
| Comment by Bjoern Olausson [ 17/Jan/24 ] |
|
Hello Andreas, yes, that makes sense.
|
| Comment by Sebastien Buisson [ 19/Jan/24 ] |
|
Hi Andreas, If I understand correctly, that would be an in-memory only, non-persistent nodemap feature that overrides the 'legacy' nodemap feature, in the sense that it would only modify a subset of all nodemap properties available. Because of the 'default' legacy nodemap, there is always a matching nodemap for clients. So it would not be a problem to make these dynamic nodemaps inherit settings from an existing persistent nodemap. I am not sure how we could name this feature, maybe 'clientmap' so that it clearly differs from the persistent nodemap (and it targets client nodes, if I understand the customer requirements correctly)? From an implementation point of view, I think this makes sense. It would reuse some of the already defined data structures for nodemap, but not modify the 'legacy' nodemap feature itself.
I understand this matching procedure, except for the 'Possibly repeat' part. Today we do not support overlapping NID ranges, so I guess after the legacy NID search, there would be only one pass to find a match in the 'clientmap' ranges? From an operational perspective, the difficulty I can see is that the job scheduler needs to be able to run 'lctl' commands on Lustre server nodes. But maybe this is something usual? |
| Comment by Andreas Dilger [ 05/Feb/24 ] |
By "repeat" I mean "recurse", if there are multiple levels of sub-maps. For example, a persistent nodemap for 192.168.[0-255].[0-255], with a sub-nodemap covering 192.168.2.[0-255], and a sub-sub-nodemap covering 192.168.2.[16-31]. The matching would be fast at connect time, since the sub-nodemaps would form a tree and reduce the search space at each level.
Correct, there would only be one pass at each level, and the list of ranges would be disjoint below each nodemap.
Sorry, I don't understand the question. Just like regular nodemaps this would only be evaluated at connection time to see which top-level and (optionally) sub-nodemaps to get additional settings.
To my thinking "dynamic nodemaps" or "inherited nodemaps" is fine, since the majority of the functionality is identical to regular nodemaps except they aren't persistent, and they are hierarchical from the parent nodemaps. |
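The hierarchical lookup described in this comment can be sketched as follows. This is only an illustrative model in Python, not Lustre code: the `Nodemap` class, the `resolve` function, the nodemap names, and the per-octet range representation are all hypothetical, and the rule that a child's settings override the parent's is an assumption about how "extra restrictions" would be applied. The ranges are the ones from the example above, and the 10-level cap is the one floated in the description.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 10  # safety cap on nesting depth, as suggested in the description


@dataclass
class Nodemap:
    """Hypothetical model of a nodemap with optional sub-nodemaps."""
    name: str
    octets: tuple          # four inclusive (lo, hi) pairs, one per IPv4 octet
    settings: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def contains(self, addr: str) -> bool:
        parts = [int(p) for p in addr.split(".")]
        return len(parts) == 4 and all(
            lo <= p <= hi for p, (lo, hi) in zip(parts, self.octets)
        )


def resolve(top_level, addr):
    """Walk from the top-level ranges down through matching sub-nodemaps.

    One pass is made per level; ranges below a given parent are assumed
    to be disjoint, so the first match at each level is the only match.
    Returns the chain of matched names and the merged settings
    (assumption: child values override inherited parent values).
    """
    chain, settings = [], {}
    candidates = top_level
    for _ in range(MAX_DEPTH):
        match = next((nm for nm in candidates if nm.contains(addr)), None)
        if match is None:
            break
        chain.append(match.name)
        settings.update(match.settings)
        candidates = match.children
    return chain, settings


# Ranges from the example: 192.168.[0-255].[0-255] > 192.168.2.[0-255] > 192.168.2.[16-31]
job = Nodemap("job42", ((192, 192), (168, 168), (2, 2), (16, 31)), {"admin": False})
rack = Nodemap("rack2", ((192, 192), (168, 168), (2, 2), (0, 255)), {"trusted": False}, [job])
cluster = Nodemap("cluster", ((192, 192), (168, 168), (0, 255), (0, 255)),
                  {"admin": True, "trusted": True}, [rack])

print(resolve([cluster], "192.168.2.20"))
```

An address that matches no top-level range yields an empty chain in this sketch; in Lustre the 'default' nodemap would apply in that case.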
| Comment by Sebastien Buisson [ 05/Feb/24 ] |
OK, I had not thought about hierarchical nodemaps. I guess it can be implemented like that, but it will introduce a little more complexity in the code, and in how those sub-nodemaps are accessed via lctl.
What I mean is that we are talking about special nodemaps that are created 'on-the-fly'. So the job scheduler, or something closely related to the jobs being run on the cluster, would have to be able to create those dynamic nodemaps. And because this is a server-side thing, the job scheduler would have to launch lctl commands on the Lustre servers. |
| Comment by Andreas Dilger [ 05/Feb/24 ] |
|
Since this would only be limiting a nodemap further, one option that might be acceptable (security-wise) is to allow a client with admin/root privs to execute this and have it apply to the servers? Of course, that would add complexity, and being able to run a command directly on the servers would be easier for us (if more complex for the user). We'd have to ask one of the sites about this, maybe the author of the LUG (LAD?) presentation that was discussing their complex nodemap config a couple of years ago? |