[LU-2456] Dynamic LNet Config Main Development Work Created: 28/Aug/12 Updated: 29/May/17 Resolved: 10/Mar/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | New Feature | Priority: | Major |
| Reporter: | Doug Oucharek (Inactive) | Assignee: | Amir Shehata (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sub-Tasks: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Rank (Obsolete): | 5798 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
This ticket has been created to track the main development work for the Dynamic LNet Config project. |
| Comments |
| Comment by Doug Oucharek (Inactive) [ 28/Aug/12 ] |
|
Project development timeline |
| Comment by Doug Oucharek (Inactive) [ 28/Aug/12 ] |
|
I've attached a PDF of a timeline I have projected for the development work for this project. It was done with the following criteria: 1- One developer is working on the project (me). |
| Comment by Doug Oucharek (Inactive) [ 12/Dec/12 ] |
|
The first of 5 patches is in Gerrit: http://review.whamcloud.com/#change,4793 |
| Comment by Doug Oucharek (Inactive) [ 19/Dec/12 ] |
|
The second of 5 patches in Gerrit: http://review.whamcloud.com/#change,4872 |
| Comment by Doug Oucharek (Inactive) [ 19/Dec/12 ] |
|
3rd patch (route_buffer support) in Gerrit: http://review.whamcloud.com/#change,4874 |
| Comment by Doug Oucharek (Inactive) [ 10/Jan/13 ] |
|
4th patch (net support) in Gerrit: http://review.whamcloud.com/#change,4994 |
| Comment by Amir Shehata (Inactive) [ 09/Oct/13 ] |
|
Previous patches are no longer applicable. I'm in the process of adding new patches. |
| Comment by Christopher Morrone [ 19/Nov/13 ] |
|
In patch 8024, Doug said:
Doug, yes, the conversation and examples were mostly about the rpm packaging path, but they should not be intrepretted as requiring rpm. When building rpms, the methods I described should be used. When rpm packaging is not the end goal, folks should be able to build lustre at the command line as they choose. I fully support that. There are is a standard autotools mechanisms for building against external packages. See this section of the autoconf manual: http://www.gnu.org/software/autoconf/manual/autoconf.html#External-Software You guys should add "AC_ARGS_WITH([libyaml]" handling to Lustre's build system. Then users can either:
This stuff is absolutely simple and standard in the open source software world. Please embrace it. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
November 19th, 2013
Documentation patch: |
| Comment by Christopher Morrone [ 20/Nov/13 ] |
|
I think it would be helpful if someone could explain the general plan for all of the new libraries being introduced to the lustre install by this work. For instance, why do we want to see: libcYAML liblustreconfigapi libauto_complete liblustre_cfg_cmds Without looking into the code too closely, my naive thought is that what we really want is: liblnet Just one clearly named library. Do we really need four new libraries, at least two of which aren't even very clearly lustre related (from their names)? None of the names identify them as related to lnet configuration. I would also suggest that any new libraries that we introduce need to be properly versioned. That has been sorely lacking in lustre. Using libtool would simplify the make files and provide the options to correctly version the libraries. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
liblustreconfigapi:
libcYAML libauto_complete liblustre_cfg_cmds Please note that the design intention is not to restrict DLC to only LNET parameters, but to allow the expansion to configuring other Lustre parameters in the future (if the use case arises). That's why I don't use the term LNET explicitly when I'm naming libraries, since under this design, the same libraries can be expanded to include APIs to configure Lustre parameter. Also all this is explained in greater detail in the user documentation: As for appropriately versioning the libraries, I was simply following the way the rest of the lustre libraries are being built. I will need to look at how libtool can be used. |
| Comment by Christopher Morrone [ 20/Nov/13 ] |
|
From that description, it sounds like libauto_complete, libcYAML, and possibly liblustre_cfg_cmds libraries are internal implementation details of the lustre tools, and not appropriate for installation system-wide for users. Perhaps you want what the GNU tools call "convenience" libraries: http://www.gnu.org/software/automake/manual/html_node/Libtool-Convenience-Libraries.html
I will try to provide more review details in the documentation patch, but I may not get to it before some time next week. But some high level comments here:
That is understandable. Unfortunately, Lustre has a great deal of technical debt that we do not wish to perpetuate in new code. |
| Comment by Christopher Morrone [ 20/Nov/13 ] |
|
Additionally, I think that the "liblustreconfigapi" name should be changed. This is specifically about configuring lnet, not lustre. Giving the library that name will confuse users and system administrators. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
Thanks for the documentation feedback, I'll work on another revision of the patch. With regards to the libraries, I don't see them as convenience libraries. These are libraries which are required for both the new configuration tool and the LNET Configuration C-API, and they will need to be installed along with the standard Lustre installation. The libcYAML has to be separate... since it's potentially used by multiple binaries. Integrating it in the C-API or the configuration tool would mean code replication The auto_complete library is infrastructure library. The intent is for it to be used by any future configuration utilities. As it currently stands the new configuration tool is the only one utilizing it, so I can possibly make it as part of the binary (which in earlier iterations of the code it was), however, it is good design to create it as a separate infrastructure entity, so the potential for its reuse is obvious.. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
As I mentioned the intent is for the possibility of extending the liblustreconfigapi to contain also lustre configuration. Making the name change will force us to create a different library, if and when this happens. I'll discuss internally and see which approach we want to take going forward. |
| Comment by Christopher Morrone [ 20/Nov/13 ] |
With regards to the libraries, I don't see them as convenience libraries. These are libraries which are required for both the new configuration tool and the LNET Configuration C-API, and they will need to be installed along with the standard Lustre installation. You may be misinterpreting the term "convenience library". It doesn't mean that the code is unimportant. It simply means that the library only exists within the confines of the build tree, and is a "convenience" to the software developers in that they can refer to a group of object files by a single name to use the code. The configuration tool and the "LNET Configuration C-API" are both things that are developed internal to the Lustre tree. Unless we have a very clear use case for applications external to lustre that need to use these internal libraries, they definitely do not belong in /usr/lib. As I mentioned the intent is for the possibility of extending the liblustreconfigapi to contain also lustre configuration. I don't remember that being a stated goal of this project. Did I miss that? Does the design document explain how that would be done? I thought that the goal of this project was to make configuring lnet more like you configure other networks on Linux systems. To make the networking configuration cleaner, more like sysadmins are used to, and dynamic. I'm rather uneasy about that goal of configuring lnet and lustre from one unified interface. We have long kept a clean separation between lnet and the rest of lustre, and we need to be very careful about throwing that away. I for one would prefer that we keep lnet and lustre configuration separate. At LLNL, we even have two separate and distinct init files, one for starting lnet, and another for starting lustre servers. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
The new configuration tool, like lctl will be in /usr/sbin. The tool needs to link against these libraries. Also let's say a 3rd party tool is developed to configure LNET, such as IML, it will need to link against the LNET Configuration Library which will pull in the libcYAML, as an example. From my reading on Convenience Library: http://www.sourceware.org/autobook/autobook/autobook_74.html#SEC74 The other concern I understand is you're saying that since they do not have a use outside of lustre, then they shouldn't be in /usr/lib. Where would they be? For example both liblustre and liblustreapi are in /usr/lib64, and both of these libraries are intended for use in lustre only. Can you please clarify this point. |
| Comment by Robert Read (Inactive) [ 20/Nov/13 ] |
|
These look like interesting libraries and they might potential for reuse, but I think most users would be quite surprised to see the lustre package install seemingly extraneous utility libraries into /usr/lib. If these libraries are indeed more generally useful, then the proper thing to do is move them to their own project and publish them as a standalone package. For now, though, I suggest just linking these libraries into the one tool that is using them (as already suggested), and we can consider whether or not we publish them as standalone libraries independently of this discussion. Next time, just write the CLI in Python. |
| Comment by Amir Shehata (Inactive) [ 20/Nov/13 ] |
|
after discussing it with Robert, I more fully understand the reasons for not installing these libraries in /usr/lib. I'll change it to eliminate the installation of these libraries. However, as liblustreconfigapi is intended to be used externally, same as liblustre and liblustreapi, it will still be installed as it is now. However, I see your point about separation between lustre and lnet and will make the appropriate changes. |
| Comment by Doug Oucharek (Inactive) [ 20/Nov/13 ] |
|
The consideration I want to add is the need for the new API libraries to be LGPL rather than GPL. Third party applications which are not GPL'ed cannot use these libraries if they have to be GPL. If the way these libraries are built into our existing binaries dictate they be GPL, then that won't work. |
| Comment by Robert Read (Inactive) [ 20/Nov/13 ] |
|
Just a nit, but would you mind dropping the "api" from then name of that library? It's a bit like saying "PIN number" |
| Comment by Robert Read (Inactive) [ 21/Nov/13 ] |
|
+1 for LGPL. There is already a precedent for this as the new HSM library functions were helpfully added by CEA as LGPL. (Though they are distributed in liblustreapi, with is of course mostly GPL. d'oh!) |
| Comment by Christopher Morrone [ 21/Nov/13 ] |
Awesome! I'm glad we are on the same page. I do understand that liblustreconfigapi should be public, which is why I only listed the other three as a problem. My only issue with liblustreconfigapi was the name. Robert makes a good point about dropping "api" from the name as well. We did that for "liblustreapi", but it did feel dirty when I did it. The rationale was that "liblustre" name was already taken in the lustre world. So you can add liblustreapi to your list of lustre counter-examples. |
| Comment by Amir Shehata (Inactive) [ 22/Nov/13 ] |
|
After getting feedback from Andreas, Robert and Chris, I believe the best way to move forward, is to make the LNET configuration utility use the existing lctl infrastructure. This will naturally eliminate all libraries except for the LNET C Configuration API (the LGPL library). The utility will be a thin layer, which parses commands line input and directly call into the LNET C Configuration API, without doing any error checking, since all error checking is localized in the library. Additionally, I'll move the LNET C Configuration API Library and the LNET configuration utility into the lnet/utils directory, to maintain a strict separation between LNET and Lustre. Let me know if this plans sounds good to everyone. |
| Comment by Robert Read (Inactive) [ 22/Nov/13 ] |
|
That's disappointing since lctl is hardly a shining example of CLI design, and it looked like you had some good ideas that would could have used to upgrade the rest of of our tools. However, it's your call. There's always python... BTW, I think having error checking in the library is fine just as long as the library is only returning error codes to the caller and not printing any error messages itself. |
| Comment by Christopher Morrone [ 25/Nov/13 ] |
Actually, I'm not sure that I am in favor of that at all. Right now lnet is pretty much a stand-alone component. lustre has dependencies on lnet, but lnet does not depend on lustre. If you make lnet's configuration reliant on lctl, you introduce a new dependency from lnet on lustre. I think we would like to maintain the clear separation of code and tools between lnet and lustre. |
| Comment by Robert Read (Inactive) [ 25/Nov/13 ] |
|
The core command line Parser code is actually in libcfs, and is already being used by tools in lnet/utils so this is at least not adding any new dependencies. |
| Comment by Christopher Morrone [ 02/Dec/13 ] |
|
OK, granted. Still, I am not not in favor of adding this to lctl. I would like to see it as a separate command under the lnet subdirectory somewhere (lnet/utils, for instance). |
| Comment by Robert Read (Inactive) [ 02/Dec/13 ] |
|
Agreed, and I believe Amir does intend to create a new config utility in lnet. The difference is he now plans to use the same cli parser library that lctl uses, instead of using the new one he originally wrote for DLC. |
| Comment by Amir Shehata (Inactive) [ 03/Dec/13 ] |
|
That is correct. I'm almost done my updates to the DLC patches. |
| Comment by Christopher Morrone [ 13/Dec/13 ] |
|
I would like to argue again for seeing an updated DLC design document attached to this ticket (or in confluence or something). One thing I'm realizing is that I'm not particularly in favor of the command line design. The design currently looks something like this: lnetcfg {add|del|show} OBJECT [options]
I think that is fundamentally backwards. Instead, we should take inspiration from the linux "ip" command, and other commands like git. The basic model should be: So for instance the first line of the ip command's man page synopsis shows this: ip [ OPTIONS ] OBJECT { COMMAND | help }
That is the style that I think we want to emulate for the new command. So some examples might be: lnetctl network add o2ib0 dev ib0 lnetctl route add o2ib9 routers 192.168.121.[160-163]@o2ib2 If there are global variables that we want to set, it is reasonable to use global get/set commands. For instance: lnetctl set small_buffers 1000 lnetctl set medium_buffers 1000 lnetctl set large_buffers 1000 lnetctl set routing 1 lnetctl set routing 0 For handling yaml files, I think the original plan is kind of ugly: lnetctl add file --name <yaml file name> lnetctl del file --name <yaml file name> I can't even really wrap my head around how the "del file" behavior would even work. For instance if I have this section in my yaml file: buffer:
enable: 0
and then I feed it "del file", do I wind up with routing enabled? If I have this: buffer:
tiny: 2000
What does "del file" do??? If I can't figure this out, there is no way that a sysadmin will. And "buffer" is the wrong name for that yaml section as well. But I digress. A better command line for this might look like: lnetctl import < myfinefile.yaml lnetctl import myfinefile.yaml lnetctl export > backupsettings.yaml lnetctl export backupsettings.yaml |
| Comment by Amir Shehata (Inactive) [ 14/Dec/13 ] |
|
The CLI lnetctl <cmd> <entity> [parameters] except that we use optional parameters to pass in the different parameters. The decision is to get away from positional parameters. ex: add route --net tcp0 --gateway 10.1.1.2@tcp1 --hop 1 --priority 0 Also please note that the examples you gave above are not consistent form. In some cases you have lnetctl <command> <object> and in other cases you have lnetctl OBJECT COMMAND In my opinion, I think a consistent form makes it easier to get more familiar with the command line. I agree that "del file" with regards to enabling/disabling routing might be confusing. Here is my suggestion. To reset all router buffers to default values the yaml file that you feed to the del command would look like:
If you add other parameters under buffer they will be ignored. For example:
it'll still reset the buffer to the default values. To disable routing the yaml file would include:
This way you can take the exact same YAML file feed it to any of the commands and they will get processed correctly. Parameters that are meaningless to the command are ignored. With your suggestion: What operations do we do on myfinefile.yaml? What is contained in the yaml file? That's why you explicitly tell the utility what to do on the yaml file (add, del or show) |
| Comment by Amir Shehata (Inactive) [ 14/Dec/13 ] |
|
User Documentation as exported from the internal Wiki |
| Comment by Amir Shehata (Inactive) [ 14/Dec/13 ] |
|
I have a recommendation. I'm finding that progressing on the project with this form of communication is inefficient. Would you agree for a skype/phone meeting, where we can hash out the different points of view. I would say that we can call a meeting with yourself, Doug Oucharek, Andreas and myself. Does that sound like a reasonable plan? |
| Comment by Christopher Morrone [ 16/Dec/13 ] |
|
I'm up for a call. I am free after 3:30 PST today, or most of the day tomorrow. |
| Comment by Liang Zhen (Inactive) [ 26/Mar/14 ] |
|
comments for DLC patch http://review.whamcloud.com/#/c/9785/ |
| Comment by Liang Zhen (Inactive) [ 12/Apr/14 ] |
|
suggestions about dlc APIs based on discussion in Miami |
| Comment by James A Simmons [ 13/May/14 ] |
|
http://review.whamcloud.com/#/c/9830 http://review.whamcloud.com/#/c/8021 |
| Comment by James A Simmons [ 04/Sep/14 ] |
|
I have been testing this the utilities and I saw this: lnetctl > help set lnetctl > help set large_buffers Is this correct behavior? |
| Comment by Amir Shehata (Inactive) [ 04/Sep/14 ] |
|
lnetctl utility has the same work flow as the ip utility. The only difference is that the infrastructure I was asked to use for the utility allows interactive mode, and the patter "help <cmd>" in interactive mode. However, that infrastructure doesn't support the pattern "help <cmd> <subcmd>" readily. The way it works: lnetctl > help set
set: set {tiny_buffers | small_buffers | large_buffers | routing}
lnetctl > set tiny_buffers help
set tiny_buffers: set tiny routing buffers
VALUE must be greater than 0
so help <cmd> gives the first level of help. To get help on subsequent commands, you put the "help" keyword after the subcommand. If you run directly from the command line, the output would look like this: [root@rt1 lutfTMP]# lnetctl help
Available commands are:
lnet
route
net
routing
set
import
export
stats
peer_credits
help
exit
quit
For more help type: help command-name
[root@rt1 lutfTMP]# lnetctl route help
route add: add a route
--net: net name (ex tcp0)
--gateway: gateway nid (ex 10.1.1.2@tcp)
--hop: number to final destination (1 < hops < 255)
--priority: priority of route (0 - highest prio
route del: delete a route
--net: net name (ex tcp0)
--gateway: gateway nid (ex 10.1.1.2@tcp)
route show: show routes
--net: net name (ex tcp0) to filter on
--gateway: gateway nid (ex 10.1.1.2@tcp) to filter on
--hop: number to final destination (1 < hops < 255) to filter on
--priority: priority of route (0 - highest prio to filter on
--detail: display detailed output per route
route help: display this help
[root@rt1 lutfTMP]# lnetctl route show help
route show: show routes
--net: net name (ex tcp0) to filter on
--gateway: gateway nid (ex 10.1.1.2@tcp) to filter on
--hop: number to final destination (1 < hops < 255) to filter on
--priority: priority of route (0 - highest prio to filter on
--detail: display detailed output per route
|
| Comment by Andreas Dilger [ 29/Sep/14 ] |
|
One question that came up at LAD last week was why the lnetctl functionality was not (also) available via lctl? Like the other LNet-specific commands that are already available via lctl, these commands could also be added just by adding them to the lctl parser. It is still desirable to have a standalone lnetctl command for the systems that are using only LNet (there are a few). The stated benefit of having these new commands in lctl is that this reduces the number of binaries that need to be installed on the compute nodes, which try to have a minimal memory/space footprint. That isn't a requirement for landing, but rather an enhancement request from the end users. |
| Comment by James A Simmons [ 03/Oct/14 ] |
|
Ticket LU-5705 was created to discuss lctl versus lnetctl |
| Comment by Doug Oucharek (Inactive) [ 07/Oct/14 ] |
|
Ticket |
| Comment by Sarah Liu [ 28/Oct/14 ] |
|
Test plan attached is dated, please update the document. |
| Comment by Amir Shehata (Inactive) [ 07/Nov/14 ] |
|
Landed in 2.7 |