[LU-5969] Create an LGPL version of liblustreapi Created: 02/Dec/14 Updated: 12/Apr/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major |
| Reporter: | Frank Zago (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||
| Rank (Obsolete): | 16672 | ||||||||||||||||||||||||||||||||||||
| Description |
|
Cray would like to have an LGPL version of the liblustreapi A starting point would be to create that new library with the existing We should take this opportunity to do some cleanup in the library, and
As a test of the viability for this plan, we'd like to first port the get_param
get_root_path
libcfs_ukuc_get_rfd
libcfs_ukuc_msg_get
libcfs_ukuc_start
libcfs_ukuc_stop
llapi_chomp_string
llapi_create_volatile_idx
llapi_error
llapi_error_callback_set
llapi_fd2fid
llapi_fid2path
llapi_file_open_pool
llapi_get_agent_uuid
llapi_get_mdt_index_by_fid
llapi_msg_set_level
llapi_open_by_fid
llapi_parse_size
llapi_search_fsname
llapi_search_mounts
Cray would like to do the preliminary implementation of this new library, and would like to know if the copyright holders of the above mentioned functions agree to relicense their code under the LGPL. Otherwise they will have to be recoded. The second phase would be to also port Robinhood to it, since the most |
| Comments |
| Comment by Frank Zago (Inactive) [ 02/Dec/14 ] |
|
Somewhat related: |
| Comment by Gerrit Updater [ 13/Dec/14 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/13058 |
| Comment by Andreas Dilger [ 13/Dec/14 ] |
|
I pushed the http://review.whamcloud.com/13058 patch as a short-term work-around to make liblustreapi available with an LGPL license. The patch just changes the GPL license header to LGPL for liblustreapi.c and lustreapi.h. It will need an approval from someone at each contributing organization (I've CC'd someone senior from each organization on this ticket since not everyone has a Gerrit account). The newer parts of liblustreapi (_hsm, _json, _layout, _lease) already have an LGPL license, and do not need any changes. While I think that reimplementing liblustreapi from scratch under LGPL is good for a number of reasons, it is too late in the 2.7.0 release cycle to do this, and it is also undesirable to have major changes in the 2.5.x releases. This patch is only intended as a short term fix, and I welcome a reimplementation/rework of liblustreapi with open arms. This also opens the door to re-using any parts of the existing liblustreapi as needed, and/or allows a transition period while different parts of the library are replaced. Before I add more reviewers to the patch itself I want to ensure the patch has not failed any regression tests, to avoid the need for multiple reviews on the patch. I'll add the required reviewers once the patch has passed testing. |
| Comment by Christopher Morrone [ 15/Dec/14 ] |
|
I do not think that it is not particularly wise to make a change like this so late in the a release development cycle. It is likely to be difficult to get everyone's legal departments to OK a change like this in such a short time period. I think that we should drop 2.7 and 2.5 from consideration, and just focus on making a good LGPL library for some future version of Lustre. |
| Comment by Andreas Dilger [ 15/Dec/14 ] |
|
There is still a month before code freeze for 2.7.0, and another month after that before release, and since this isn't actually changing the code (just comments) I think it could be landed any time before release. I don't think that is an unreasonable timeframe to get this finished for 2.7.0. That is likely too long for 2.5.4, but would easily be "backported" for b2_5 when available (b2_5 is a strict subset of master so there would not need to be additional approval needed). As for writing a replacement LGPL library, I suspect that may take long enough that it may not even be ready for 2.8.0, just due to lots of feedback and discussion on what the various APIs should actually look like, development, inspection, testing, etc. |
| Comment by Cory Spitz [ 26/Jan/15 ] |
|
Andreas reported at the Jan '15 Open SFS LWG developer's meeting that there is more work to do to unlink other uses to avoid GPL incompatibility. Andreas (or anyone), would you please care to elaborate here so that we can begin to organize the work? Thank you. |
| Comment by Frank Zago (Inactive) [ 26/Jan/15 ] |
|
I think Andreas referred to these dependencies: $(L_IOCTL) $(L_KERNELCOMM) $(L_STRING) as well as all the code pulled in from lustre_idl.h and lustre_user.h $(L_IOCTL) doesn't appear to be needed for liblustreapi. $(L_KERNELCOMM) is logically split into kernel and user parts, so untangling that one should be easy. $(L_STRING) is needed, but should move to a libcompat.a or something similar. The biggest change is probably getting rid of lustre_idl.h in userspace. It's pulling in other headers. It seems not a lot of that is actually necessary in userspace. There's also some code in lustre_user.h that should move inside lustreapi directly. |
| Comment by Andreas Dilger [ 05/Feb/15 ] |
|
Frank has it pretty well described already. I thought l_ioctl() was used by liblustreapi, but it seems it is only used by lctl.c and lfs.c and does not need to be converted to LGPL unless we want to abstract some of the lctl functionality into llapi for other tools to use at some point in the future. There are a number of places where <libcfs/libcfs.h> is included, but I don't think that is needed for user tools at all, nor liblustreapi. It was mostly for liblustre and has a LOT of cruft in libcfs/include/libcfs/user-*.h and libcfs/libcfs/user-*.c that could just be deleted now that liblustre is gone. |
| Comment by James A Simmons [ 05/Feb/15 ] |
|
Removing libcfs user-* files will break a lot of things. Trust me on this one |
| Comment by Gerrit Updater [ 05/Aug/15 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/15857 |
| Comment by Andreas Dilger [ 21/Sep/15 ] |
|
There is some work in progress at https://github.com/fzago-cray/liblustre also. |
| Comment by Gerrit Updater [ 24/Sep/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/15857/ |
| Comment by Gerrit Updater [ 03/Oct/15 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/16721 |
| Comment by James A Simmons [ 16/Oct/15 ] |
|
Looks like it is time to discuss error handling. The older api handles errors using UNIX98 style of return -errno code were newer apis like the layout handles errors more like the standard C library does in that -1 is return and errno is set. As new LGPL code is written we have to pick which method should be done. As for backwards compatibility we of course need older api functions to keep UNIX98 behavior for now. |
| Comment by Frank Zago (Inactive) [ 16/Oct/15 ] |
|
Using errno to return an error has a big drawback for the caller, since errno will be clobbered by a caller to the libc (like printf). I approved that for the layout API, and I shouldn't have. |
| Comment by Christopher Morrone [ 16/Oct/15 ] |
|
There are going to be reasonable arguments on either side of the debate, and reasonable people can disagree. But we already had this debate last time around, and I would prefer that we not keep reviving it repeatedly. The decision has been made. Lets stick with it. UNIX98 is not particularly common usage in the core Linux userspace libraries. POSIX.1 and C99 are, on the other hand quite common (libc, et.). And lets be honest, the we weren't doing negative error codes before because they were like UNIX98, we were doing it because in Linux kernel space they realized that they could overload pointer return values with negative error codes because those parts of the address space were unused. And in the past we didn't much consider good interface design, and instead just did what we were used to doing in-kernel. |
| Comment by Robert Read (Inactive) [ 16/Oct/15 ] |
|
Chris, do you have a link for that previous decision? |
| Comment by Andreas Dilger [ 16/Oct/15 ] |
|
Robert, that was discussed in the context of Is there an objection to returning -ERRNO from a function in addition to setting errno=ERRNO? While a bit more cumbersome for the library developers, this allows users to interact with the library in either manner and avoids the (IMHO) common coding mistake of using the "-1" return value as an errno in the caller. |
| Comment by Robert Read (Inactive) [ 16/Oct/15 ] |
|
I'd prefer the conventional and simpler errno=ERRNO and return -1. |
| Comment by Christopher Morrone [ 17/Oct/15 ] |
|
Wait, James, where are you getting this idea that returning -errno is a UNIX98 thing? As far as I can tell (which granted is not far since the documents are not terribly public), UNIX98 codified setting errno for many things (e.g. malloc()), whereas prior to that it was inconsistent. I could certainly be wrong given how hard it is to find clear info, but I found a few instances like the malloc one and none that talked about returning -errno. The suggestion about returning both -ERRNO and errno=ERRNO was considered and ultimately decided against in the previous round of discussion as well, I think. I believe that the argument in favor of -1 is that it is what most of the core Linux libraries do. Some code authors take the short cut of checking for -1 by doing "if (rc < 0)", but it is arguably more correct to do "if (rc == -1)". By Lustre returning -ERRNO from its API, we introduce an inconsistency that those used to doing things the arguably more correct way will need to learn to do differently for Lustre. The thing about errno is that it is consistent regardless of the type that a function returns. Some functions return integers, some return pointers, some return structures, etc. When an integer is returned, it is the convention to signal an error has occurred by returning -1. When a function returns a pointer, it is conventional to return NULL to signal an error. For other things, it may vary, but needs to be clearly documented. But in all cases, the type of error is transmitted through errno, consistently. You can't have a consistent way of returning the error type through the integer return code unless you dictate all functions in the Lustre library must return integers. I don't think we want to give up the flexibility to return pointers and other things from Lustre library functions. Another common way of handling errors it to have a separate function argument (typically the last one) consistently added to all/most functions in a library (e.g. glib's GError). We didn't see the need for any additional error codes beyond what is found in errno.h so we didn't do that either. |
| Comment by Frank Zago (Inactive) [ 23/Oct/15 ] |
|
The problem with returning -1 and errno is that it introduces, by construction, bugs in both the caller and the callee. Just look at the existing Lustre history and search for errno. You'll find fixes just for that. For instance commits f6c886e3, 893b6771 or df4b3580 for a whole series of fixes. The aforementioned layout patch had its own share of these bugs when it was developed. Just look at the patches history on gerrit. Also, gcc can warn if there is a missing return, or if the return code hasn't been initialized. That's not happening when errno is not set or overwritten. Plus it makes the code more complex because one has to keep track of 2 variables instead of one, both in the caller and callee. |
| Comment by Christopher Morrone [ 23/Oct/15 ] |
|
I fully understand your argument and sympathize with your dislike of that coding gotcha. But I think that is just one of the things that any C programmer in Linux user space needs to understand an accept. Some of the surrounding code in the commit examples you supplied show calls to things like ftruncate(). Nothing (reasonable) that we do in lustre will save programmers from having this problem with ftruncate() and all the other standard library calls that work exactly like this. We probably just aren't going to agree on this particular topic. |
| Comment by Frank Zago (Inactive) [ 26/Oct/15 ] |
|
So, there's already so many ways to shoot yourself in the foot with Lustre. What's one more? |
| Comment by Frank Zago (Inactive) [ 12/Nov/15 ] |
|
I've updated the code at https://github.com/fzago-cray/liblustre. That library is now fully LGPLed. It's been successfully used by Cray's copytool for several months without trouble. |
| Comment by Christopher Morrone [ 12/Nov/15 ] |
|
It would be really nice for Cray's library to not select a generic "liblustre" name for its library. Maybe use "libcraylustre" or something. |
| Comment by James A Simmons [ 12/Nov/15 ] |
|
I assume that work will find its way into the current liblustreapi.so. Once |
| Comment by Frank Zago (Inactive) [ 12/Nov/15 ] |
|
Christopher, Cray would like to have this library eventually adopted by the community and not stay a Cray only project. At this stage it's still a proposal. James, that can't happen. When you mix GPL+LGPL code, the result is GPL, which is exactly what we don't want. Also, liblustreapi is never going to be LGPL, which is why we started this effort. |
| Comment by Christopher Morrone [ 12/Nov/15 ] |
Ah, when you said that Cray's copytool is using it, I got the impression that you are using this on customers' systems. If it is still in proposal mode and not in the wild, then no harm, no foul. If you can't get over your dislike of the Lustre/POSIX way of handling errors and make it consistent with the existing LGPL parts of the API, that could be a deal breaker. It would be a shame to let all of the good work go to waste over something like that. Not only that, right now your API seems strangely internally inconsistent in its error approach, at least judging by what the man pages say. One function works the POSIX way using errno, others don't...I don't get it. |
| Comment by Cory Spitz [ 18/Nov/15 ] |
|
Should we plan to add Frank's LGPL liblustre in the tree alongside the GPL liblustreapi? Then, when he pushes to Gerrit you all can discuss the functions and error handling in code review? |
| Comment by Andreas Dilger [ 18/Nov/15 ] |
|
The easiest way to do that would be to add it as liblustreapi2, as is done with many other libraries. I don't think it makes sense to do a new .so version, since the API (AFAIK, haven't looked at it) will be different for many functions. It should be built with proper .so versioning itself from the beginning. This way liblustreapi and liblustreapi2 can coexist in the tree for some time as needed, though they could both link in the existing LGPL code. |
| Comment by Robert Read (Inactive) [ 18/Nov/15 ] |
|
I would like to see a new library, too, but I would prefer to see a simpler library with just the base primitives exposed by Lustre, and not the higher level abstract interface provided by Frank's library. |
| Comment by Frank Zago (Inactive) [ 18/Nov/15 ] |
|
What are your reasons? |
| Comment by Robert Read (Inactive) [ 18/Nov/15 ] |
|
We're using this library in a different language (Go), and we've already built this style of abstraction layer in our target language. I'd rather implement this on top of the simpler primitives instead of adding another layer of abstraction. |
| Comment by Andreas Dilger [ 19/Nov/15 ] |
|
Robert, isn't that an argument for keeping the complexity inside liblustreapi? That avoids the need to develop such an abstraction for every user/interface to this code (e.g. Python, Perl, C applications, etc). |
| Comment by Robert Read (Inactive) [ 19/Nov/15 ] |
|
If the complexity is actually needed, then maybe, but the existing ball of mud in liblustreapi is a great example of pushing too much complexity into a library. I'd prefer for liblustreapi to stay as low-level as possible, and allow application developers build the higher level abstractions that make sense for their applications. Libraries like Frank's are still very useful, but they should be external to the Lustre tree and not be the interface Lustre provides. |
| Comment by Cory Spitz [ 19/Nov/15 ] |
|
An argument for going with liblustre, of course, is that users can link via -llustre. |
| Comment by Cory Spitz [ 19/Nov/15 ] |
|
Robert, if we are going to continue to refactor things in order to develop base primitives would it be ok to have two libraries in the Lustre tree? Or, would it be so bad to keep a single LGPL library that contains the complexity you don't want. You don't have to use the complex routines in your applications as long as the lib contains all the base primitives you want. If things are more cleanly implemented it shouldn't be a problem. |
| Comment by Robert Read (Inactive) [ 19/Nov/15 ] |
|
It seems to me we have a good opportunity here to refactor our user space utilities out of the Lustre tree. If we define a clean kernel-user interface, then our user space tools won't be dependent on Lustre internals, and we can move all the utilities into their own repository. This will make tool development easier since a tool-only patch would not require a full build and test of the entire Lustre stack, as it does today. The faster development cycle may encourage experimentation and new tool development. It would also provide an easier entry path for new developers interested in contributing to Lustre, but intimidated with the level of effort require to contribute even small changes to the core tree. (A level of effort which is certainly understandable for kernel modules, but perhaps not as much for more fungible utilities.) If this is not a desirable goal we, as a community, want to work towards, then I agree that it makes sense to provide both the base primitives and abstract interface in a single library in the tree, and I'd be OK with that. On the other hand, if this is desirable then perhaps this ticket should be about providing the right LGLP library so Frank's library can continue to to develop externally instead of being incorporated. |
| Comment by Christopher Morrone [ 23/Nov/15 ] |
|
I certainly like the general approach suggested by Robert. I have not yet formed an opinion on whether Frank's library is too high level or not, because most of it currently concerns HSM and that is an area of Lustre about which I know very little. I think there are probably lots of great side effects of separating the kernel and user space code. The build and packaging systems could potentially be cleaned up and simplified quite a bit, for instance. |
| Comment by Andreas Dilger [ 23/Dec/15 ] |
|
Unfortunately, it seems that while there is some interest in a major refactoring of both the kernel interfaces and library, there is little motivation from that camp to actually work on such a large project. In the meantime, it doesn't make sense to leave Cray's library gathering dust when it could get us out of the situation we are in with the GPL userspace library that cannot be used by non-GPL code such as HSM copytools. What I don't want is continued inaction for months and years discussing a perfect solution when we have something usable today that solves a real problem and can always be improved in the future. I would be in support of submitting the Cray code to the tree, and the proponents of reworking the whole library and kernel interfaces can still pursue that avenue as their schedules permit (this should also take into account the upstream kernel's request to refactor the Lustre ioctls in general). The existing liblustreapi code can be adapted to work with the newer interfaces as they arrive, so that user tools using this library continue to function across releases while the lower level interfaces are reworked, and existing applications can move over to the new API after it is well established. |
| Comment by Gerrit Updater [ 14/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16721/ |
| Comment by Gerrit Updater [ 15/Jun/16 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/20809 |
| Comment by Gerrit Updater [ 20/Jun/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20809/ |
| Comment by Gerrit Updater [ 08/Feb/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/25324 |
| Comment by Gerrit Updater [ 15/Feb/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/25324/ |
| Comment by Gerrit Updater [ 16/Mar/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26047 |
| Comment by Gerrit Updater [ 26/Mar/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26047/ |
| Comment by Gerrit Updater [ 07/Apr/17 ] |
|
Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/26440 |
| Comment by Gerrit Updater [ 13/Jun/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/26440/ |
| Comment by Cory Spitz [ 18/Aug/17 ] |
|
Do we want to get this resolved for 2.11? |
| Comment by Andreas Dilger [ 18/Aug/17 ] |
|
Cory, we are now in a good position to complete this work. James has cleaned up the headers to separate the kernel and userspace code, so what is left is to generate a separate patch for each of the files to change the license to LGPL (rather than a single patch for all of them, as I naively tried) and then get a signoff for each patch from all sites that have contributed any substantial changes to those files. I will of course approve for any Intel and WC contributors, and you can now cover Cray, Seagate, Xyratex, Oracle, Sun, and CFS contributions, which leaves mostly CEA, LLNL, ORNL, and a few others. |
| Comment by Cory Spitz [ 12/Apr/18 ] |
|
adilger, I've got some bad news – Cray didn't acquire the copyrights for contributions made by CFS, Sun, Oracle, ClusterStor, Xyratex, or Seagate and I don't know if Oracle gave Seagate their rights when they sold of their parts of it. I'll do some more checking, but we will definitely have to go to Seagate and possibly Oracle to continue down this path. I'm all for trying though. |