[LU-3840] llapi_layout API design discussion - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels: None
Severity: 3
Rank (Obsolete): 9942

Description

Extensions to liblustreapi have been posted for review here.

http://review.whamcloud.com/5302

The goal of the API extensions is to provide a user-friendly interface for interacting with the layout of files in Lustre filesystems, and to hide the wire-protocol details behind an opaque data type.

This issue will provide a forum for potential users and developers to discuss and critique the design of the proposed API extensions.

Attachments

llapi_layout_alloc.txt (2 kB, 04/Aug/14 6:22 PM)
llapi_layout_file_create.txt (2 kB, 04/Aug/14 6:22 PM)
llapi_layout_get_by_fd.txt (4 kB, 04/Aug/14 6:24 PM)
llapi_layout_ost_index_get.txt (2 kB, 04/Aug/14 6:22 PM)
llapi_layout_pattern_get.txt (1 kB, 04/Aug/14 6:22 PM)
llapi_layout_pool_name_get.txt (2 kB, 04/Aug/14 6:22 PM)
llapi_layout_stripe_count_get.txt (1 kB, 04/Aug/14 6:22 PM)
llapi_layout_stripe_size_get.txt (1 kB, 04/Aug/14 6:22 PM)
llapi_layout.txt (4 kB, 04/Aug/14 6:22 PM)

Issue Links

is related to

LU-4665 utils: lfs setstripe to specify OSTs

Resolved

is related to

LU-2182 Add llapi_file_get_layout() function in liblustreapi

Closed

LU-3480 Layout Enhancement

Closed

Activity

Christopher Morrone (Inactive) added a comment - 04/Sep/14 11:39 PM

> Because it would keep the same API that the rest of liblustreapi has today, and there is no drawback to doing so.

The old API is seriously deficient in its design. Yes, we need to keep it around for some time, but I don't see a great deal of value in maintaining compatibility with poor design. This is our chance to make a clean break and make something that applications can really use.

> Virtually every application I've seen checks if (rc < 0) instead of if (rc == -1), so as it is clearly documented in the man pages that the functions return a negative value on error instead of "-1" there shouldn't be any problem.

Granted, that is not uncommon. But the man page usually says:

> On error, -1 is returned, and errno is set appropriately.

So it is perhaps not unreasonable to guess there are also folks out there that are used to explicitly checking for -1. Lustre returning only -1 could be considered the least likely to trip anyone up because it will work for checks of either rc == -1 or rc < 0.
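
The point about the two checking styles can be sketched in a few lines of C. This is only an illustration under stated assumptions: `demo_stripe_count_get()` is a hypothetical stand-in, not a liblustreapi function, and the only thing taken from the discussion is the "-1 plus errno" convention itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a layout accessor (not a real llapi call).
 * Returns 0 on success, or -1 with errno set on failure. */
int demo_stripe_count_get(const uint64_t *stored, uint64_t *count)
{
    /* A NULL output pointer is treated here as a caller bug. */
    assert(count != NULL);

    if (stored == NULL) {
        errno = EINVAL;   /* the single source of the error code */
        return -1;        /* the return value only signals failure */
    }

    *count = *stored;
    return 0;
}
```

With this convention a caller that tests `rc == -1` and a caller that tests `rc < 0` both take the error path, and both read the cause from errno.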

Andy Nelson added a comment - 29/Aug/14 8:34 PM

I fully understand the constraints of backwards compatibility and "can't get rid of that yet" cruft. That was very much the point of my previous comment: I've got to carry around backwards compatibility stuff for the different Lustre APIs until such time as the new one is available on the oldest machine I have to compile code on. I'm expecting that I can remove the "old API" functionality from my own code in something like 5 years or so, if the "new API" stuff lands in a distributed version right now. And that isn't even the case yet, so the date keeps getting pulled further and further out.

An example of that pain is the lustre_idl.h file, which I have to hack by hand and carry around with my code, since it doesn't compile in user space and I need definitions from it. The example code in the Lustre manual includes it too. It is all the more complicated because that file changes in odd ways across versions, and I can't just use a 1.8.x lustre_idl.h file on a Lustre 2.x installation and expect it to work...

So this is why my emphasis on the "start now" and get the clock ticking. As far as new api format (errno and such discussions), we've gone over that, but here is a reiteration: My position as an applications developer is to have one way to access the error information, and for that way to be the errno setting stuff.

James A Simmons added a comment - 29/Aug/14 6:09 PM

I will hold off on my patches until this is resolved. I sent out an email to some people in our applications department to see what they say. I will report on that feedback.

Andreas Dilger added a comment - 29/Aug/14 3:57 PM

Andy, while I'm all for updating applications to the new API, yours is not the only application in the world that uses the old one, so we can't remove the old API very quickly. Also, this new API has only just landed in a development branch and would need to be backported into the maintenance releases before it even has a chance to be used by regular users.

I think the best that can be done is to include this into all of the maintenance releases, update the old APIs to use this new code (James has started on that) and then it can be deprecated in a few years after it is available in those releases. It is also possible to mark those functions as deprecated in the headers so that application developers learn this before the API is removed. It would also make sense to update the user manual once this API is available in the maintenance releases.
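
As a rough sketch of what "mark those functions as deprecated in the headers" and "update the old APIs to use this new code" could look like, the example below uses the GCC/Clang `deprecated` function attribute. Both function names are hypothetical, the `int`-based "layout" is a placeholder for the real opaque type, and the actual liblustreapi headers may handle this differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical new-style entry point: 0 on success, -1 with errno set. */
int demo_layout_stripe_count(const int *layout, int *count);

/* Hypothetical legacy entry point, kept for compatibility but flagged so
 * that callers get a compile-time warning before it is removed. */
int demo_file_get_stripe_count(const int *layout)
    __attribute__((deprecated("use demo_layout_stripe_count() instead")));

int demo_layout_stripe_count(const int *layout, int *count)
{
    assert(count != NULL);   /* caller bug, not a runtime error */

    if (layout == NULL) {
        errno = EINVAL;
        return -1;
    }
    *count = *layout;
    return 0;
}

/* The legacy function becomes a thin wrapper around the new code, so
 * there is only one implementation to maintain. */
int demo_file_get_stripe_count(const int *layout)
{
    int count;

    if (demo_layout_stripe_count(layout, &count) != 0)
        return -1;
    return count;
}
```

Code that still calls the legacy function should keep compiling, but with a deprecation warning that names the replacement.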

Andy Nelson added a comment - 29/Aug/14 3:14 PM

I am all for abandoning the old api as soon as possible. It is a maintenance nightmare for my application code to handle. There is ugly and otherwise unneeded ifdef goo for different machines all over the place. Please dump that api. What Ned says is exactly right: there is a chance to do it right/better this time around.

As far as "negative value returned" vs "specific negative value returned" (i.e. rc=-ETHE_ERROR_CODE), again, this leads to confusion and complexity that is just not needed. What happens when someone makes a commit sometime and gets them out of sync, such that the return code and errno are not the same any more, for example? Yes, that's a bug that someone introduced, but that is the whole point. You can remove that bug years ahead of time by making it impossible to do that by design.

Ned Bass (Inactive) added a comment - 28/Aug/14 11:36 PM

> so as it is clearly documented in the man pages that the functions return a negative value on error instead of "-1" there shouldn't be any problem.

If the man page merely specifies a negative value on error, then a properly written application shouldn't rely on the specific value returned. Therefore in order for this behavior to be useful, it must become a formal part of the API and documented as such. Simplicity is arguably the most important aspect of a well-designed API, and having two alternative means of returning error codes would add complexity for little gain. I'm not persuaded we should do it just because the legacy llapi functions do it. We should see this as an opportunity to get things right this time, not to carry over all the old baggage.

Andreas Dilger added a comment - 28/Aug/14 11:11 PM

Because it would keep the same API that the rest of liblustreapi has today, and there is no drawback to doing so. Virtually every application I've seen checks if (rc < 0) instead of if (rc == -1), so as it is clearly documented in the man pages that the functions return a negative value on error instead of "-1" there shouldn't be any problem. That allows applications to choose which behaviour they want to use for programming, and I don't see it introducing any significant complexity into the library - at worst it would mean return -errno instead of return -1 in some places.
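
For comparison, the "return -errno instead of return -1" variant described above might look like the following sketch. The function name and the 64 KiB validation rule are assumptions made for illustration only and are not taken from the patch under review.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical setter (not a real llapi call). On failure it sets errno
 * AND returns the negated errno value, so callers may consult either. */
int demo_stripe_size_set(unsigned long long *slot, unsigned long long size)
{
    assert(slot != NULL);   /* caller bug, not a runtime error */

    /* Illustrative rule: reject zero and non-multiples of 64 KiB. */
    if (size == 0 || size % 65536 != 0) {
        errno = EINVAL;
        return -errno;
    }

    *slot = size;
    return 0;
}
```

A caller checking `rc < 0` works unchanged under this variant. A caller checking `rc == -1` does not, because the negative value now depends on the error: here it is -EINVAL, and -1 would appear only for an error whose code is 1 (EPERM on Linux).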

Ned Bass (Inactive) added a comment - 28/Aug/14 7:13 PM

> If there is an insistence on setting errno to return errors, it would still be possible to also return the negative errno instead of "-1" all the time.

We could, but what would be the benefit? It would be awkward and potentially confusing for the API to specify more than one authoritative source of error codes. And as Andy points out, applications using both standard library and llapi_layout calls would prefer to use common error handling constructs.

Andy Nelson added a comment - 28/Aug/14 4:13 PM

FWIW, I have already implemented the version of the api that uses errno stuff, and put strerror calls in various error paths. For example:

    rc = llapi_layout_stripe_count_get(layout, &num_comps);
    if (rc != 0) {
        *ierr = -1;
        goto writeerrout;
    }

    ...

    writeerrout:
    sprintf(cerr, "Lustre layout definition error: %s\n", strerror(errno));

This api is consistent with how I do things in other parts of the code as well, e.g. with stat calls and such.

As an implementer of userland code that uses the llapi functionality, I strongly prefer the errno approach.

Andreas Dilger added a comment - 28/Aug/14 7:16 AM

If there is an insistence on setting errno to return errors, it would still be possible to also return the negative errno instead of "-1" all the time. I'm not suggesting to return PTR_ERR() instead of NULL in case of memory allocation failures, as I agree that this is not very common for userspace programs.

Christopher Morrone (Inactive) added a comment - 28/Aug/14 1:01 AM - edited

> IMHO, "errno" is the domain of the kernel and libc and should not be used by application libraries.

I think that the lustre library should be considered a system library, not an "application library". From a normal application's perspective, the lustre library is as system as they come: you are using a library that interacts on your behalf directly with the kernel to influence a service offered by the kernel. In that respect I would argue that errno is entirely reasonable to use.

I would argue that this kind of error handling is exactly what user-space C developers have come to expect from system level libraries. After all, if it isn't OK for us to use errno, is it really OK for us to reuse all of the standard error codes (EIO, EINVAL, etc.)? Shouldn't we have to invent our own error names and values if those things are only the purview of the kernel and libc?

Granted, using a temporary variable to implement the use of errno is mildly annoying. But is that really enough justification to violate the principle of least surprise for the user-space developers who will be consuming our library functions?

I think that the "Return negated errno values" approach is probably the least desirable of those proposed. This is a kernel-ism; the result of a clever hack that recognized that those memory values would never be valid so hey why not throw the error value in there. In user space, the programmers are going to think we have lost our minds if we force them to check for a negative version of an error code that is always positive everywhere else. At the very least, we would need to create macros or functions that the users would need to use to check the return code and another to translate the error code into the correct value. We would be shifting the annoying error code shuffling from the library writer to all of the library consumers.

errno is defined by the C language standard. Most, if not all, of the values of errno that we use (EACCES, EAGAIN, EIO, EISDIR, etc.) are defined by POSIX.1-2001 or C99, not inventions of the Linux kernel.

If we use the same values as errno, users are going to want to use standard functions like perror() that assume the use of errno. Granted, strerror() also exists, but it is more difficult to use than perror().

It is already difficult to get users to check error codes, so I think it is important to keep things simple when reasonable to do so.
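
The perror() versus strerror() point above can be made concrete with a small sketch. All function names here are hypothetical; the two `demo_fail_*` functions simply simulate a failing library call in each of the two styles under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Simulated failure in the "-1 plus errno" style (hypothetical). */
int demo_fail_errno(void)
{
    errno = EIO;
    return -1;
}

/* Simulated failure in the "negated errno" style (hypothetical). */
int demo_fail_negated(void)
{
    return -EIO;
}

/* errno style: perror() can be used directly on the error path. */
void report_errno_style(void)
{
    if (demo_fail_errno() < 0)
        perror("demo_fail_errno");
}

/* Negated style: the caller has to flip the sign and format the message
 * itself, since perror() only looks at errno. */
int report_negated_style(char *buf, size_t len)
{
    int rc = demo_fail_negated();

    assert(buf != NULL && len > 0);
    if (rc < 0)
        snprintf(buf, len, "demo_fail_negated: %s", strerror(-rc));
    return rc;
}
```

The second reporting function is not hard to write, but every consumer of the library would need some version of it, which is the "shifting the error code shuffling to the library consumers" cost described in this comment.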

People

Assignee: Ned Bass (Inactive)

Reporter: Ned Bass (Inactive)

Votes: 0

Watchers: 11

Dates

Created: 27/Aug/13 5:07 PM

Updated: 03/Feb/15 7:14 PM

Resolved: 03/Feb/15 7:14 PM