[LU-3840] llapi_layout API design discussion - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None

Severity:
3
Rank (Obsolete):
9942

Description

Extensions to liblustreapi have been posted for review here.

http://review.whamcloud.com/5302

The goal of the API extensions is to provided a user-friendly interface for interacting with the layout of files in Lustre filesystems, and to hide the wire-protocol details behind an opaque data type.

This issue will provide a forum for potentials users and developers to discuss and critique the design of the proposed API extensions.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

llapi_layout_alloc.txt
2 kB
04/Aug/14 6:22 PM
llapi_layout_file_create.txt
2 kB
04/Aug/14 6:22 PM
llapi_layout_get_by_fd.txt
4 kB
04/Aug/14 6:24 PM
llapi_layout_ost_index_get.txt
2 kB
04/Aug/14 6:22 PM
llapi_layout_pattern_get.txt
1 kB
04/Aug/14 6:22 PM
llapi_layout_pool_name_get.txt
2 kB
04/Aug/14 6:22 PM
llapi_layout_stripe_count_get.txt
1 kB
04/Aug/14 6:22 PM
llapi_layout_stripe_size_get.txt
1 kB
04/Aug/14 6:22 PM
llapi_layout.txt
4 kB
04/Aug/14 6:22 PM

Issue Links

is related to

LU-4665 utils: lfs setstripe to specify OSTs

Resolved

is related to

LU-2182 Add llapi_file_get_layout() function in liblustreapi

Closed

LU-3480 Layout Enhancement

Closed

Activity

[LU-3840] llapi_layout API design discussion

Ned Bass (Inactive) added a comment - 16/Jul/14 11:29 PM

That is one reason I want to start over from patch set 23. It was pretty thoroughly reviewed at that stage so I'd like to see it land with only minor changes and do follow-up work in separate patches. I don't see much benefit to breaking up what's already been reviewed though. The main obstacle to landing it is the expensive validity checks that Andreas objects to. But those checks can simply be removed for now, and replaced with improved interfaces as discussed above in subsequent patches. How would that sit with you, Andreas?

Ned Bass (Inactive) added a comment - 16/Jul/14 11:29 PM That is one reason I want to start over from patch set 23. It was pretty thoroughly reviewed at that stage so I'd like to see it land with only minor changes and do follow-up work in separate patches. I don't see much benefit to breaking up what's already been reviewed though. The main obstacle to landing it is the expensive validity checks that Andreas objects to. But those checks can simply be removed for now, and replaced with improved interfaces as discussed above in subsequent patches. How would that sit with you, Andreas?

James A Simmons added a comment - 16/Jul/14 10:42 PM - edited

Okay. I will take all the changes I did in 5302 and place it in the ~~LU-4665~~ patch. Andreas we will need to submit a different patch on top of Ned's new patch with your idea of a llapi_layout_verify. That is assuming people will accept your idea. Ned new base patch will be fine since it will not be the back end of anything. Further testing of the ~~LU-4665~~, which will use the layout api with lfs getstripe and setstripe, on my part will expose any problems with Ned's design. Plus the new base patch will not handle the case of DNE directories so that work will have to be developed as well. Much work to be done.

I promise to break the work I do in ~~LU-4665~~ into small incremental patch so people can properly review them.

Perhaps Ned you could consider breaking your patch up into smaller pieces. It is a lot of work.

James A Simmons added a comment - 16/Jul/14 10:42 PM - edited Okay. I will take all the changes I did in 5302 and place it in the LU-4665 patch. Andreas we will need to submit a different patch on top of Ned's new patch with your idea of a llapi_layout_verify. That is assuming people will accept your idea. Ned new base patch will be fine since it will not be the back end of anything. Further testing of the LU-4665 , which will use the layout api with lfs getstripe and setstripe, on my part will expose any problems with Ned's design. Plus the new base patch will not handle the case of DNE directories so that work will have to be developed as well. Much work to be done. I promise to break the work I do in LU-4665 into small incremental patch so people can properly review them. Perhaps Ned you could consider breaking your patch up into smaller pieces. It is a lot of work.

Ned Bass (Inactive) added a comment - 16/Jul/14 10:04 PM

Yes. John's comments cut to the heart of the matter are in line with what Andy has been asking for. The ultimate test of a layout's validity is whether Lustre accepts and creates it. But the API should provide interfaces that allow the user to determine valid layout values.

At this point I think http://review.whamcloud.com/#/c/5302/ should be abandoned and replaced with a new change ID based on patch set 23. That was the last stable point in the revision series (aside from errno stomping bugs caught by Frank), and there has been too much churn since then for me to review the changes with any confidence. Future revisions should be based strictly on design changes agreed to here, and avoid unnecessary refactoring of code that has already been reviewed, tested, and debugged.

For now, I respectfully request to be the only person to push changes to the review. Others are of course welcome to contribute dependent patches as separate reviews. I'm grateful that James took the initiative to move this forward, but I think it's important for consistency's sake to just have one chef in the kitchen. Please communicate any new requirements for ~~LU-4665~~ integration here so that work can move forward.

To summarize, I propose that we do the following, in order.

Abandon change 5302
Submit a new gerrit review based on 5302 patch set 23
Identify the questions we need the API to answer as suggested by John
Post draft man pages for new interfaces for review here, revise as needed
Refresh the patch with implementation of new interfaces, along with complete test cases and documentation

Are others on board with this plan?

Ned Bass (Inactive) added a comment - 16/Jul/14 10:04 PM Yes. John's comments cut to the heart of the matter are in line with what Andy has been asking for. The ultimate test of a layout's validity is whether Lustre accepts and creates it. But the API should provide interfaces that allow the user to determine valid layout values. At this point I think http://review.whamcloud.com/#/c/5302/ should be abandoned and replaced with a new change ID based on patch set 23. That was the last stable point in the revision series (aside from errno stomping bugs caught by Frank), and there has been too much churn since then for me to review the changes with any confidence. Future revisions should be based strictly on design changes agreed to here, and avoid unnecessary refactoring of code that has already been reviewed, tested, and debugged. For now, I respectfully request to be the only person to push changes to the review. Others are of course welcome to contribute dependent patches as separate reviews. I'm grateful that James took the initiative to move this forward, but I think it's important for consistency's sake to just have one chef in the kitchen. Please communicate any new requirements for LU-4665 integration here so that work can move forward. To summarize, I propose that we do the following, in order. Abandon change 5302 Submit a new gerrit review based on 5302 patch set 23 Identify the questions we need the API to answer as suggested by John Post draft man pages for new interfaces for review here, revise as needed Refresh the patch with implementation of new interfaces, along with complete test cases and documentation Are others on board with this plan?

John Hammond added a comment - 16/Jul/14 8:17 PM

Let me rephrase. What are the use cases of validity checking in user space? I'm in favor of good interfaces that answer specific questions like: "How many stripes can I have?" or "What are the minimum and maximum stripe sizes?" These are easy to answer.

On the other hand, offering an API to answer "Is this striping valid?" makes me a bit uncomfortable. It's a bit like being asked "Where babies come from?" by someone else's kid. There are too many details to ensure that striping that passes this function will be always be accepted and created by Lustre.

Setting implementation aside, how do I use such a function? Do I create various hypothetical striping and pass them to verify? This seems like a parameter search to answer the questions from the first paragraph.

John Hammond added a comment - 16/Jul/14 8:17 PM Let me rephrase. What are the use cases of validity checking in user space? I'm in favor of good interfaces that answer specific questions like: "How many stripes can I have?" or "What are the minimum and maximum stripe sizes?" These are easy to answer. On the other hand, offering an API to answer "Is this striping valid?" makes me a bit uncomfortable. It's a bit like being asked "Where babies come from?" by someone else's kid. There are too many details to ensure that striping that passes this function will be always be accepted and created by Lustre. Setting implementation aside, how do I use such a function? Do I create various hypothetical striping and pass them to verify? This seems like a parameter search to answer the questions from the first paragraph.

Ned Bass (Inactive) added a comment - 16/Jul/14 6:53 PM

I have to voice a concern that lack of a formal design process is threatening to derail this API. There is too much implementation going on with too little understanding of requirements, as John hinted at. There should be one architect/implementer to preserve unity of design and conceptual integrity, with others providing feedback, requirements, and code review. New public interfaces such as llapi_layout_verify() should be fully documented in man page format and reviewed here before an implementation is submitted for review in gerrit.

So, let's precisely nail down exactly what our validity checking requirements are before charging ahead with implementation. We need concrete answers to at least:

Is the return status of fsetxattr() a sufficient indicator of validity?
If supplemental validity checking is needed (beyond fsetxattr()), how much control over that checking needs to be exposed through the API?
What specific error conditions need to be communicated?

Ned Bass (Inactive) added a comment - 16/Jul/14 6:53 PM I have to voice a concern that lack of a formal design process is threatening to derail this API. There is too much implementation going on with too little understanding of requirements, as John hinted at. There should be one architect/implementer to preserve unity of design and conceptual integrity, with others providing feedback, requirements, and code review. New public interfaces such as llapi_layout_verify() should be fully documented in man page format and reviewed here before an implementation is submitted for review in gerrit. So, let's precisely nail down exactly what our validity checking requirements are before charging ahead with implementation. We need concrete answers to at least: Is the return status of fsetxattr() a sufficient indicator of validity? If supplemental validity checking is needed (beyond fsetxattr()), how much control over that checking needs to be exposed through the API? What specific error conditions need to be communicated?

James A Simmons added a comment - 16/Jul/14 6:10 PM

The llot_magic is still there. Users will always find a way to break things :-/

James A Simmons added a comment - 16/Jul/14 6:10 PM The llot_magic is still there. Users will always find a way to break things :-/

Ned Bass (Inactive) added a comment - 16/Jul/14 6:03 PM

At some point I added a llot_magic canary field to use as a minimal sanity check. But I somehow lost that change in all the refreshes. I think we should add a magic field, but I'm opposed to further measures such as checksums to detect or prevent corruption. The application is ultimately responsible for not trashing its memory.

Ned Bass (Inactive) added a comment - 16/Jul/14 6:03 PM At some point I added a llot_magic canary field to use as a minimal sanity check. But I somehow lost that change in all the refreshes. I think we should add a magic field, but I'm opposed to further measures such as checksums to detect or prevent corruption. The application is ultimately responsible for not trashing its memory.

John Hammond added a comment - 16/Jul/14 5:55 PM

What is the use case of llapi_layout_verfiy()?

John Hammond added a comment - 16/Jul/14 5:55 PM What is the use case of llapi_layout_verfiy()?

James A Simmons added a comment - 16/Jul/14 5:33 PM - edited

Okay my local patch uses a path argument :

int llapi_layout_verify(const struct llapi_layout *layout, const char *path)

For llapi_file_open what I'm doing currently is

{
       /* If the user verifed the layout we still need to ensure that the
         * requested file is located on the same lustre file system as the
         * layout. */
        if (strlen(layout->llot_fsname) != 0) {
                /* Verify that the file path belongs to a lustre filesystem. */
                rc = llapi_search_fsname(path, fsname);
                if (rc < 0 || (strlen(fsname) == 0)) {
                        errno = ENOTTY;
                        return -1;
                }

                /* Verify the file system of the user supplied path is the 
                 * same one the layout was created for. */
                if (strcmp(fsname, layout->llot_fsname)) {
                        errno = ENOTTY;
                        return -1;
                }
        } else if (llapi_layout_verify(layout, path) < 0)
                return -1;

So it forces a verify if it doesn't see llot_fsname. If verified we still need to make sure that the file is located on the same file system. A subdirectory could be another file system mount. So I handle my fsname paranoia.

As for corruption, well that is harder to deal with. For checksum we have to write a check sum algothrim since userland libcfs is going away. Also with that small amount of data their is greater chance of checksum value collisons. Maybe we could get away with a hashtable. Again with libcfs going away that might be more challanging. Have to see if we can use just the header. Currently struct llapi_layout is opaque but I guess nothing stops the user from casting and scribbling.

James A Simmons added a comment - 16/Jul/14 5:33 PM - edited Okay my local patch uses a path argument : int llapi_layout_verify(const struct llapi_layout *layout, const char *path) For llapi_file_open what I'm doing currently is { /* If the user verifed the layout we still need to ensure that the * requested file is located on the same lustre file system as the * layout. */ if (strlen(layout->llot_fsname) != 0) { /* Verify that the file path belongs to a lustre filesystem. */ rc = llapi_search_fsname(path, fsname); if (rc < 0 || (strlen(fsname) == 0)) { errno = ENOTTY; return -1; } /* Verify the file system of the user supplied path is the * same one the layout was created for . */ if (strcmp(fsname, layout->llot_fsname)) { errno = ENOTTY; return -1; } } else if (llapi_layout_verify(layout, path) < 0) return -1; So it forces a verify if it doesn't see llot_fsname. If verified we still need to make sure that the file is located on the same file system. A subdirectory could be another file system mount. So I handle my fsname paranoia. As for corruption, well that is harder to deal with. For checksum we have to write a check sum algothrim since userland libcfs is going away. Also with that small amount of data their is greater chance of checksum value collisons. Maybe we could get away with a hashtable. Again with libcfs going away that might be more challanging. Have to see if we can use just the header. Currently struct llapi_layout is opaque but I guess nothing stops the user from casting and scribbling.

Andy Nelson added a comment - 16/Jul/14 4:28 PM

I agree with Ned. I seem to recall times in my past when the name of the filesystem was obscured in some odd way, so that I could
never figure out what the actual, definitive name for the filesystem was. Things like this:

/scratch8/afn/blah

being an ailas for something like this:

/lustre/scratch8/afn/blah

and so forth, done by the systems folk, always confuse me. They will confuse others too, I am sure.

As far as the create function taking an fsname argument, I don't like that for exactly the same reason. Add to that,
also the fact that the call loses its symmetry with the normal system file create call. Search above for my comment
of 6:34 Nov 5 2013, for what I mean about symmetry stuff.

All that said, putting a "verified" flag into the opaque layout structure seems a good idea with one big flaw. Namely,
that that structure, opaque or not, is still in the hands of a user, who may be malicious, or worse, inept. The latter
adjective applies to me, quite often. There are frequently times when I pass around some pointer and end up scribbling on
its data because of some call/callee error in dereferencing or some such.

In consequence, the layout structure can get corrupted in whole or in part, and its the 'in part' portion that is
the biggest problem. What happens when you have a partially corrupted layout, which still has the 'verified'
portion of it, uncorrupted? Then llapi goes ahead with the incorrect idea that the layout is valid, while it isn't.

Splat.

Or it has to go do all the verification over again, which is what the whole "verified" flag is supposed to short circuit.

Given that, is it a better option to have the layout thingy (whatever its form), be some sort of entity that is more
like a file descriptor, that indexes into a list of layouts that are kept somewhere in kernel/library space? I know I
commented about this in previous iterations of this discussion (search above somewhere), but things went a
different way at the time.

Perhaps a solution would be to make the "verified" flag some sort of checksum of the rest of the struct and if
the checksum is invalid when given to the open/create/whatever/other/call, then revalidate with all the costs
involved?

Another issue with using a path as the argument is probably the reason James likes the fsname alternative. Namely,
the question of what the "validity" applies to. Just that path alone? Everything under it in the directory tree? For example,
the path is /lustre/afn/blah, and it applies to /lustre/afn/blah/all/the/directories/and/files/under/it? Just to /lustre/afn/blah?
The option of "just that path" has simplicity, because of the complex possibility of one layout at one level of the tree, and another
layout specified for some subset of directories under it. But then what about "just that path" applying to a file? there is
no under, and when you have such a case, perhaps the thing you want to be doing to it is to take that layout and
replicate it on some other file you create somewhere, where it would not apply due to invalid verification status.

Andy Nelson added a comment - 16/Jul/14 4:28 PM I agree with Ned. I seem to recall times in my past when the name of the filesystem was obscured in some odd way, so that I could never figure out what the actual, definitive name for the filesystem was. Things like this: /scratch8/afn/blah being an ailas for something like this: /lustre/scratch8/afn/blah and so forth, done by the systems folk, always confuse me. They will confuse others too, I am sure. As far as the create function taking an fsname argument, I don't like that for exactly the same reason. Add to that, also the fact that the call loses its symmetry with the normal system file create call. Search above for my comment of 6:34 Nov 5 2013, for what I mean about symmetry stuff. All that said, putting a "verified" flag into the opaque layout structure seems a good idea with one big flaw. Namely, that that structure, opaque or not, is still in the hands of a user, who may be malicious, or worse, inept. The latter adjective applies to me, quite often. There are frequently times when I pass around some pointer and end up scribbling on its data because of some call/callee error in dereferencing or some such. In consequence, the layout structure can get corrupted in whole or in part, and its the 'in part' portion that is the biggest problem. What happens when you have a partially corrupted layout, which still has the 'verified' portion of it, uncorrupted? Then llapi goes ahead with the incorrect idea that the layout is valid, while it isn't. Splat. Or it has to go do all the verification over again, which is what the whole "verified" flag is supposed to short circuit. Given that, is it a better option to have the layout thingy (whatever its form), be some sort of entity that is more like a file descriptor, that indexes into a list of layouts that are kept somewhere in kernel/library space? I know I commented about this in previous iterations of this discussion (search above somewhere), but things went a different way at the time. Perhaps a solution would be to make the "verified" flag some sort of checksum of the rest of the struct and if the checksum is invalid when given to the open/create/whatever/other/call, then revalidate with all the costs involved? Another issue with using a path as the argument is probably the reason James likes the fsname alternative. Namely, the question of what the "validity" applies to. Just that path alone? Everything under it in the directory tree? For example, the path is /lustre/afn/blah, and it applies to /lustre/afn/blah/all/the/directories/and/files/under/it? Just to /lustre/afn/blah? The option of "just that path" has simplicity, because of the complex possibility of one layout at one level of the tree, and another layout specified for some subset of directories under it. But then what about "just that path" applying to a file? there is no under, and when you have such a case, perhaps the thing you want to be doing to it is to take that layout and replicate it on some other file you create somewhere, where it would not apply due to invalid verification status.

Ned Bass (Inactive) added a comment - 16/Jul/14 2:50 PM

It should take a path argument. Users should not have to bother figuring out the fsname from a path. They will just want to know if the layout is valid for the directory they're writing to.

Ned Bass (Inactive) added a comment - 16/Jul/14 2:50 PM It should take a path argument. Users should not have to bother figuring out the fsname from a path. They will just want to know if the layout is valid for the directory they're writing to.

People

Assignee:: Ned Bass (Inactive)

Reporter:: Ned Bass (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 11 Start watching this issue

Dates

Created:: 27/Aug/13 5:07 PM

Updated:: 03/Feb/15 7:14 PM

Resolved:: 03/Feb/15 7:14 PM