[LU-1606] lustre_idl.h does not compile in user-space Created: 05/Jul/12  Updated: 10/Dec/15  Resolved: 06/May/14

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.3.0, Lustre 2.4.0, Lustre 2.1.1, Lustre 2.1.3, Lustre 2.4.1, Lustre 2.5.0, Lustre 2.5.1
Fix Version/s: Lustre 2.4.0, Lustre 2.5.0

Type: Bug Priority: Critical
Reporter: Christopher Morrone Assignee: John Hammond
Resolution: Fixed Votes: 0
Labels: mq313

Issue Links:
Related
is related to LU-3597 packet-lustre.c opcodes out of sync w... Resolved
is related to LU-4999 lustre_idl.h compilation regression f... Closed
is related to LU-4961 utils and test should not depend on o... Closed
is related to LUDOC-28 improve documentation examples for li... Closed
is related to LU-5001 Backport needed to b2_4 of "LU-1606 i... Resolved
is related to LU-2259 e2fsprogs: rename liblustreapi.h with... Resolved
is related to LU-5011 lustre_idl.h again does not compile i... Closed
is related to LUDOC-63 liblustreapi examples have bad includ... Closed
is related to LU-2196 Build shared version of liblustreapi Resolved
Severity: 3
Rank (Obsolete): 4531

 Description   

lustre_idl.h is mentioned (most likely incorrectly) in several examples in the Lustre manual. I've opened a lustre doc bug on that.

But is clear now that lustre_idl.h will not compile at all in user-space on Linux due to the lack of LPU64 and LPX64 definitions:

$ gcc -o example example.c
In file included from example.c:7:
/usr/include/lustre/lustre_idl.h:98:58: error: libcfs/libcfs.h: No such file or directory
In file included from example.c:7:
/usr/include/lustre/lustre_idl.h: In function 'fid_ostid_unpack':
/usr/include/lustre/lustre_idl.h:548: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:560: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:573: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:582: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h: In function 'fid_ostid_pack':
/usr/include/lustre/lustre_idl.h:584: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'ostid_seq':
/usr/include/lustre/lustre_idl.h:631: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h: In function 'fid_cpu_to_le':
/usr/include/lustre/lustre_idl.h:700: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_le_to_cpu':
/usr/include/lustre/lustre_idl.h:715: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_cpu_to_be':
/usr/include/lustre/lustre_idl.h:724: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_be_to_cpu':
/usr/include/lustre/lustre_idl.h:739: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'lu_fid_eq':
/usr/include/lustre/lustre_idl.h:765: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h:766: error: expected ')' before 'LPX64'

Should lustre_idl.h even be installed in /usr/include/lustre? If so we need to deal with the missing definitions.

I see that darwin's lustre_usr.h has a copy of the definitions, but for linux they are only in kp30.h, which is not part of user-space.



 Comments   
Comment by Christopher Morrone [ 05/Jul/12 ]

I see that LOV_MAX_STRIPE_COUNT is currently defined in lustre_idl.h, and users probably need to know that.

Comment by Peter Jones [ 08/Jul/12 ]

Minh

Could you please look into this one?

Thanks

Peter

Comment by Richard Henwood (Inactive) [ 09/Jul/12 ]

Hi Peter,

I've chatted to Minh and Andreas and I'm going to follow up on the work I did on this topic and recorded on the related Issues.

Comment by Richard Henwood (Inactive) [ 09/Jul/12 ]

The first example I'm working on is this:

#include <sys/vfs.h>
#include <config.h>
#include <liblustre.h>


static inline int maxint(int a, int b)
{
        return a > b ? a : b;
}

static void *alloc_lum()
{
        int lmmsize = sizeof(struct lov_user_md_v3) +
                LOV_MAX_STRIPE_COUNT * sizeof(struct lov_user_ost_data_v1);
        return malloc(lmmsize);
}

int main(int argc, char** argv)
{
        struct lov_user_md *lum_file = NULL;
        int rc;
        int lum_size;

        if (argc != 2) {
                fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
                return 1;
        }

        lum_file = alloc_lum();
        if (lum_file == NULL) {
                rc = ENOMEM;
                goto cleanup;
        }

        rc = llapi_file_get_stripe(argv[1], lum_file);
        if (rc) {
                rc = errno;
                goto cleanup;
        }

        /* stripe_size stripe_count */
        printf("stripe_size=%d, stripe_count=%d\n",
               lum_file->lmm_stripe_size,
               lum_file->lmm_stripe_count);

cleanup:
        if (lum_file != NULL)
                free(lum_file);

        return rc;
}

The preconditions to compiling this are:

1. RPMs need to be installed

lustre-client-2.1.2-2.6.32_220.17.1.el6.x86_64.x86_64
lustre-client-modules-2.1.2-2.6.32_220.17.1.el6.x86_64.x86_64
lustre-client-source-2.1.2-2.6.32_220.17.1.el6.x86_64.x86_64

This seems reasonable.

2. config.h

This is created by /usr/src/lustre-2.1.2/configure. This is not included in the client rpms, so I have to generate it. In order to generate it, I need kernel-devel installed.

Compiling command.

Once I've got config.h I can compile with:

gcc -o ex1 ex1.c -I/usr/src/lustre-2.1.2/{lustre,lnet,libcfs}/include -I/usr/src/lustre-2.1.2/ -llustreapi

Brian, I feel you can cast judgement. Can and should config.h be included in the built RPMs to avoid the additional steps to generate it on the client?

Comment by Christopher Morrone [ 09/Jul/12 ]

I'm sorry, this is not the way to go about making examples for user-space applications.

Here's what you have to work with in user-space:

$ rpm -ql lustre |grep -e "include.*h"
/usr/include/libcfs/posix/posix-types.h
/usr/include/linux/lustre_user.h
/usr/include/lustre/libiam.h
/usr/include/lustre/liblustreapi.h
/usr/include/lustre/ll_fiemap.h
/usr/include/lustre/lustre_idl.h
/usr/include/lustre/lustre_user.h

You should definitely not have either config.h or liblustre.h in your example.

Frankly, our headers are named quite badly. I would have preferred that we have a single "lustre/lustre.h" header for users, or something like that. But I we're stuck with it for now, I suppose.

Of the listed files, I would argue that only one or both of the following should be used for most examples:

/usr/include/lustre/lustre_user.h
/usr/include/lustre/liblustreapi.h

Some examples will require additional headers. For instance, for your example we would need

/usr/include/lustre/lustre_idl.h

in order to get the definition of LOV_MAX_STRIPE_COUNT.

Further, the "alloc_lum" is not a great example because it it is version-dependent. I just went through this with one of our users, and created the following function that might work better:

struct lov_user_md *alloc_lov_user_md(int stripe_count)
{
    size_t size;
    struct lov_user_md *ptr;

    size = sizeof(struct lov_user_md)
           + (sizeof(*(ptr->lmm_objects)) * stripe_count);
    ptr = (struct lov_user_md *)malloc(size);

    return ptr;
}

I'm not a fan of the variable name here:

struct lov_user_md *lum_file = NULL;

If you made it just "lum", instead of "lum_file", I would be fine with it.

Your "maxint" function isn't used, so should be removed.

Also, the compilation command is too long. The header files that you actually need are already in the standard location. But fixing that will naturally follow from using the correct headers, instead of trying to pull files out of the lustre source tree.

This would be easier to review if it was in gerrit.

Finally, this discussion is probably in the wrong jira ticket, since this ticket is about lustre_idl.h failing to compile in user-space.

Comment by Richard Henwood (Inactive) [ 09/Jul/12 ]

Thanks for your rapid and detailed response Chris. I'll work through your suggestions.

Comment by Richard Henwood (Inactive) [ 13/Jul/12 ]

Hi Chris;

In the absence of a lustre-client-devel.rpm, I have found it necessary to install lustre-client-source.rpm to build on the client. lustre-client-source.rpm provides a load of useful headers into /usr/src/lustre-2.1.2/.

I'm performing the following tests:

$ gcc -c ./lustre_idl.h -I/usr/src/lustre-2.1.2/{libcfs,lnet}/include/
In file included from /usr/src/lustre-2.1.2/libcfs/include/libcfs/libcfs_private.h:46,
                 from /usr/src/lustre-2.1.2/libcfs/include/libcfs/libcfs.h:308,
                 from ./lustre_idl.h:98:
/usr/src/lustre-2.1.2/lnet/include/lnet/types.h:282:3: error: #error "LNET_MAX_PAYLOAD must be defined in config.h"
$

With the headers provided by lustre-client-source.rpm, this compiles through to needing LNET_MAX_PAYLOAD - which is defined in /usr/src/lustre-2.1.2/config.h.

$ gcc -c ./lustre_user.h -I/usr/src/lustre-2.1.2/{libcfs,lnet}/include/
In file included from ./lustre_user.h:52:
/usr/include/lustre/ll_fiemap.h:111: error: expected ‘=’, ‘,’, ‘;’, ‘asm’ or ‘__attribute__’ before ‘fiemap_count_to_size’
/usr/include/lustre/ll_fiemap.h:117: error: expected ‘)’ before ‘array_size’
./lustre_user.h: In function ‘hai_dump_data_field’:
./lustre_user.h:820: warning: incompatible implicit declaration of built-in function ‘snprintf’
$

I believe the problem here is that the include of stddef.h to define size_t does not happen. Including <libcfs/libcfs.h> before the include of lustre/ll_fiemap.h resolves the error - and brings us to the LNET_MAX_PAYLOAD definition error.

Comment by Christopher Morrone [ 13/Jul/12 ]

If you are trying to emulate what a normal user would do, you should not be poking into lustre's internals. You don't need a "lustre-client-devel" package, because the headers and libraries that would appear in something like that are already in "lustre-client", as I mentioned in this comment.

Here's how I would test this:

hype356:~$ cat idl_example.c
#include <lustre/liblustreapi.h>
#include <lustre/lustre_idl.h>

main() {}
hype356:~$ gcc idl_example.c -llustreapi
In file included from /usr/include/lustre/lustre_user.h:52,
                 from /usr/include/lustre/liblustreapi.h:48,
                 from idl_example.c:1:
/usr/include/lustre/ll_fiemap.h:111: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'fiemap_count_to_size'
/usr/include/lustre/ll_fiemap.h:117: error: expected ')' before 'array_size'
In file included from idl_example.c:2:
/usr/include/lustre/lustre_idl.h:98:58: error: libcfs/libcfs.h: No such file or directory
In file included from idl_example.c:2:
/usr/include/lustre/lustre_idl.h: In function 'fid_ostid_unpack':
/usr/include/lustre/lustre_idl.h:548: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:550: error: 'EBADF' undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h:550: error: (Each undeclared identifier is reported only once
/usr/include/lustre/lustre_idl.h:550: error: for each function it appears in.)
/usr/include/lustre/lustre_idl.h:560: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:573: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h:582: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h: In function 'fid_ostid_pack':
/usr/include/lustre/lustre_idl.h:559: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h:616: error: 'EBADF' undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h: In function 'ostid_seq':
/usr/include/lustre/lustre_idl.h:631: error: expected ')' before 'LPU64'
/usr/include/lustre/lustre_idl.h: In function 'fid_cpu_to_le':
/usr/include/lustre/lustre_idl.h:700: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_le_to_cpu':
/usr/include/lustre/lustre_idl.h:715: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_cpu_to_be':
/usr/include/lustre/lustre_idl.h:724: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'fid_be_to_cpu':
/usr/include/lustre/lustre_idl.h:739: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h: In function 'lu_fid_eq':
/usr/include/lustre/lustre_idl.h:765: error: expected ')' before 'LPX64'
/usr/include/lustre/lustre_idl.h:766: error: expected ')' before 'LPX64'

Now the solution here is not to start pulling more and more header files out of lustre. We need to selectively move definitions and such around until the existing set of user-space headers compile.

So yes, I think that ll_fiemap.h is missing an include of stddef.h when in user space. But we should add THAT include, not start including libcfs headers.

And now back to lustre_idl.h. This header is really ill suited to being used by users that just want to use use the lustre standard API. Perhaps we really just need to move the few bits that users will really need (like LOV_MAX_STRIPE_COUNT) into lustre_user.h, and then drop lustre_idl.h from user space.

Comment by Christopher Morrone [ 17/Jul/12 ]

It seems clear from some comments in lustre_idl.h itself

* ALL structs passing over the wire should be declared here.  Structs
* that are used in interfaces with userspace should go in lustre_user.h.
/* Defn's shared with user-space. */
#include <lustre/lustre_user.h>

and just by reading the file that folks have not intended for lustre_idl.h to be used in user space. And yet using certain APIs in user space currently require the inclusion of lustre_idl.h, so it is installed in the standard header location by default.

Are there a relatively small set of things that we can move from lustre_idl.h into lustre_user.h and cease installing lustre_idl.h?

We obviously need LOV_MAX_STRIPE_COUNT, because llapi_file_get_stripe() just assumes that the array length will be any size that it needs. That in turn requires the user to always make the length of the array LOV_MAX_STRIPE_COUNT.

What else would users need?

And while we are fixing things, frankly I think we should change the name of the "liblustreapi.h" to "lustreapi.h". "liblustre" is name for the lustre client-in-user-space code. So "liblustreapi" has the unfortunate implication of somehow being an API for that client-in-user-space code. But its just a general API for getting/setting lustre information.

We can keep "liblustreapi.h" around for a time for backwards compatibility, but make it nearly empty. Its only contents would be a comment explaining that it is deprecated, and "#include <lustreapi.h>".

Comment by Christopher Morrone [ 17/Jul/12 ]

I believe that by including libcfs/libcfs.h we are now violating this rule at the top of lustre_idl.h:

 * The only other acceptable items in this file are VERY SIMPLE accessor
 * functions to avoid callers grubbing inside the structures, and the
 * prototypes of the swabber functions for each struct.  Nothing that
 * depends on external functions or definitions should be in here.

Note especially that last sentence.

But Ricardo, understandably, added the include of libcfs/libcfs.h in f1eaa63c32d7af5a09b1d9874b0310bad02b1a1f because "lustre_idl.h uses symbols such as LASSERT, CLASSERT, LPUX64, etc". Which is exactly why it doesn't compile in userspace either.

So it seems like we went the wrong way by adding that include. We should have backed out the user of external definitions that are violating the rules of lustre_idl.h. I see that Andreas made a couple of tweaks here in 5351a1ba03e3464f26271216cd266d2cb77b5625, but its not clear to me how he expected LASSERT and LPU64 to be defined in user space normally.

We can move LOV_MAX_STRIPE_COUNT and a few nearby defines into lustre_user.h. Then we can drop lustre_idl.h from all user space examples in the manual, and see if there is anything else we need to move to lustre_user.h. And finally, we might just not install lustre_idl.h (or install an empty stub for backwards compatibility).

Alternatively, if we really want lustre_idl.h in user space we should back out all of the external definitions that have crept into lustre_idl.h. Then we should just include lustre_idl.h from lustreapi.h so that the users only need a single include to use the API.

I'm not wed to any approach yet. But we definitely need to untangle the user space headers.

Comment by Christopher Morrone [ 17/Jul/12 ]

Here are some patches to get us started. These are completely untested as of yet.

http://review.whamcloud.com/3425
http://review.whamcloud.com/3426
http://review.whamcloud.com/3427

Comment by Richard Henwood (Inactive) [ 01/Aug/12 ]

Two of these patches are landed (3425 and 3426.)

Chris has pointed out that landing of the remaining patch 3427 will invalidate some of the code examples contained in the man pages and manual - so these need to be updated too.

Some of the man changes and examples are updated in the patch but these haven't been extensively tested according to Chris. I'm going to review 3427 this patch and identify the impact of the header changes on code examples, man pages, the Manual and tests required to get this landed.

Comment by Richard Henwood (Inactive) [ 03/Aug/12 ]

I propose to re-factor the change 3427 and provide a basic api fix change with two associated changes for the c test code and man pages changes.

The manual update is being tracked in LUDOC-28 this will be completed after the code changes are landed.

thoughts?

Comment by Christopher Morrone [ 03/Aug/12 ]

What api fix are you proposing? Why do it in 3427 instead of just adding another commit that builds on 3427? I like to keep commits simple and clear when possible. How about we just let 3427 be the header rename, and do further changes in the next commit?

Comment by Richard Henwood (Inactive) [ 06/Aug/12 ]

I think you are suggesting moving the test cases and man updates into a dependent commit. This would leave 3427 as basic rename/refactor.

I think this is a fine idea and I'll begin work on it.

Comment by Richard Henwood (Inactive) [ 14/Aug/12 ]

I am making progress: I have completed a minor refactoring of 3427 to make 3427 just the implementation of lusterapi.h. I am preparing two dependent change-sets: one with the man page updates, another with the updates to the code in ./tests/ that needs updating. There is no new work yet - just Chris's patches and some testing.

I am currently looking at ./lustre/tests/ll_dirstripe_verify.c which calls lov_mds_md_size. lov_mds_md_size is not exposed by liblusterapi.c.

I am currently looking into the best way to satisfy the general rule: you only need lustreapi.h and get the lov_mds_md_size functionality.

Comment by Christopher Morrone [ 15/Aug/12 ]

There is no new work yet - just Chris's patches and some testing.

That isn't quite accurate. An unrelated change to lustre_user.h was added.

I don't think the goal here is to make everything in lustre/tests use only the api available to end users (although that might be a reasonable thing to pursue next. The goal is just to make the examples that we provide users use only the api, and make the api's headers actually compilable in user space.

Comment by Richard Henwood (Inactive) [ 09/Oct/12 ]

The API fixes have landed: http://review.whamcloud.com/#change,3427

To reduce the changes of this breaking, unnoticed, again it would be good to include an item in the test suite. As a start, I present the following proposals:

  1. Pull and build the code examples embedded in the man page.
  2. Pull and build the code examples embedded in the Manual.
  3. Build a separate, stand-alone example.

Any preference?

Comment by Richard Henwood (Inactive) [ 12/Oct/12 ]

lustre-client-source.rpm doesn't provide /usr/src/lustre/config.h. config.h provides LNET_MAX_PAYLOAD that is necessary for building.

I will pursue getting /usr/src/lustre/config.h included the lustre-client-source.rpm.

Comment by Christopher Morrone [ 12/Oct/12 ]

Hold on for a moment; that needs more explanation. I am concerned that you are trying to use lustre-client-source.rpm for anything in userspace again. I tried to explain already that nothing in userspace should be using lustre's internal source or headers.

So lets start with this: what is it that you are building that needs LNET_MAX_PAYLOAD?

Comment by Christopher Morrone [ 12/Oct/12 ]

Also, are you aware that lustre-source*.rpm and lustre-client-source*.rpm are essentially the same thing? the "lustre-client" version is just a result of building with servers support disabled.

The more I think about it, the more and I am unclear on why we have "binary" -source rpms. Perhaps it is time to eliminate those. It seems like people should just use the real source rpm if they want the source code.

In proper Lustre tradition, there are no comments that explain the existence of the "source" binary package...

So Richard, yeah, really don't do that. Those packages may not even exist when Lustre 2.4 comes out.

Comment by Richard Henwood (Inactive) [ 15/Oct/12 ]

Chris: I agree, the lustre-source* rpm does sound redundant to me.

Also: I gave my initial suggestion to include /usr/src/lustre/config.h thought over the weekend. As you say, this is the wrong way to go because it adds dependency between the client and the build configuration and internals. I agree this is an undesirable approach.

LNET_MAX_PAYLOAD is required by /usr/include/lnet/types.h:

$ gcc basic.c -llustreapi
In file included from /usr/include/libcfs/libcfs_private.h:46,
                 from /usr/include/libcfs/libcfs.h:313,
                 from /usr/include/lustre/lustre_idl.h:95,
                 from /usr/include/lustre/lustreapi.h:46,
                 from basic.c:1:
/usr/include/lnet/types.h:279:3: error: #error "LNET_MAX_PAYLOAD must be defined in config.h"

basic.c looks like

#include <lustre/lustreapi.h>

main() {}

I've been searching to see how the problem with LNET_MAX_PAYLOAD has been addressed in the past and found a thread and patch from last year on the topic of a lustre-devel pachage

I will review this patch, and identify useful components that can be applied to this issue.

Comment by Christopher Morrone [ 15/Oct/12 ]

Ok, the first problem that needs to be addressed is lustre_idl.h including libcfs.h. That should not be happening. libcfs/libcfs.h is not packaged for user-space. The only header that from libcfs that we have allowed to appear in the package for user-space is libcfs/posix/posix-types.h, and we don't want to allow any more without very good reason.

I'm getting deja vu...right, see my July 17 comment where I pointed out this problem. In particular this sentence:

I see that Andreas made a couple of tweaks here in 5351a1ba03e3464f26271216cd266d2cb77b5625, but its not clear to me how he expected LASSERT and LPU64 to be defined in user space normally.

In summary, in user-space lustre_idl.h should not be including libcfs.h.

In my earlier July 13 comment, I also suggested that it might be best to remove lustre_idl.h from user-space altogether. Things that need to be shared with user-space should be moved from lustre_idl.h to lustre_user.h, or something along those lines.

Comment by Richard Henwood (Inactive) [ 18/Oct/12 ]

I like the idea of removing lustre_idl.h from user-space.

In pursuing this goal, I've been removing various tentacles that extend from a bunch of the ./lustre/tests/*.c files into the source. I've been chipping away at these.

Can you advise on: LPU64?

Should these definitions be included in lustreapi.h or just pop-in a llu definition?

An example LPU64 usage can be found in: ./lustre/tests/copytool.c

Comment by Christopher Morrone [ 18/Oct/12 ]

Ah, good! Thats the question I posed in the "Description" section of this jira ticket. The lack of LPU64 and LPX64 definitions is a problem. What I'd love to do is rewrite the API functions to abstract away our wire-protocol fixed-size types, and use more standard posix types wherever possible. Using a data type that is larger than necessary isn't a problem when we aren't putting the value in a packet on the wire.

But, of course, that is way too much work for the scope of this ticket.

So first, lets note that there are three "lustre_user.h" files in the source tree:

./lustre/include/lustre/lustre_user.h
./lustre/include/linux/lustre_user.h
./lustre/include/darwin/lustre_user.h

The "lustre/lustre_user.h" file is the main one that users will #include. It in turn #includes either linux/lustre_user.h or darwin/lustre_user.h based on which OS you are using.

Note that the darwin version of lustre_user.h defines LPU64 & friends, but the linux version does not.

Further note that "LPU64" is defined in no less than six places at the moment:

libcfs/include/libcfs/darwin/kp30.h
libcfs/include/libcfs/linux/kp30.h
libcfs/include/libcfs/posix/posix-wordsize.h
libcfs/include/libcfs/winnt/kp30.h
lustre/include/darwin/lustre_lib.h
lustre/include/darwin/lustre_user.h

About the only place they aren't currently defined is Linux user-space.

I should also note that we can't assume that both user-space and kernel-space have the same data types on the SAME system. I believe that Linux on ppc is a place were user space and kernel space can differ.

It looks like the libcfs/posix/posix-wordsize.h has the most exhaustive set of definitions of LPU64 & friends, attempting to cover kernel- and user-space, and 32-bit and 64-bit systems.

At first glance, I would be tempted to add posix-wordsize.h to the list of normally packaged user-space headers. It seems to have been originally intended for that purpose, based on the comment at the top "Wordsize related defines for posix userspace."

But then posix-wordsize.h has some odd things in it besides just dealing with definitions to deal with wordsize. It also has things like this that seem totally out of place:

# define CFS_MODULE_PARM(name, t, type, perm, desc)
#define PORTAL_SYMBOL_GET(x) inter_module_get(#x)
#define PORTAL_SYMBOL_PUT(x) inter_module_put(#x)

But I don't suppose that would actually hurt anything if we put this file in user-space. I think maybe that came over as a side effect of posix-wordsize.h starting out as a cut-and-paste of the linux/kp30.h file (as seen by the "#ifndef _LIBCFS_LINUX_KP30_H" at the top of posix-wordsize.h). I don't know if there are actually any consumers of CFS_MODULE_PARM, or PORTAL_SYMBOL* from posix-wordsize.h.

The libcfs directory has become pretty messy about keeping kernel-space and user-space separated.

The more that I look at it, the more that I think that exposing posix-wordsize.h to user-space is the way to go for now. Anything more could kick of rather larger of a cleanup effort than we're ready for at the moment.

And while I propose to include posix-wordsize.h in user-space, I don't intend that any users should ever use it directly. It will just be an implementation detail of lustreapi.h as far as real users are concerned. So we should not be hamstrung by its inclusion should someone have time for a larger cleanup effort in the future.

Then that might just leave the issue of LASSERT & variants not being defined in user space.

Comment by Christopher Morrone [ 18/Oct/12 ]

What do folks think of the following two patches?

http://review.whamcloud.com/4301
http://review.whamcloud.com/4302

Those are completely untested.

Comment by Jian Yu [ 09/Nov/12 ]

This is blocking building e2fsprogs package (LU-2259).

Comment by Christopher Morrone [ 09/Nov/12 ]

I abandonded the 4301 and 4302 changes.

Instead, I think that we should remove the #include of lustre_idl.h from lustreapi.h. Please review change 4505.

Comment by Richard Henwood (Inactive) [ 12/Nov/12 ]

From your commit log:

lustre_idl.h has become increasingly difficult to use from
user-space. Normal users of the lustre api should not
be looking into lustre wire protocol anyway, so this change
eliminates the include of lustre_idl.h.

After removing lustre_idl.h, it became obvious that a number
of programs have been picking up normal user-space headers
through a very windy path of includes. With the include
of lustre_idl.h gone, they no longer compiled, so we also
add the missing explicit includes.

It became clear that copytool.c explicity requires
libcfs/libcfs.h, and lustre_rsync.c require lustre_idl.h.
But I believe that it is far better to have those includes
explicitly stated, so it is obvious that those programs
are peeking into lustre's internals. In the future we
can work on creating new lustre API calls that provide the
information that they need without side-stepping abstraction
layers.

I was playing with the now abandoned changes 4301, 4302 and was getting bogged down resolving dependencies as you say in your commit (above.)

I like the approach of the new patch. With your new approach, the task now is to enhance lustreapi.h so no userspace code needs to fish around in the internals. There is a open ticket for lfs.c to only use lustreapi.h: http://jira.whamcloud.com/browse/LU-1758

if this current change lands, I will open tickets for lustre/tests/copytool.c, lustre/tests/multiop.c, lustre/tests/sendfile.c, and lustre/utils/lustre_rsync.c.

Comment by Christopher Morrone [ 19/Nov/12 ]

Oh, I see, change 3427 really did introduce a new bug. Change number 4505 corrects that bug.

What we missed in the reviews is that someone (who shall remain nameless ) snuck an extra "#include <lustre/lustre_idl.h>" into the new lustreapi.h file.

This is a great learning case though. I think what happened is that the rename of liblustreapi.h to lustreapi.h made the entire contents of the lustreapi.h appear as new lines in a diff. That means that any new lines added at the same time as the file rename are really easy for the reviewers to miss, which I think is what happened here.

So maybe I wasn't clear on why I wasn't in favor of expanding my patch with additional changes, but this is generally the type of thing that I was worried about. And clearly even though I was worried about it, I too failed to notice the new lustre_idl.h include. It just got lost in the sea of added lines, and I assumed that it was still just the file rename.

Comment by Prakash Surya (Inactive) [ 20/Nov/12 ]

This is a great learning case though. I think what happened is that the rename of liblustreapi.h to lustreapi.h made the entire contents of the lustreapi.h appear as new lines in a diff. That means that any new lines added at the same time as the file rename are really easy for the reviewers to miss, which I think is what happened here.
So maybe I wasn't clear on why I wasn't in favor of expanding my patch with additional changes, but this is generally the type of thing that I was worried about. And clearly even though I was worried about it, I too failed to notice the new lustre_idl.h include. It just got lost in the sea of added lines, and I assumed that it was still just the file rename.

Yes, in general this is how it works. The new file and all its lines just show as "added" lines, and the removed file as "removed" lines. Although, if the committer uses the git mv command, Gerritt (and git status, but not git log) will show an annotation on the renamed file to reflect that it hasn't changed ans was only renamed. For example, see http://review.whamcloud.com/4633.

Comment by Prakash Surya (Inactive) [ 20/Nov/12 ]

And then if the git mv'd file is modified, it correctly shows the diff for the modified lines. See: http://review.whamcloud.com/4634. In the future, I suggest negative reviews for patches which rename a file and do not use this method to try and prevent these mistakes.

Comment by Christopher Morrone [ 20/Nov/12 ]

Well, actually, if I understand correctly there is nothing particularly special about "git mv"; it is just a convenience function. Git is a lot like a file system internally. At each commit, it represents the full contents of the tree. Then each commit has one or more "parents". So there is nothing internally that records "this file moved to from here in this commit to there in that commit". It is up to commands like "git status" and other things to figure that out after the fact.

So you can move the file without "git mv", then individually do the "git rm" and "git add" commands, and "git status" will show "renamed" just as reliably as if "git mv" had been used. Except when sometimes it doesn't...and I'm not entire sure what situations fool it and other tools.

Comment by Peter Jones [ 07/Dec/12 ]

With the latest landing is this issue now fixed for master?

Comment by Christopher Morrone [ 07/Dec/12 ]

We worked around one problem and are on the road to having better examples, but the titular problem still remains: lustre_idl.h is packaged and installed in a user visible location, and yet does not compile in user space.

More and more, I am leaning towards the solution of just not packaging lustre_idl.h.

Comment by Richard Henwood (Inactive) [ 13/Dec/12 ]

Chris: given that http://review.whamcloud.com/4505 has landed, I believe users can compile code on 2.4.

Is the remaining task to remove lustre_idl.h and see what breaks, and then fix that?

Comment by Christopher Morrone [ 14/Dec/12 ]

Sure, if folks are ok with us removing lustre_idl.h. We might, though, want to consider replacing it with an empty header with a warning (like liblustreapi.h) so that we don't flat out break apps that included it for no other reason than the examples said to include it.

Comment by Ned Bass [ 01/Feb/13 ]

I believe we also need to include stdio.h from lustre_user.h since it uses snprintf.

Comment by Andreas Dilger [ 12/Mar/13 ]

http://review.whamcloud.com/5682 removes LASSERT() and CLASSERT() from lustre_idl.h.

Comment by Richard Henwood (Inactive) [ 01/May/13 ]

A regression test has now landed:

http://review.whamcloud.com/3440

Comment by Richard Henwood (Inactive) [ 29/Aug/13 ]

I would like to close this ticket: any objections?

Comment by Christopher Morrone [ 29/Aug/13 ]

None here. Thanks!

Comment by Jodi Levi (Inactive) [ 15/Oct/13 ]

Added 2.5.0 FixVersion

Comment by Robert Read (Inactive) [ 29/Apr/14 ]

I'm seeing similar errors again in master and 2.5.1. Also, some additional include files needed for lustre_idl.h are not being installed.

This is on master, and 2.5.1 is similar:

$ gcc example.c
In file included from example.c:1:
/usr/include/lustre/lustre_idl.h:95:49: error: libcfs/libcfs.h: No such file or directory
/usr/include/lustre/lustre_idl.h:101:33: error: lustre/lustre_errno.h: No such file or directory
In file included from example.c:1:
/usr/include/lustre/lustre_idl.h: In function ‘ostid_set_id’:
/usr/include/lustre/lustre_idl.h:705: error: expected ‘)’ before ‘LPU64’
/usr/include/lustre/lustre_idl.h:712: error: expected ‘)’ before ‘LPU64’
/usr/include/lustre/lustre_idl.h:722: error: expected ‘)’ before ‘LPU64’
/usr/include/lustre/lustre_idl.h: In function ‘fid_set_id’:
/usr/include/lustre/lustre_idl.h:723: error: expected ‘)’ before ‘LPX64’
/usr/include/lustre/lustre_idl.h:734: error: ‘EBADF’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h:734: error: (Each undeclared identifier is reported only once
/usr/include/lustre/lustre_idl.h:734: error: for each function it appears in.)
/usr/include/lustre/lustre_idl.h:739: error: expected ‘)’ before ‘LPU64’
/usr/include/lustre/lustre_idl.h:748: error: expected ‘)’ before ‘LPU64’
/usr/include/lustre/lustre_idl.h: In function ‘ostid_to_fid’:
/usr/include/lustre/lustre_idl.h:749: error: expected ‘)’ before ‘LPX64’
/usr/include/lustre/lustre_idl.h:774: error: ‘EBADF’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h:785: error: expected ‘)’ before ‘LPX64’
/usr/include/lustre/lustre_idl.h:787: error: expected ‘)’ before ‘LPX64’
/usr/include/lustre/lustre_idl.h: In function ‘fid_to_ostid’:
/usr/include/lustre/lustre_idl.h:803: error: expected ‘)’ before ‘LPX64’
/usr/include/lustre/lustre_idl.h:817: error: ‘EBADF’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h: In function ‘lmv_mds_md_size’:
/usr/include/lustre/lustre_idl.h:2808: error: ‘EINVAL’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h: In function ‘lmv_mds_md_stripe_count_get’:
/usr/include/lustre/lustre_idl.h:2821: error: ‘EINVAL’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h: In function ‘lmv_mds_md_stripe_count_set’:
/usr/include/lustre/lustre_idl.h:2837: error: ‘EINVAL’ undeclared (first use in this function)
/usr/include/lustre/lustre_idl.h: At top level:
/usr/include/lustre/lustre_idl.h:3127: error: expected specifier-qualifier-list before ‘lnet_nid_t’
Comment by Christopher Morrone [ 30/Apr/14 ]

My earlier comment apparently still applies.

Robert, do you really need to compile against lustre_idl.h? I would probably still argue that lustre should not be installing lustre_idl.h in user space. The intent was clearly to use lustre/lustre_user.h as the place for definitions that needed to be shared with user space. lustre_idl.h really never should have be installed in the first place.

Comment by Robert Read (Inactive) [ 30/Apr/14 ]

Currently liblustreapi code is including and using defines in lustre_idl.h, so reproducing some that functionality is difficult without access to the same headers. Even using liblustreapi outside of the lustre tree can be challenging since there are a few definitions in lustre_user.h that depend on lustre_idl.h. Since it seems removing luste_idl.h from userspace is the correct thing to do, then we should fix all of our userspace tools to not rely on it. As it is today, though, we're shipping broken header files, so let's fix that here, either way.

Comment by Cory Spitz [ 30/Apr/14 ]

I agree that we shouldn't ship broken headers, but we should really agree what should be installed. There should be a hello world program(s) added to the test framework that include(s) stuff that should be installed.

Comment by Robert Read (Inactive) [ 30/Apr/14 ]

Such a test was added by Richard for this ticket (http://review.whamcloud.com/3440. It's using the includes in the tree, though, so it's not actually testing the exported API. And because it's using a for loop it silently does nothing when the test programs are not available, so it looks like it's not testing anything when run in a test environment.

This is a drawback of having the userspace code in the same tree as the kernel code. It's easy for these problems to creep in without the isolation and forced discipline of keeping the code separate trees.

Comment by Christopher Morrone [ 30/Apr/14 ]

I'm not seeing any explicit include of lustre_idl.h in lustreapi.h's tree of includes. Am I missing anything?
If that is correctly avoiding the use of lustre_idl.h then we are better off then I feared.

I completely agree that much of the things that the liblustreapi library can do internally (because it is built in-tree) cannot be done by external applications. And I know that the there is legacy crap in lustreapi.h that is very poorly implemented.

I would just argue that the path forward is to start developing good APIs to access the needed functionality rather than trying to make lustre_idl.h usable in user-space.

Comment by Robert Read (Inactive) [ 01/May/14 ]

We're in violent agreement on the goal here - a good API.

I didn't mean that lustreapi.h includes lustre_idl.h directly. I meant some things defined in lustreapi.h (or lustre_user.h) require definitions that typically done via lustre_idl.h. For example, DFID uses LPX64, and that is not defined in lustre_user.h or in any of the installed headers. Since the in-tree user space utilities are currently relying on lustre_idl.h, I, rather naively, thought the right way forward was to enable external utilities to use the same headers the internal tools used, because I would like the same API to be available to all user space utilities. Otherwise, if I want to use those headers I need a configured lustre tree available and have to add the following to my gcc command line:

      -include /home/rread/lustre-release/config.h \
      -I/home/rread/lustre-release/libcfs/include \
      -I/home/rread/lustre-release/lustre/include \
      -I/home/rread/lustre-release/lnet/include

I'd like to not have to do this - what is the best way forward?

Comment by John Hammond [ 01/May/14 ]

Here is my 42 point proposal:

  1. Add libcfs/include/libcfs/libcfs_types.h:
    #ifdef __KERNEL__
    # include <linux/types.h>
    #else /* __KERNEL__ */
    # include <stdint.h>
    typedef uint64_t __u64;
    /* ... */
    #endif
    
  2. Add libcfs/include/libcfs/libcfs_types.h to the set of shipped headers.
  3. Kill HAVE_USER__U64_LONG_LONG and similar autocrud.
  4. Rewrite DFID() to use '%#llx',... and PFID() to use explicit casts.
  5. Similarly for the other D*() and P*() macros.
  6. Move SFID and RFID to lustre_idl.h so no one uses them by mistake.
  7. Include libcfs_types.h in lustre_user.h.
  8. Don't ship lustre_idl.h.
  9. If you want find something in lustre_idl.h that you believe belongs in lustre_user.h then move it.
  10. Add a checkpatch rule that LPX64 and similar are not being added to our shipped headers.
  11. Add a real compile lustre_user.h in userspace test.

We must not ship config.h and we must not ship any headers that depend on config.h. Doing so is just braindead.

Comment by Robert Read (Inactive) [ 01/May/14 ]

I'm also using SFID/RFID, and fid_is_sane() which is in lustre_idl.h.

Comment by John Hammond [ 01/May/14 ]

On my 64-bit machine uint64_t is unsigned long, while the kernel's __u64 is unsigned long long. So my proposal will create typedef conflicts for files that see those definitions and include a linux/ header. Bah!

Comment by Christopher Morrone [ 01/May/14 ]

I, rather naively, thought the right way forward was to enable external utilities to use the same headers the internal tools used, because I would like the same API to be available to all user space utilities.

Yeah, I can understand that. The problem, of course, is that those interfaces are often hackerish and designed to meet only the immediate need of the in-tree built lustre tools. Far too often the "library" really contains code specific to a specific tool rather than being something that is in any way reasonably reusable. They were not at all designed with the needs of user space, or general concepts of good abstraction in mind.

Things like DFID and PFID do not, in my opinion, belong in the userspace API. We also do not want inline functions in headers (e.g. fid_is_sane) in the user space interfaces.

For DFID/PFID, we instead want fid-to-string and string-to-fid style of functions in the user space library.

The problem with macros and inline functions is that the application needs to be recompiled any time there is a change. A library call on the other hand, can often change internally without changing the external interface and therefore require no rebuild of the external application. If the interface needs to change externally, it is more obvious that we need to increment the shared library version. I am certain that folks will forget to do that with macros and inlines. (Of course, we don't version our library yet, which is also criminal...)

Comment by Robert Read (Inactive) [ 02/May/14 ]

Chris, yes, I'm buying in to separation lustre_user.h vs. lustre_idl.h.

I would like to take a step back, then, and consider the ioctl interface for Lustre to be the fundamental userspace API. The existing library is can be a reference implementation, but it should be optional. This interface should ideally consist of constants, types, and structure definitions, with perhaps some useful macros along the lines of S_ISDIR(), but no functions at all, inline or otherwise. We don't need things like DFID/SFID because representing data externally is the application's business. Likewise, the kernel should be responsible for ensuring the sanity of its inputs, so we don't necessarily need fid_is_sane() in userspace.

Perhaps a good model for us is fiemap ioctl, which has rather complex interface for an ioctl, but still doesn't require a library to use.

Comment by Christopher Morrone [ 03/May/14 ]

The lustre ioctls are an interface. What in particular are you trying to do?

Comment by Christopher Morrone [ 03/May/14 ]

I am marking this a blocker for 2.6. We can't ship with broken headers.

And this time around we need to really figure out what to do with lustre_idl.h. The developers and code reviewers keep allowing it to break over and over again. Status quo is clearly not working.

Comment by Robert Read (Inactive) [ 03/May/14 ]

Moving to triage for assignment.

Comment by Christopher Morrone [ 03/May/14 ]

I did some leg work to track down which commits were at fault and opened four separate bugs on them. They are marked as blocking this ticket. We clearly need to raise awareness about the rules about what is allowed in lustre_idl.h.

Or stop packaging lustre_idl.h, of course, in which case we don't need much in the way of rule. But not packaging it requires additional work to add new APIs to export required information. For instance, there is LOV_MAX_STRIPE_COUNT and likely other things that users need to be able to allocate buffers correctly and work with striping. The striping issues can be simplified if we get the work from LU-2182 landed.

We probably don't have a very exact list of things that external applications have been using from lustre_idl.h.

Comment by Robert Read (Inactive) [ 03/May/14 ]

LOV_PATTERN_F_RELEASED is needed to import stub files for HSM.

Comment by Prakash Surya (Inactive) [ 03/May/14 ]

I would like to take a step back, then, and consider the ioctl interface for Lustre to be the fundamental userspace API.

Yes! Without userspace daemons for applications to talk to, this is true no matter what; whether we like it or not. Since Lustre only has kernel components, the system call interface is the userspace API. Libraries are nice, but they are essentially only "helper functions" built on top of the core API which is the system calls and ioctl protocol. I'm really glad to see somebody upstream make this distinction.

Comment by Andreas Dilger [ 04/May/14 ]

I for one do NOT want the ioctl interface (at least not the current one) to be the interface exposed to applications. Asking that applications use the setstripe ioctl correctly is just asking for grief. That is moving in exactly the opposite direction of LU-2182, which I thought is what LLNL wanted?

As for the various issues brought up in recent comments:

  • DFID uses LPX64 - fixed in http://review.whamcloud.com/6156 along with other similar issues
  • I'm largely in support of John proposal
  • cleaning up liblustreapi - great
Comment by Robert Read (Inactive) [ 05/May/14 ]

Pretty much by definition the ioctl interface is part of the user interface for the kernel, in addition to other syscalls, /proc, and xattrs. That the current ioctl interface is not a particularly good one (agreed) is a separate issue.

Comment by Andreas Dilger [ 05/May/14 ]

Some of these issues are being fixed by John in LU-4961 and LU-4999 (e.g. http://review.whamcloud.com/10194 and http://review.whamcloud.com/10210).

It would have been better to open a new ticket for the new issues with lustre_idl.h (possibly linking it back to this one if desired) instead of re-opening this old ticket (which was marked closed for previous releases on which those fixes were landed).

Comment by Robert Read (Inactive) [ 05/May/14 ]

Since original problem this ticket raised (that of lustre_idl.h not compiling in user space) has either regressed or wasn't addressed by the patches in the first place, reopening seemed like the right thing to do. But opening a new one is fine, too, if that fits our process better.

Comment by Christopher Morrone [ 05/May/14 ]

I agree with Andreas that I don't want to force normal users to use ioctls. It would certainly be good to be a little more formal with the ioctls, and treat them as the formal interfaces that they are. But they do not consitute adequate application APIs on their own.

As Andreas pointed out, we at LLNL explicitly want to avoid using them in the backend implementation of the Lustre API library. The details are in LU-2182, but the short story is that ioctls are difficult to function ship.

Comment by Christopher Morrone [ 05/May/14 ]

I started LU-5011 to track the new issues that we'll fix for Lustre 2.6.0 (and backport to the stable branches that need them). That ticket is marked the blocker instead of this one, and LU-5011 is also the one now with the the subtickets links.

On the assumption that we will now reclose this one, I made John Hammond the owner of that ticket.

The only thing I didn't do is close this ticket. If all are in agreement, I can do that as well.

Comment by Robert Read (Inactive) [ 06/May/14 ]

I agree, we don't want to force users to use the ioctl interface, but we shouldn't prevent them from using it, either.

Chris, feel free to close this ticket.

Comment by Prakash Surya (Inactive) [ 07/May/14 ]

I for one do NOT want the ioctl interface (at least not the current one) to be the interface exposed to applications.

My previous comment wasn't specific to the ioctl interface; that's only part of the userspace API. All of the system calls we expose constitute the userspace API; read, write, getxattr, ioctl, etc. But I think I'm getting away from the goal of this ticket, so I'll step out now.

Comment by Gerrit Updater [ 10/Dec/15 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/6156/
Subject: LU-1606 misc: clean up DFID related error messages
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1c24b0afdb1b065d5016865d922c89c384bdbde4

Generated at Sat Feb 10 01:18:07 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.