Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-540

fix interop issues with lustre_user.h

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.1.0
    • Lustre 2.0.0, Lustre 2.1.0
    • None
    • 3
    • 4924

    Description

      While looking at the Lustre public headers for LU-533 I noticed several
      disturbing/sad incompatibilities in the headers between b1_8 and master.
      Some of them are non-critical:

      • ioctl numbers for lloop devices changed (LL_IOC_LLOOP_ATTACH,
        LL_IOC_LLOOP_DETACH, LL_IOC_LLOOP_INFO, LL_IOC_LLOOP_DETACH_BYDEV

      Since lloop isn't supported and usually ships with matching lctl anyway,
      there isn't a critical interop issue to be fixed, though the LLOOP_INFO
      ioctl should be fixed to report FIDs instead of inode numbers.

      Several potentially more difficult problems exist, that I believe some
      users may be impacted by:

      • struct if_quotactl has changed in size, with fields added in the middle
        of the structure instead of at the end. Some users (at least NERSC)
        are using the llapi_quotactl() interface to access quota data from
        userspace, but the userspace will break due to the change in the ioctl
        interface and llapi_quotactl() parameter. The ioctl number (correctly)
        depends on the size of struct if_quotactl, so it would be possible to
        define two slightly different ioctls at the same time (one for the old
        struct and one for the new struct). Since liblustreapi.a is a static
        library, it will continue to call the kernel with the old ioctl number
        until it is recompiled. It would be possible to define the old ioctl
        number and copy the struct data, if we wanted to maintain compatibility.

      It is not easy to resolve interoperability issues between 1.x user space
      tools and 2.x client kernel, even if the new added fields for "if_quotactl"
      are at the end of the structure. If without considering released lustre-2.0,
      we can adjust current lustre-2.1 to make it be compatible with 1.x userspace
      tools. But I do not sure whether lustre-2.0 client can be ignored or not.

      • the supplementary group downcall structure has changed in 2.0, but very
        sadly the magic number in the struct wasn't also changed, so the kernel
        can't distinguish between the old and new downcall structs. I recall
        at least Sandia had written their own 1.x group upcall, and while we
        may not need to keep compatibility, at least changing the magic number
        would afford us the option to do so. An open question is whether this
        would negatively affect Kerberos users?

      Since the downcall structure sizes for 1.x and 2.x are different, it can
      be used by MDT to distinguish whether the downcall from user space is 2.x
      based or 1.x based. We can process that in current lustre-2.1 candidate.

      • llite file flags are defined with conflicting values for b1_8 and master
        since 1.8.2:

      b1_8:
      #define LL_FILE_LOCKED_DIRECTIO 0x00000008 /* client-side locks with dio */
      #define LL_FILE_LOCKLESS_IO 0x00000010 /* server-side locks with cio */

      master:
      #define LL_FILE_RMTACL 0x00000008

      I have no idea what LL_FILE_RMTACL does, or if anybody uses it. The
      LL_FILE_LOCK* flags are (AFAIK) used by LLNL BG/P IO daemons to force
      lockless IO on files. I'm not sure if apps are using this also.

      "LL_FILE_RMTACL" is used for remote client ACL processing. Since no released
      versions support remote client, we can redefine it in lustre-2.1 candidate.

      Attachments

        Issue Links

          Activity

            [LU-540] fix interop issues with lustre_user.h

            Patches were landed for 2.1.0.

            adilger Andreas Dilger added a comment - Patches were landed for 2.1.0.

            I think we have done what we want.

            yong.fan nasf (Inactive) added a comment - I think we have done what we want.

            Is there still unfinished work with this bug, or did it all get landed?

            adilger Andreas Dilger added a comment - Is there still unfinished work with this bug, or did it all get landed?
            green Oleg Drokin added a comment -

            Ok, it turns out I spoke too soon

            Teh perms file cannot add any more groups. And it is invalid.
            And that's the problem.

            Where the old l_getidentity just ignored invalid perms file and fed the group data to the kernel,
            the new one just bails out on wrong perms file and exits without trying to pass any data to the kernel whatsoever.

            green Oleg Drokin added a comment - Ok, it turns out I spoke too soon Teh perms file cannot add any more groups. And it is invalid. And that's the problem. Where the old l_getidentity just ignored invalid perms file and fed the group data to the kernel, the new one just bails out on wrong perms file and exits without trying to pass any data to the kernel whatsoever.
            green Oleg Drokin added a comment -

            James shared the perms file with me, but unfortunately it seems to be 1.8 based or otherwise invalid.

            Anyway I was inspecting updated l_getidentity code and I think I see the problem in perms handling:

            + size = offsetof(struct identity_downcall_data,
            + idd_groups[data->idd_ngroups]);
            /* read permission database */

            • perms_fp = fopen(PERM_PATHNAME, "r");
            • if (perms_fp) { - get_perms(perms_fp, data); - fclose(perms_fp); - }

              else if (errno != ENOENT)

              { - errlog("open %s failed: %s\n", - PERM_PATHNAME, strerror(errno)); - }

              + rc = get_perms(data);

            later on we proceed to write size bytes to kernel.

            Now the problem is if get_perms() call added any more group entries and increased the data->idd_ngroups value, the size won't be updated and we would never pass these extra entries to the kernel.
            As such I propose to move the size calculation to after the get_perms() call, that should fix the issue.

            green Oleg Drokin added a comment - James shared the perms file with me, but unfortunately it seems to be 1.8 based or otherwise invalid. Anyway I was inspecting updated l_getidentity code and I think I see the problem in perms handling: + size = offsetof(struct identity_downcall_data, + idd_groups [data->idd_ngroups] ); /* read permission database */ perms_fp = fopen(PERM_PATHNAME, "r"); if (perms_fp) { - get_perms(perms_fp, data); - fclose(perms_fp); - } else if (errno != ENOENT) { - errlog("open %s failed: %s\n", - PERM_PATHNAME, strerror(errno)); - } + rc = get_perms(data); later on we proceed to write size bytes to kernel. Now the problem is if get_perms() call added any more group entries and increased the data->idd_ngroups value, the size won't be updated and we would never pass these extra entries to the kernel. As such I propose to move the size calculation to after the get_perms() call, that should fix the issue.

            The old l_getidentity and new one process /etc/lustre/perm with the same behavior. So I do not think such file can cause the failure.

            What is error reported/message when you use new version Lustre?

            yong.fan nasf (Inactive) added a comment - The old l_getidentity and new one process /etc/lustre/perm with the same behavior. So I do not think such file can cause the failure. What is error reported/message when you use new version Lustre?

            Yes that is the one and yes I did update the l_getidenttity. We don't need a special utiltiy for upcall so we use what is built by lustre. What is different is that we supply a /etc/lustre/perm file

            simmonsja James A Simmons added a comment - Yes that is the one and yes I did update the l_getidenttity. We don't need a special utiltiy for upcall so we use what is built by lustre. What is different is that we supply a /etc/lustre/perm file

            > Just test the lastest code drop on our test bed at the lab. The file system mounted but couldn't access it. I did a git bisect and reverted this patch and it worked again.

            Which patch you revert? the "misc patch for user identity upcall/downcall"? If so, please update the MDS-side user space utils "l_getidentity" with latest version, usually it is under /usr/sbin/l_getidentity.

            On the other hand, please show me the dmesg on MDS side if possible. Thanks!

            yong.fan nasf (Inactive) added a comment - > Just test the lastest code drop on our test bed at the lab. The file system mounted but couldn't access it. I did a git bisect and reverted this patch and it worked again. Which patch you revert? the "misc patch for user identity upcall/downcall"? If so, please update the MDS-side user space utils "l_getidentity" with latest version, usually it is under /usr/sbin/l_getidentity. On the other hand, please show me the dmesg on MDS side if possible. Thanks!

            Just test the lastest code drop on our test bed at the lab. The file system mounted but couldn't access it. I did a git bisect and reverted this patch and it worked again.

            simmonsja James A Simmons added a comment - Just test the lastest code drop on our test bed at the lab. The file system mounted but couldn't access it. I did a git bisect and reverted this patch and it worked again.

            Integrated in lustre-master » i686,server,el5,inkernel #248
            LU-540 misc patch for user identity upcall/downcall

            Oleg Drokin : 4b7cbec396fcd7afb81d601a2facb70ee8c7ad28
            Files :

            • lustre/include/lustre/lustre_user.h
            • lustre/mdt/mdt_identity.c
            • lustre/utils/l_getidentity.c
            • lustre/mdt/mdt_lproc.c
            hudson Build Master (Inactive) added a comment - Integrated in lustre-master » i686,server,el5,inkernel #248 LU-540 misc patch for user identity upcall/downcall Oleg Drokin : 4b7cbec396fcd7afb81d601a2facb70ee8c7ad28 Files : lustre/include/lustre/lustre_user.h lustre/mdt/mdt_identity.c lustre/utils/l_getidentity.c lustre/mdt/mdt_lproc.c

            People

              yong.fan nasf (Inactive)
              adilger Andreas Dilger
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: