Affects Version/s: Lustre 2.0.0, Lustre 2.1.0
Fix Version/s: Lustre 2.1.0
While looking at the Lustre public headers for LU-533, I noticed several
disturbing/sad incompatibilities in the headers between b1_8 and master.
Some of them are non-critical:
- ioctl numbers for lloop devices changed (LL_IOC_LLOOP_ATTACH,
LL_IOC_LLOOP_DETACH, LL_IOC_LLOOP_INFO, LL_IOC_LLOOP_DETACH_BYDEV).
Since lloop isn't supported and usually ships with matching lctl anyway,
there isn't a critical interop issue to be fixed, though the LLOOP_INFO
ioctl should be fixed to report FIDs instead of inode numbers.
Several potentially more difficult problems exist that I believe may
impact some users:
- struct if_quotactl has changed in size, with fields added in the middle
of the structure instead of at the end. Some users (at least NERSC)
are using the llapi_quotactl() interface to access quota data from
userspace, but the userspace will break due to the change in the ioctl
interface and llapi_quotactl() parameter. The ioctl number (correctly)
depends on the size of struct if_quotactl, so it would be possible to
define two slightly different ioctls at the same time (one for the old
struct and one for the new struct). Since liblustreapi.a is a static
library, it will continue to call the kernel with the old ioctl number
until it is recompiled. It would be possible to define the old ioctl
number and copy the struct data, if we wanted to maintain compatibility.
It is not easy to resolve interoperability issues between 1.x userspace
tools and a 2.x client kernel, even if the newly added "if_quotactl" fields
were at the end of the structure. If we disregard the released lustre-2.0,
we can adjust the current lustre-2.1 to make it compatible with 1.x userspace
tools. But I am not sure whether lustre-2.0 clients can be ignored or not.
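To make the two-ioctl idea above concrete, here is a minimal sketch in C. It assumes the standard `_IOWR()` size encoding; the struct layouts, the `demo_` names, and the `'q'`/`1` type and number are all hypothetical placeholders, not the real `struct if_quotactl` or the real Lustre ioctl number. It only shows that two layouts of different size yield two distinct request codes, so a handler could accept the old one and copy its fields into the new layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Hypothetical layouts for illustration; NOT the real struct if_quotactl. */
struct demo_quotactl_old {
    uint32_t qc_cmd;
    uint32_t qc_id;
    uint64_t qc_limit;
};

struct demo_quotactl_new {
    uint32_t qc_cmd;
    uint32_t qc_valid;   /* field inserted mid-struct, as the report describes */
    uint32_t qc_id;
    uint32_t qc_pad;
    uint64_t qc_limit;
};

/* Same type and number, different argument struct: the size encoded by
 * _IOWR() makes the two request codes differ. 'q' and 1 are placeholders. */
#define DEMO_IOC_QUOTACTL_OLD _IOWR('q', 1, struct demo_quotactl_old)
#define DEMO_IOC_QUOTACTL_NEW _IOWR('q', 1, struct demo_quotactl_new)

/* Copy an old-layout request into the new layout, zeroing the new fields. */
static void demo_quotactl_upgrade(const struct demo_quotactl_old *old,
                                  struct demo_quotactl_new *cur)
{
    memset(cur, 0, sizeof(*cur));
    cur->qc_cmd = old->qc_cmd;
    cur->qc_id = old->qc_id;
    cur->qc_limit = old->qc_limit;
}

/* Returns 0 and fills *cur for a recognised request code, -1 otherwise. */
static int demo_quotactl_decode(unsigned long cmd, const void *arg,
                                struct demo_quotactl_new *cur)
{
    if (cmd == (unsigned long)DEMO_IOC_QUOTACTL_NEW) {
        memcpy(cur, arg, sizeof(*cur));
        return 0;
    }
    if (cmd == (unsigned long)DEMO_IOC_QUOTACTL_OLD) {
        struct demo_quotactl_old old;

        memcpy(&old, arg, sizeof(old));
        demo_quotactl_upgrade(&old, cur);
        return 0;
    }
    return -1;
}
```

A statically linked old tool would keep issuing the old request code, which a handler of this shape could still serve. Whether that is worth doing is the open question raised in the comment above.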
- the supplementary group downcall structure has changed in 2.0, but very
sadly the magic number in the struct wasn't also changed, so the kernel
can't distinguish between the old and new downcall structs. I recall
at least Sandia had written their own 1.x group upcall, and while we
may not need to keep compatibility, at least changing the magic number
would afford us the option to do so. An open question is whether this
would negatively affect Kerberos users.
Since the downcall structure sizes for 1.x and 2.x are different, the MDT
can use the size to distinguish whether a downcall from userspace is 2.x
based or 1.x based. We can handle that in the current lustre-2.1 candidate.
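The size-based check suggested above could look roughly like the following C sketch. The struct layouts, the magic value, and every `demo_` name are hypothetical and are not taken from the Lustre headers; the sketch only illustrates classifying a buffer by length when both versions share one magic number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Placeholder magic shared by both versions; not the real Lustre value. */
#define DEMO_DOWNCALL_MAGIC 0x12345678u

/* Hypothetical 1.x-style downcall: 16 bytes. */
struct demo_downcall_v1 {
    uint32_t magic;
    uint32_t err;
    uint32_t uid;
    uint32_t gid;
};

/* Hypothetical 2.x-style downcall: same prefix plus extra fields, 24 bytes. */
struct demo_downcall_v2 {
    uint32_t magic;
    uint32_t err;
    uint32_t uid;
    uint32_t gid;
    uint32_t nperms;
    uint32_t flags;
};

enum demo_downcall_kind {
    DEMO_DOWNCALL_INVALID = 0,
    DEMO_DOWNCALL_V1,
    DEMO_DOWNCALL_V2
};

/* Classify a downcall buffer: the magic must match, then the length decides. */
static enum demo_downcall_kind demo_downcall_classify(const void *buf,
                                                      size_t len)
{
    uint32_t magic;

    if (buf == NULL || len < sizeof(magic))
        return DEMO_DOWNCALL_INVALID;
    memcpy(&magic, buf, sizeof(magic));
    if (magic != DEMO_DOWNCALL_MAGIC)
        return DEMO_DOWNCALL_INVALID;
    if (len == sizeof(struct demo_downcall_v2))
        return DEMO_DOWNCALL_V2;
    if (len == sizeof(struct demo_downcall_v1))
        return DEMO_DOWNCALL_V1;
    return DEMO_DOWNCALL_INVALID;
}
```

If either real structure carries a variable-length tail, an exact-size comparison like this would not be sufficient on its own, which is one reason a distinct magic number remains the more robust option.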
- llite file flags are defined with conflicting values for b1_8 and master:
#define LL_FILE_LOCKED_DIRECTIO 0x00000008 /* client-side locks with dio */
#define LL_FILE_LOCKLESS_IO 0x00000010 /* server-side locks with cio */
#define LL_FILE_RMTACL 0x00000008
I have no idea what LL_FILE_RMTACL does, or if anybody uses it. The
LL_FILE_LOCK* flags are (AFAIK) used by LLNL BG/P IO daemons to force
lockless IO on files. I'm not sure if apps are using this also.
"LL_FILE_RMTACL" is used for remote client ACL processing. Since no released
version supports remote clients, we can redefine it in the lustre-2.1 candidate.
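A small C sketch of the collision, using the three values quoted above. `DEMO_LL_FILE_RMTACL` and its value are hypothetical: the bit was picked arbitrarily for illustration and has not been checked against the other flags in either branch's headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in this report. */
#define LL_FILE_LOCKED_DIRECTIO 0x00000008 /* client-side locks with dio */
#define LL_FILE_LOCKLESS_IO     0x00000010 /* server-side locks with cio */
#define LL_FILE_RMTACL          0x00000008

/* Hypothetical replacement value, chosen arbitrarily for this sketch. */
#define DEMO_LL_FILE_RMTACL     0x40000000

/* True when two flag masks share at least one bit. */
static bool demo_flags_overlap(uint32_t a, uint32_t b)
{
    return (a & b) != 0;
}

/* With the quoted values, a file opened with locked direct I/O also
 * appears to have the remote-ACL flag set. */
static bool demo_looks_like_rmtacl(uint32_t file_flags)
{
    return (file_flags & LL_FILE_RMTACL) != 0;
}
```

With the quoted values, `demo_looks_like_rmtacl(LL_FILE_LOCKED_DIRECTIO)` is true, which is the ambiguity described above. Moving the remote ACL flag to any bit unused by both branches removes it.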