
O_LOV_DELAY_CREATE conflict with __O_TMPFILE

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0, Lustre 2.5.1
    • None
    • On 3.11+ kernels
    • 3
    • 11443

    Description

      Since the 3.11 kernel release (commits 60545d0d and bb458c64), the kernel defines __O_TMPFILE, which unfortunately conflicts with O_LOV_DELAY_CREATE and causes the lfs setstripe API to always fail.

      #define __O_TMPFILE 020000000

      #define O_LOV_DELAY_CREATE 0120000000

      The kernel also added a safeguard around O_TMPFILE: whenever the __O_TMPFILE bit is set, O_DIRECTORY must be set as well. As a result, Lustre cannot keep backward compatibility with the current flag value, and I am in fact unsure how to solve this. Do we care about O_LOV_DELAY_CREATE compatibility?

      Because this mostly concerns the user-space ABI, I created this ticket so that we can solve it in the Lustre tree first, and then I can port the change to the kernel client.

          Activity

            [LU-4209] O_LOV_DELAY_CREATE conflict with __O_TMPFILE

            adilger Andreas Dilger added a comment -

            Bob, could you please try applying this patch to b2_4, b2_5, and b2_1? If it doesn't cherry-pick cleanly, please submit an updated patch.

            adilger Andreas Dilger added a comment -

            I pushed http://review.whamcloud.com/8312 for master using O_NOCTTY, but I haven't had time to test it. Ideally, this should be tested by building a statically-linked "lfs" (which I think is the default anyway) on a RHEL kernel and then running that on a new 3.11 kernel with O_TMPFILE defined.

            This should also be landed to b2_4 and b2_1 so that applications statically linked on those kernels will be able to upgrade to newer kernels without needing to be recompiled.

            adilger Andreas Dilger added a comment -

            I don't think it is a good idea to always translate O_ACCMODE into O_RDWR, since there are semantics on the MDS that may depend on O_RDONLY vs. O_WRONLY vs. O_RDWR (e.g. if an HSM file is restored from tape, etc). Is it possible to upgrade an O_ACCMODE open to the passed open mode (O_RDWR, O_WRONLY, O_RDONLY) via fcntl() or similar? According to the fcntl(2) man page it looks like that is not possible, since F_SETFL ignores the file access mode.

            Another possibility (a bit uglier) is to overload O_NOCTTY to mean O_LOV_DELAY_CREATE. This would at least reuse a stable flag value that will not have side effects in most cases (char TTY devices excluded, I think).
            bergwolf Peng Tao added a comment -

            One issue with O_ACCMODE is that we'll lose the open flag information about whether the file is opened with O_WRONLY or O_RDWR. Looking at the Lustre utils, internally they always pass O_LOV_DELAY_CREATE together with O_WRONLY. However, there is llapi_file_open(), which user space might call with O_RDWR. Is it OK to just translate O_ACCMODE to O_RDWR?

            adilger Andreas Dilger added a comment -

            Actually, it is O_ACCMODE that I was thinking about, not O_PATH. It is used in blkdev_open() to set FMODE_WRITE_IOCTL for calling ioctl() on block devices without actually accessing the device. We could use O_ACCMODE in place of (or in addition to) O_LOV_DELAY_CREATE on new kernels. It was added in kernel v2.6.27-6511-g86d434d, so it will be in all of the supported Lustre client kernels.

            This is mostly a client-side issue that can be addressed by allowing either O_LOV_DELAY_CREATE or O_ACCMODE to set MDS_OPEN_DELAY_CREATE on the wire. O_LOV_DELAY_CREATE was changed in LU-812 (v2_3_50_0-42-gae9ad4d) to 0120000000 to avoid a conflict with the 2.6.36 kernel's FMODE_NONOTIFY. By moving to a standard flag like O_ACCMODE we can avoid conflicts in the future.

            adilger Andreas Dilger added a comment -

            Sadly, I thought the API for this was going to be changed from an open(O_TMPFILE) flag to a new tmpfile() syscall. I guess that never happened, even though that option seemed to be preferred by many of the maintainers.

            For newer versions of Lustre, it might be possible to drop the use of O_LOV_DELAY_CREATE and instead use mknod() + setxattr() before opening the file to change its layout, but that would cause at least one extra RPC per file create, and may not be atomic w.r.t. other threads opening the file at the same time.

            I also looked at O_PATH to see if it allows opening the file locally without actually sending any RPC, but I don't think it is possible to even call ioctl() on such a file. I thought that there was some method to do this for device files (e.g. opening with "mode = 0") to allow ioctl() to be called for floppy devices or whatever, but I can't find it.

            Oleg, any ideas about how to handle this?

            People

              adilger Andreas Dilger
              bergwolf Peng Tao