O_LOV_DELAY_CREATE conflict with __O_TMPFILE

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0, Lustre 2.5.1
    • None
    • On 3.11+ kernels
    • 3
    • 11443

    Description

      Since the 3.11 kernel release (commits 60545d0d and bb458c64), the kernel defines __O_TMPFILE, which unfortunately conflicts with O_LOV_DELAY_CREATE and causes the lfs setstripe API to always fail.

      #define __O_TMPFILE 020000000

      #define O_LOV_DELAY_CREATE 0120000000

      The kernel also introduced a safeguard around O_TMPFILE: whenever the __O_TMPFILE bit is set, O_DIRECTORY must be set as well. As a result, Lustre cannot keep backward compatibility, so I am in fact unsure how to solve this. Do we care about O_LOV_DELAY_CREATE compatibility?

      Because this is mostly a user-space ABI issue, I created this ticket so that we can solve it in the Lustre tree first; I can then port the change to the kernel client.

          Activity

            [LU-4209] O_LOV_DELAY_CREATE conflict with __O_TMPFILE

            jlevi Jodi Levi (Inactive) added a comment - Patch landed to Master. Will land in upcoming Releases as well.

            adilger Andreas Dilger added a comment - Patch has been submitted upstream to Greg KH and linux-fsdevel. Discussion is underway.

            cdufour Cédric Dufour added a comment - Hello, I can confirm it works for us on a 3.12.9 kernel using https://github.com/verygreen/linux/tree/lustre-next as source for our DKMS lustre client driver (against 2.6.32/2.4.2 cluster). Best

            adilger Andreas Dilger added a comment - I'm just testing the updated patch on my home system (slower turnaround than running it under autotest since I rarely build my own kernel) before I submit it upstream.

            adilger Andreas Dilger added a comment - Discussion from the patch in Gerrit between Bobijam and me:

            I observed that the previous test failure was caused by the VFS passing O_NOCTTY (flags 0104501) down to ll_file_open() even when it tries to create a file.

            Do you know where the O_NOCTTY flag is coming from? Is it glibc or something? I don't see it in the kernel. I guess that with flags 0104501, none of the other old flags are making it through the VFS.

            The only other idea I have is for setstripe to create a file with mknod() and then call setxattr() on the file before opening it. This increases the RPC count but avoids the need for a special flag.

            strace touch /mnt/lustre/file

            shows

            ... open("/mnt/lustre/file", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3 ...

            "echo string > lustre_file" or "cat oldfile > lustre_file" would call ll_file_open() with flags 0101101.

            Any ideas on how to solve this problem cleanly?

            We could check for the absence of O_NODELAY, but that isn't very robust if O_NOCTTY is used by some other application (googling showed O_NOCTTY used by a few apps). I figured O_NOCTTY would only be used by applications that know they are opening a tty device, but I guess it is also used by apps that might open random files and want to be safe in case they do open a tty device.

            A similar mechanism would be to check a combination of flags. The old 0100000000 flag is used by FMODE_NONOTIFY, which is masked off by kernels 2.6.36+ before Lustre has a chance to see it (LU-812, http://review.whamcloud.com/3779).

            There is also the option of actually trying to get a reserved flag in the upstream kernel, but I expect a reply like "you don't need that, system calls are cheap", which isn't really helpful for network filesystems, where system calls are expensive.

            I'm wondering if we could use an unlikely combination of flags as O_LOV_DELAY_CREATE. Something like (O_NOCTTY | FASYNC)? The FASYNC flag doesn't appear to be used very commonly, but I'm not totally sure it is 100% safe. I updated the patch to try this out at least.

            bogl Bob Glossman (Inactive) added a comment - Andreas, The patch applies cleanly into b2_4 and b2_5, so I think a simple cherry-pick will work fine there. b2_1 does need a back port, but it's not a big deal. Would prefer to wait until the master version completes test and lands so I can annotate the commit header properly for b2_1.

            People

              adilger Andreas Dilger
              bergwolf Peng Tao