Lustre / LU-10681

Disable tiny writes for O_APPEND

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.11.0
    • Lustre 2.11.0

    Description

      Unfortunately, tiny writes do not work correctly with O_APPEND. In short, O_APPEND depends on holding LDLM locks to EOF (on all stripes/components) to protect the file size, whereas a tiny write requires only that the page being written is locked.

      This means the LDLM lock on a stripe that does not contain this page could be granted to another client, and that client could extend the file without our lock ever being revoked.

      The simplest example involves multiple stripes, but the same problem can occur with a single stripe.

      Client 1 writes to a page on stripe 1, dirtying it. File size is, for example, 2K. (not O_APPEND)
      Client 2 writes to some part of the file on stripe 2, file size is now 1 MB + 2K. (not O_APPEND)
      Client 1 does an O_APPEND write of 1K. The tiny writes code notices that the page at the expected file size is present, locks the full file locally, checks the size, sees 2K (because that is the locally known size) and writes there.

      Data is not in the correct location. Ouch.
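
      The sequence above can be replayed in a small stand-alone model. This is only a sketch and not Lustre code: `file_model`, `client_model`, and every function name here are hypothetical, and the 1 MiB stripe size is an assumption chosen to match the "1 MB + 2K" figure in the example.

```c
#include <stdint.h>

/* Hypothetical model of the race; none of these names exist in Lustre. */
struct file_model {
    uint64_t true_size;   /* authoritative size across all stripes */
};

struct client_model {
    uint64_t cached_size; /* file size as last known on this client */
};

/* Write at an explicit offset. Only the writing client learns the new size. */
void plain_write(struct file_model *f, struct client_model *c,
                 uint64_t off, uint64_t len)
{
    uint64_t end = off + len;

    if (end > f->true_size)
        f->true_size = end;
    if (end > c->cached_size)
        c->cached_size = end;
}

/* Append that trusts the locally cached size (the buggy tiny-write case). */
uint64_t tiny_append(struct file_model *f, struct client_model *c,
                     uint64_t len)
{
    uint64_t off = c->cached_size;

    plain_write(f, c, off, len);
    return off; /* offset where the data actually landed */
}

/* Append that first refreshes the size, standing in for taking locks to EOF. */
uint64_t locked_append(struct file_model *f, struct client_model *c,
                       uint64_t len)
{
    c->cached_size = f->true_size;
    return tiny_append(f, c, len);
}

/* Replays the three steps and returns the offset used by the 1K append. */
uint64_t replay_scenario(int use_tiny_write)
{
    struct file_model f = { 0 };
    struct client_model c1 = { 0 }, c2 = { 0 };

    plain_write(&f, &c1, 0, 2048);        /* client 1: size is 2K */
    plain_write(&f, &c2, 1048576, 2048);  /* client 2: size is 1 MiB + 2K */

    return use_tiny_write ? tiny_append(&f, &c1, 1024)
                          : locked_append(&f, &c1, 1024);
}
```

      In this model, `replay_scenario(1)` places the append at offset 2048, in the middle of the file, while `replay_scenario(0)` places it at the real end of file, 1 MiB + 2K.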

      Two possible fixes:
      1. Don't do tiny writes with O_APPEND
      2. Do some sort of glimpse before every write
      ^-- This almost certainly removes the point of doing tiny writes, because it would be so slow. Better to do normal writes and take the full file locks required.
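
      As an illustration of option 1, the eligibility check could look roughly like the sketch below. It is not the actual patch: `tiny_write_allowed` and `SKETCH_PAGE_SIZE` are hypothetical names, and the rule that a tiny write must fit inside a single page is an assumption about the existing tiny-write conditions. The only real identifier used is `O_APPEND` from `<fcntl.h>`.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size, for illustration only */

/*
 * Hypothetical eligibility check: returns true if a write may take the
 * tiny-write fast path, false if it must use the normal write path.
 */
bool tiny_write_allowed(int open_flags, size_t offset_in_page, size_t count)
{
    /* Option 1: O_APPEND needs locks to EOF, so never take the fast path. */
    if (open_flags & O_APPEND)
        return false;

    /* Assumed pre-existing condition: the write stays within one page. */
    return count > 0 &&
           count < SKETCH_PAGE_SIZE &&
           offset_in_page < SKETCH_PAGE_SIZE &&
           offset_in_page + count <= SKETCH_PAGE_SIZE;
}
```

      With this shape, an O_APPEND write simply falls through to the normal path, which already takes the full file locks.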

      One further thought:
      We could arrange for the size update code for tiny writes to check whether the client holds LDLM locks to EOF. If it is verified to hold them after the start of the tiny write, we can be sure the file size on our client is up to date, or at least that it is no older than the start of the current write. This would still be faster than the normal write path, but much slower than normal tiny writes.
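
      A rough sketch of what such a check might test is shown below. It is a simplified model, not LDLM code: `stripe_lock`, `locks_cover_eof`, and `SKETCH_EOF` are hypothetical, and real extent locks carry more state than this. The idea is only that every stripe must have a held lock running from at or before the locally known size through to EOF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EOF UINT64_MAX /* assumed marker: extent runs to end of object */

/* Hypothetical per-stripe view of the extent lock this client holds. */
struct stripe_lock {
    bool     held;  /* false if the lock was never taken or was revoked */
    uint64_t start; /* first byte covered */
    uint64_t end;   /* last byte covered, or SKETCH_EOF */
};

/*
 * Returns true only if, on every stripe, a held lock covers the range from
 * the locally known stripe size to EOF. If any stripe fails the test,
 * another client may have extended the file and the cached size is suspect.
 */
bool locks_cover_eof(const struct stripe_lock *locks,
                     const uint64_t *known_sizes, size_t nr_stripes)
{
    if (nr_stripes == 0)
        return false;

    for (size_t i = 0; i < nr_stripes; i++) {
        if (!locks[i].held ||
            locks[i].end != SKETCH_EOF ||
            locks[i].start > known_sizes[i])
            return false;
    }
    return true;
}
```

      For the size to be trustworthy, this check would have to run after the tiny write has started, as described above; the sketch does not model that ordering.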

      I'll submit a patch. I will probably stick with the simplest solution: don't do tiny writes for O_APPEND. I will explore the lock-checking option, but that may be a little complicated, and the more complicated the tiny writes code gets, the less benefit it has.

      Note this problem is likely to be hit in the real world because O_APPEND writes are very often small.

            People

              paf Patrick Farrell