Lustre / LU-19556

fsync() does not need to wait for layout flush


Details

    • Improvement
    • Resolution: Unresolved
    • Medium

    Description

      When vvp_io_write_start() calls generic_write_sync(), it may trigger
      ll_fsync() while the parent write operation still holds an active IO
      context. This can deadlock, because the fsync path tries to acquire
      the layout lock even though it does not need it.

      Here is how this happens:
      1. ll_file_io_generic() (write) increments lo_active_ios
      2. vvp_io_write_start() calls generic_write_sync() -> ll_fsync()
      3. ll_fsync() -> vvp_io_init() -> ll_layout_refresh()
      4. ll_layout_refresh() waits for lo_active_ios == 0
      5. Deadlock: fsync waits for the write operation that called it

      fsync operations do not need to acquire layout locks because:
      1. fsync only flushes pages that are already mapped to OST objects.
      2. The mapping from file offsets to OST objects was established by
         the original write operations that populated the page cache.
      3. Layout changes should only affect new writes, not data that is
         already cached.
      4. When a layout change occurs, the OSTs take full-extent locks,
         which flush the client cache before the layout change completes.

      This change follows the principle that layout changes affect future
      operations, not data already in the page cache. By skipping the
      unnecessary layout lock acquisition during fsync, it eliminates the
      deadlock while preserving correct filesystem semantics.


          People

            ablagodarenko Artem Blagodarenko
            adilger Andreas Dilger
