Lustre / LU-19015

Possible skipping of records during changelog processing when ENOSPC occurs while writing a record

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.17.0
    • None
    • None
    • 3

    Description

      Multiple threads enter mdd_changelog_write_rec(); each gets its own offset, and they then call dt_record_write() in parallel.
      Due to scheduling, a thread whose offset falls in block X+1 may run first and allocate block X+1; a few more threads can then write to X+1.
      Another thread, whose offset falls in block X, then fails to allocate that block because of ENOSPC.
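
The ordering above can be reproduced in miniature on an ordinary file. The following is only an illustrative sketch, not Lustre code: `BLOCK_SIZE`, `write_records()`, `read_block()` and `demo()` are hypothetical names, and the ENOSPC failure is simulated by simply dropping the write to the affected block.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def write_records(path, writes, failing_blocks):
    """Apply (block_index, payload) writes in the given order.

    A write to a block listed in failing_blocks is dropped, standing in
    for a block allocation that fails with ENOSPC.
    Returns the block indexes that were actually written.
    """
    written = []
    with open(path, "r+b") as f:
        for block, payload in writes:
            if block in failing_blocks:
                continue  # simulated ENOSPC: nothing reaches the file
            f.seek(block * BLOCK_SIZE)
            f.write(payload.ljust(BLOCK_SIZE, b"\x01"))
            written.append(block)
    return written


def read_block(path, block):
    """Read one full block back from the file."""
    with open(path, "rb") as f:
        f.seek(block * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # The writer for block 1 (X+1) runs first; the writer for
        # block 0 (X) then hits the simulated ENOSPC.
        written = write_records(
            path, [(1, b"rec-b"), (0, b"rec-a")], failing_blocks={0}
        )
        return written, read_block(path, 0), read_block(path, 1)
    finally:
        os.remove(path)
```

In this sketch block 1 holds data while block 0 was never written, so reading block 0 returns only zero bytes, which is the kind of gap the description refers to.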

      This leaves the changelog file sparse. During processing, dt_read() returns a zeroed block for an offset that was never written, and llog_process_thread() then skips an entire chunk (2 blocks), losing the changelog records in the second block. I understand that the chance of this happening is small, but it is still possible, and this could be a regression after LU-18218.
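
A toy model may help to show why skipping a whole chunk loses valid records. It does not reproduce the real llog_process_thread() logic: the tiny `BLOCK_SIZE`, the helper names `make_log()` and `process()`, and the rule "skip the chunk if its first block is zeroed" are all assumptions made for illustration.

```python
BLOCK_SIZE = 8     # tiny assumed block size, in bytes
CHUNK_BLOCKS = 2   # the description mentions a chunk of 2 blocks


def make_log(blocks):
    """Build a log image; None stands for a block that was never written."""
    return b"".join(
        bytes(BLOCK_SIZE) if b is None else b.ljust(BLOCK_SIZE, b".")
        for b in blocks
    )


def process(log):
    """Toy reader: handles one chunk at a time and skips the whole
    chunk when its first block is zeroed."""
    chunk_size = BLOCK_SIZE * CHUNK_BLOCKS
    seen = []
    for off in range(0, len(log), chunk_size):
        chunk = log[off:off + chunk_size]
        if chunk[:BLOCK_SIZE] == bytes(BLOCK_SIZE):
            continue  # zeroed first block: the entire chunk is skipped
        for b in range(0, len(chunk), BLOCK_SIZE):
            seen.append(chunk[b:b + BLOCK_SIZE].rstrip(b"."))
    return seen


# Block 2 is a hole (its writer failed); block 3 holds a valid record.
log = make_log([b"r0", b"r1", None, b"r3"])
records = process(log)
```

Here `records` contains `r0` and `r1` only: the record `r3` sits in the second block of the skipped chunk and is never seen by the reader, even though it was written successfully.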


            People

              Assignee: Alexander Boyko (aboyko)
              Reporter: Alexander Boyko (aboyko)