  1. Lustre
  2. LU-13416

Data corruption during IOR testing with DoM files and hard failover


Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.14.0, Lustre 2.12.5
    • None
    • None
    • Any Lustre 2.x affected.
    • 3
    • 9223372036854775807

    Description

      IAM tables use zero-copy updates for files, similar to what ldiskfs directories do.
      osd-ldiskfs, starting from

      # git describe 67076c3c7e2b11023b943db2f5031d9b9a11329c
      v2_2_50_0-22-g67076c3
      

      does the same. But this is not safe unless LDISKFS_INODE_JOURNAL_DATA is set on the inodes
      (thanks to bzzz for the tip).
      Otherwise, metadata blocks can be reused before the journal checkpoint without corresponding revoke records. As a result, valid file data gets replaced with stale journaled data.
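
      The failure mode described above can be illustrated with a toy model of journal replay. This is only a sketch, not the jbd2 implementation: the `replay` function, the dictionary layout of a transaction, and block number 100 are all hypothetical and exist purely for illustration. It assumes the usual rule that a revoke record suppresses replay of copies of that block logged in the same or an earlier transaction.

```python
def replay(journal, disk):
    """Toy journal replay: 'journal' is a list of committed transactions.

    Each transaction is a dict with optional keys (hypothetical layout):
      "log":    {fs_block: data} copies of blocks stored in the journal
      "revoke": [fs_block, ...]  blocks whose older copies must not be replayed
    """
    # Pass 1: remember the last transaction that revoked each block.
    revoked = {}
    for tid, txn in enumerate(journal):
        for blk in txn.get("revoke", ()):
            revoked[blk] = tid

    # Pass 2: copy logged blocks to disk, skipping revoked copies.
    for tid, txn in enumerate(journal):
        for blk, data in txn.get("log", {}).items():
            if revoked.get(blk, -1) >= tid:
                continue  # a same-or-later revoke makes this copy obsolete
            disk[blk] = data
    return disk


# Block 100 was journaled, then freed and reused for file data that was
# written straight to disk. The crash happens before the journal checkpoint.
journal_no_revoke = [{"log": {100: "old-journaled"}}, {"log": {}}]
journal_with_revoke = [{"log": {100: "old-journaled"}}, {"revoke": [100]}]

corrupted = replay(journal_no_revoke, {100: "new-file-data"})
intact = replay(journal_with_revoke, {100: "new-file-data"})
print(corrupted[100], intact[100])
```

      In this model, the journal without a revoke record overwrites the new file data with the old journaled copy, while the journal with a revoke record leaves the file data alone.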
      From the block trace perspective, it looks like this:

          mdt_io01_025-32148 [003]  4161.223760: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_025]
          mdt_io01_019-31765 [003]  4163.374449: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
          mdt_io01_000-12006 [014]  4165.256635: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_000]
          mdt_io01_019-31765 [004]  4167.030265: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
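
      The sector number in the trace lines above can be tied to the FS block number that appears in the journal dump further down. The following sketch does the arithmetic, assuming 512-byte block-layer sectors, a 4096-byte ldiskfs block size, and no partition offset on device 9,65. None of these are stated in the report, and `sector_to_fs_block` is a hypothetical helper.

```python
SECTOR_SIZE = 512      # assumption: block-layer sectors are 512 bytes
FS_BLOCK_SIZE = 4096   # assumption: ldiskfs uses 4 KiB blocks


def sector_to_fs_block(sector, nr_sectors):
    """Convert a 'sector + count' pair from the trace into (fs_block, nr_blocks)."""
    sectors_per_block = FS_BLOCK_SIZE // SECTOR_SIZE
    if sector % sectors_per_block or nr_sectors % sectors_per_block:
        raise ValueError("request is not aligned to the FS block size")
    return sector // sectors_per_block, nr_sectors // sectors_per_block


# "W 12075997800 + 8" from the trace lines
fs_block, nr_blocks = sector_to_fs_block(12075997800, 8)
print(fs_block, nr_blocks)
```

      Under these assumptions, all four writes in the trace hit the single FS block 1509499725, which is the block logged repeatedly in the journal.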
      

      but this data is reported as committed:

      00000001:00080000:9.0:1585615546.198190:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752066 is committed
      00000001:00080000:9.0:1585615546.198196:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752064 is committed
      

      but after the crash, the journal records are:

      Commit time 1585612866.905807896
        FS block 1509499725 logged at journal block 1370 (flags 0x2)
      Found expected sequence 86453863, type 2 (commit block) at block 1382
      Commit time 1585612871.80796396
      Found expected sequence 86453864, type 2 (commit block) at block 1395
      Commit time 1585612871.147796211
        FS block 1509499725 logged at journal block 1408 (flags 0x2)
      Found expected sequence 86453865, type 2 (commit block) at block 1414
      Commit time 1585612872.386792798
        FS block 1509499725 logged at journal block 1427 (flags 0x2)
      Found expected sequence 86453866, type 2 (commit block) at block 1438
      Commit time 1585612876.763804361
      Found expected sequence 86453867, type 2 (commit block) at block 1451
      Commit time 1585612876.834804666
        FS block 1509499725 logged at journal block 1464 (flags 0x2)
      Found expected sequence 86453868, type 2 (commit block) at block 1471
      

      and there are no revoke records.
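
      A check like the one done by eye above can be scripted. The sketch below scans a journal dump for FS blocks that are logged more than once and never revoked. The "FS block ... logged at journal block ..." pattern is taken from the dump shown in this report. The "Revoke FS block ..." pattern is an assumption about how revoke entries would be printed, since none appear here. `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern taken from the dump in this report.
LOGGED = re.compile(r"FS block (\d+) logged at journal block (\d+)")
# Assumption: revoke entries would be printed in this form.
REVOKE = re.compile(r"Revoke FS block (\d+)")


def summarize(dump_text):
    """Return (times each FS block was logged, revoked blocks, suspect blocks)."""
    logged = Counter()
    revoked = set()
    for line in dump_text.splitlines():
        m = LOGGED.search(line)
        if m:
            logged[int(m.group(1))] += 1
            continue
        m = REVOKE.search(line)
        if m:
            revoked.add(int(m.group(1)))
    # Suspects: logged in several transactions and never revoked.
    suspects = sorted(b for b, n in logged.items() if n > 1 and b not in revoked)
    return logged, revoked, suspects


sample = """\
  FS block 1509499725 logged at journal block 1370 (flags 0x2)
Found expected sequence 86453863, type 2 (commit block) at block 1382
  FS block 1509499725 logged at journal block 1408 (flags 0x2)
Found expected sequence 86453865, type 2 (commit block) at block 1414
  FS block 1509499725 logged at journal block 1427 (flags 0x2)
Found expected sequence 86453866, type 2 (commit block) at block 1438
  FS block 1509499725 logged at journal block 1464 (flags 0x2)
Found expected sequence 86453868, type 2 (commit block) at block 1471
"""

logged, revoked, suspects = summarize(sample)
print(dict(logged), revoked, suspects)
```

      Run against the excerpt from this report, the script flags block 1509499725 as logged four times with no revoke record.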

            People

              shadow Alexey Lyashkov
              shadow Alexey Lyashkov