  Lustre / LU-13416

Data corruption during IOR testing with DoM files and hard failover

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.14.0, Lustre 2.12.5
    • Any Lustre 2.x affected.
    • Severity: 3

    Description

      IAM tables use a zero-copy update for files, similar to what ldiskfs directories do.
      osd-ldiskfs, starting from

      # git describe 67076c3c7e2b11023b943db2f5031d9b9a11329c
      v2_2_50_0-22-g67076c3
      

      does the same. But this is not safe without setting LDISKFS_INODE_JOURNAL_DATA on the inodes
      (thanks to bzzz for the tip).
      Otherwise, metadata blocks can be reused before a journal checkpoint without the corresponding revoke records, which causes valid file data to be replaced with stale journaled data.
      From the blktrace perspective it looks like this:

          mdt_io01_025-32148 [003]  4161.223760: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_025]
          mdt_io01_019-31765 [003]  4163.374449: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
          mdt_io01_000-12006 [014]  4165.256635: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_000]
          mdt_io01_019-31765 [004]  4167.030265: block_bio_queue:      9,65 W 12075997800 + 8 [mdt_io01_019]
      

      but these updates are reported as committed:

      00000001:00080000:9.0:1585615546.198190:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752066 is committed
      00000001:00080000:9.0:1585615546.198196:0:11825:0:(tgt_lastrcvd.c:902:tgt_cb_last_committed()) snx11281-MDT0000: transno 4522600752064 is committed
      

      but after the crash, the journal records are:

      Commit time 1585612866.905807896
        FS block 1509499725 logged at journal block 1370 (flags 0x2)
      Found expected sequence 86453863, type 2 (commit block) at block 1382
      Commit time 1585612871.80796396
      Found expected sequence 86453864, type 2 (commit block) at block 1395
      Commit time 1585612871.147796211
        FS block 1509499725 logged at journal block 1408 (flags 0x2)
      Found expected sequence 86453865, type 2 (commit block) at block 1414
      Commit time 1585612872.386792798
        FS block 1509499725 logged at journal block 1427 (flags 0x2)
      Found expected sequence 86453866, type 2 (commit block) at block 1438
      Commit time 1585612876.763804361
      Found expected sequence 86453867, type 2 (commit block) at block 1451
      Commit time 1585612876.834804666
        FS block 1509499725 logged at journal block 1464 (flags 0x2)
      Found expected sequence 86453868, type 2 (commit block) at block 1471
      

      and there are no revoke records.

      Attachments

        Issue Links

          Activity


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38705/
            Subject: LU-13416 ldiskfs: don't corrupt data on journal replay
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 76b1050a56385cf8ddea47c9fea12eec21478601


            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38705
            Subject: LU-13416 ldiskfs: don't corrupt data on journal replay
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 77c1f307df4a3c068ec45a4948350bc55112e151

            pjones Peter Jones added a comment -

            Landed for 2.14


            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38281/
            Subject: LU-13416 ldiskfs: don't corrupt data on journal replay
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a23aac2219047cb04ed1fa555f31fa39e5c499dc


            Alexey Lyashkov (alexey.lyashkov@hpe.com) uploaded a new patch: https://review.whamcloud.com/38281
            Subject: LU-13416 ldiskfs: don't corrupt data on journal replay
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 94b7608c78c85d1d79fec8196ba0fecc9538ab78


            bzzz Alex Zhuravlev added a comment -

            > Anyway, this change needs to be checked with a performance evaluation, since DNE2 has so many llog operations.

            Sure, though I'd expect that to be barely visible: any first write to a llog file would set the flag, and given that the llog file has just been created, the inode has to be modified anyway to allocate its block(s). So it could be optimized even with something like the following in osd_write():

            if (!ldiskfs_test_inode_flag(inode, LDISKFS_INODE_JOURNAL_DATA)) {
                ldiskfs_set_inode_flag(inode, LDISKFS_INODE_JOURNAL_DATA);
                if (inode->i_size != 0)
                    mark_inode_dirty(inode);
            }
            

            The first condition can be wrapped with unlikely(), I guess.


            The buffer is kept in cache until it is checkpointed, so I don't see why storing the flag in the buffer (with mark_inode_dirty()) wouldn't work.


            shadow Alexey Lyashkov added a comment -

            > This is to be done once; then mark_inode_dirty() can be skipped.

            Anyway, this change needs to be checked with a performance evaluation, since DNE2 has so many llog operations.


            bzzz Alex Zhuravlev added a comment -

            This is to be done once; then mark_inode_dirty() can be skipped.

            shadow Alexey Lyashkov added a comment - edited

            I understand, but dirtying the inode just to write this flag to disk on every write is too expensive.
            On the other hand, the flag will never be flushed to disk when an existing file is only updated in place (an unchanged size does not trigger mark_inode_dirty()), which is the common case on an already deployed system.

            PS: I checked the latest logs. The system had been up for 4100 s, no checkpoint had been processed, and the journal was ~20% full.


            People

              Assignee: shadow Alexey Lyashkov
              Reporter: shadow Alexey Lyashkov
              Votes: 0
              Watchers: 9