[LU-10958] brw rpc reordering causes data corruption when the writethrough cache is disabled - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.14.0
Affects Version/s: None
Labels: None
Severity: 3
Rank (Obsolete): 9223372036854775807

Description

We ran IOR with LNet router failure simulation and encountered data corruption which seems to be reproducible on master.

The following scenario happens:

1. a client thread writes some data to page N of file X
2. page N is transferred to the OSS, and the processing thread sleeps somewhere
3. the original BRW request times out and the client resends page N
4. the resent page N is successfully written to disk, the client receives the reply and clears PG_Writeback
5. a client thread writes different data to the same page N of file X
6. page N with the new data is successfully written to disk, the client receives the reply and clears PG_Writeback
7. the OSS thread from step 2 wakes up and writes stale data to disk [data corruption]
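
The scenario above can be replayed with a small model. This is only a sketch, not Lustre code: the `WriteRpc` class, the `replay` helper and the payload values are hypothetical names made up for illustration. It assumes the disk simply keeps whatever write completes last for a given page, so the final content depends on completion order rather than on the order in which the client issued the writes.

```python
from dataclasses import dataclass


@dataclass
class WriteRpc:
    """Hypothetical stand-in for one bulk write request to page N."""
    label: str
    payload: bytes


def replay(completion_order):
    """Apply writes to one page in the order they reach the disk.

    Returns the page content after the last write completes.
    """
    page = None
    for rpc in completion_order:
        page = rpc.payload
    return page


original = WriteRpc("original BRW, stalled on the OSS", b"old")
resend = WriteRpc("resend after client timeout", b"old")
rewrite = WriteRpc("later write of new data", b"new")

# Without the stall, writes complete in the order the client issued them.
expected = replay([original, resend, rewrite])

# With the stall, the original request completes last (step 7).
observed = replay([resend, rewrite, original])

print("expected:", expected, "observed:", observed)
```

In this model the client has already seen successful replies for the resend and for the new data, yet the page ends up holding the old payload.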

A reproducer will be uploaded shortly.

Attachments

134.tar.bz2
1.62 MB
26/Apr/18 3:58 PM

Activity

Etienne Aujames added a comment - 11/Feb/21 2:38 PM

Hello,

Is a backport to the b2_12 branch planned for this issue?

Peter Jones added a comment - 08/Feb/21 9:59 PM

Landed for 2.14

Gerrit Updater added a comment - 08/Feb/21 9:54 PM

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32281/
Subject: LU-10958 ofd: data corruption due to RPC reordering
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 35679a730bf0b7a8d4ce84cadc3ecc7c289ef491

Cory Spitz added a comment - 04/Feb/21 10:18 PM

Proposed for 2.14.0. With -RC1 already available, I realize that its candidacy might not hold.

Andrew Perepechko added a comment - 09/Aug/18 5:09 PM - edited

No, I don't think so. There's nothing wrong in the md layer except the delay itself which makes it possible for the resent RPC and the RPC after it to complete before the initial delayed RPC. This delay is the analogue of OBD_FAIL_OST_BRW_PAUSE_BULK2 from https://review.whamcloud.com/#/c/32165/6/lustre/tests/recovery-small.sh

A delay can happen anywhere on a non-RTOS system.
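
To illustrate how a single stalled thread is enough, here is a minimal threaded sketch. It is not Lustre code and does not use the real fail-injection machinery: `oss_write`, `disk`, `completion_log` and the `stall` event are hypothetical names, and the `threading.Event` only plays the role that a pause point such as OBD_FAIL_OST_BRW_PAUSE_BULK2 plays in the test. The assumption is that the data has already been transferred when the stall begins.

```python
import threading

disk = {}                      # page index -> bytes, stands in for the OST object
disk_lock = threading.Lock()
completion_log = []            # order in which writes actually hit the "disk"
stall = threading.Event()      # hypothetical pause point for the first request


def oss_write(name, page, data, wait_for=None):
    """Model of a server thread that already holds the bulk data."""
    if wait_for is not None:
        wait_for.wait()        # the thread sleeps here, after the transfer
    with disk_lock:
        disk[page] = data
        completion_log.append(name)


# The original request is received first but stalls before writing.
t_original = threading.Thread(
    target=oss_write, args=("original", 0, b"old", stall))
t_original.start()

# The resend and the later write are served by other threads and finish.
for name, data in (("resend", b"old"), ("rewrite", b"new")):
    t = threading.Thread(target=oss_write, args=(name, 0, data))
    t.start()
    t.join()

# The stalled thread wakes up only now and writes its stale copy.
stall.set()
t_original.join()

print(completion_log, disk[0])
```

Nothing below the server thread misbehaves in this sketch; the stale write is simply submitted last.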

Andreas Dilger added a comment - 09/Aug/18 5:05 PM

In our scenario the real delay happens in the dm/mdraid layer after the bulk transfer succeeded.

Isn't that a problem in the DM/mdraid layer, in that it is reordering writes incorrectly? If the OST thread submits writes to disk as A, A', B, but the disk writes A', B, A because A was blocked in the IO stack, then there isn't much we can do about it.

People

Assignee: Andrew Perepechko

Reporter: Andrew Perepechko

Votes: 0

Watchers: 10

Dates

Created: 26/Apr/18 3:22 PM

Updated: 11/Feb/21 2:38 PM

Resolved: 08/Feb/21 9:59 PM