Details
- Bug
- Resolution: Fixed
- Major
- Lustre 2.15.3
- 3
- 9223372036854775807
Description
I noticed that sanity-flr/200 sometimes hits "checksum error"; here are some findings.
First of all, the checksum error is caused by an incomplete preceding lfs mirror resync command (which doesn't return an error in some cases).
In turn, the EIO that lfs hits is caused by the AS_EIO flag set on the corresponding mapping.
AS_EIO is set because OST_WRITE gets ESTALE when it carries an incorrect layout version (the client's version is smaller than the one on the OST).
So far I've traced all this to a race between two processes:
- lfs doing resync and changing layout generation
- another process (say, multiop) doing regular write
I will cite the logs in a subsequent comment.
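The chain of events above can be illustrated with a toy model. This is only a sketch of the reasoning, not Lustre code: `ToyOst`, `ToyMapping`, `writer_flush`, `resync_step`, and `simulate` are hypothetical names made up for illustration, and the model assumes the writer samples the layout version before resync bumps it.

```python
import errno


class ToyOst:
    """Hypothetical stand-in for an OST object that tracks a layout version."""

    def __init__(self, layout_version):
        self.layout_version = layout_version

    def write(self, client_layout_version):
        # A write stamped with an older layout version than the OST holds is rejected.
        if client_layout_version < self.layout_version:
            return -errno.ESTALE
        return 0


class ToyMapping:
    """Hypothetical stand-in for the file's mapping and its AS_EIO flag."""

    def __init__(self):
        self.as_eio = False


def writer_flush(ost, mapping, cached_layout_version):
    """Regular writer (e.g. multiop): sends a write using its cached layout version."""
    rc = ost.write(cached_layout_version)
    if rc == -errno.ESTALE:
        # The failed write leaves an error flag behind on the mapping.
        mapping.as_eio = True
    return rc


def resync_step(mapping):
    """Resync side: sees EIO if the mapping carries the error flag."""
    return -errno.EIO if mapping.as_eio else 0


def simulate(race):
    ost = ToyOst(layout_version=1)
    mapping = ToyMapping()
    if race:
        cached = ost.layout_version   # writer samples the version first
        ost.layout_version += 1       # resync then changes the layout generation
    else:
        ost.layout_version += 1       # resync changes the layout generation first
        cached = ost.layout_version   # writer samples the already-updated version
    write_rc = writer_flush(ost, mapping, cached)
    return write_rc, resync_step(mapping)


if __name__ == "__main__":
    print("racy order:", simulate(race=True))
    print("safe order:", simulate(race=False))
```

In the racy ordering the writer's stale version yields ESTALE, the flag is set on the mapping, and the resync side then sees EIO; in the other ordering both steps succeed.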
Issue Links
- is duplicated by
  - LU-14966 sanity-flr test_200: FAIL: checksum error for mirror 2: lfs mirror: '/mnt/lustre/f200.sanity-flr' llapi_mirror_resync_many: Input/output error (Resolved)
- is related to
  - LU-18476 interop replay-single test_202: FAIL: layout gen changed: 2 -> 0 (Open)
  - LU-12656 sanity-flr test 200 fails with 'failed writing to *:*' (Resolved)
  - LU-17070 sanity-flr test_200b: vvp_vmpage_error()) LBUG (Resolved)
  - LU-15269 sanity-flr/200 to generate new tmp files each time (Resolved)
- is related to
  - LU-18416 Data corruption/miscompare observed during 48hr FOFB (Resolved)