Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Version: Lustre 2.4.2
- Environment: lustre-2.4.2-14chaos, ZFS OSD
- Severity: 3
- 15831
Description
Users have reported several recent cases of file corruption. The corrupt files are larger than expected, and contain all of the original written data plus additional data at the end of the file. The additional data appears to be valid structured user data of unknown origin. We have not found anything unusual in the console logs from clients or servers at the time the files were written.
In one case, the user made a copy of a Lustre directory tree using HPSS archival storage tools, then compared the copy to the original. He found one corrupt file in the copy. The original file size was 2,000,000 bytes, but the copy size was 2,097,152 (2 MiB). The archive tool reported 2,000,000 bytes written. The extra 97,152 bytes appear to be valid structured user data of unknown origin.
The corrupt file sizes are not always aligned on MiB boundaries, however. Of the cases reported so far, these are the sizes involved:
Example # | Expected Size (bytes) | Actual Size (bytes)
---|---|---
1 | 2000000 | 2097152
2 | 1008829 | 2097152
3 | 36473 | 1053224
4 | 1008829 | 1441432
In Example 1, the "bad data" begins immediately at the end of the expected data, with no sparse area between. Seen below with od -A d -a, the expected data is random bytes, whereas from offset 2000000 onward the unexpected data is structured.
```
1999840 ! esc nul del [ dc3 + b h \ can ; f h dc4 9
1999856 D + U 1 j q g ; 7 J r { " j ) D
1999872 enq * C ` = o C & K \ a 1 D v k ht
1999888 ! A ; ff 2 " G i m 9 e dle $ si T )
1999904 9 etb nl w bel N rs R * nul eot o v p y can
1999920 1 4 $ c W l M D & 3 U J B ) t {
1999936 A del s I M dc1 esc w dc1 sp g bs dle ` sp A
1999952 nak D % l 1 r 1 W % ack ! h 0 syn c r
1999968 nak W ; b h W Z z w B stx bs " # J 7
1999984 h o $ em b V p bel ] dc2 o cr ) S del >
2000000 sp 1 1 1 9 nl B o x sp f r a c A A
2000016 sp 6 4 0 sp 6 7 9 sp 4 0 sp 7 9 sp 1
2000032 1 0 0 sp 1 1 1 9 nl B o x sp f r a
2000048 c A A sp 6 8 0 sp 7 1 9 sp 4 0 sp 7
2000064 9 sp 1 1 0 0 sp 1 1 1 9 nl B o x sp
2000080 f r a c A A sp 7 2 0 sp 7 5 9 sp 4
2000096 0 sp 7 9 sp 1 1 0 0 sp 1 1 1 9 nl B
2000112 o x sp f r a c A A sp 7 6 0 sp 7 9
2000128 9 sp 4 0 sp 7 9 sp 1 1 0 0 sp 1 1 1
2000144 9 nl B o x sp f r a c A A sp 8 0 0
```
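For reference, a dump of just that region can be produced with plain od; the file name here is hypothetical, and the offsets simply bracket the transition shown above:

```sh
# Hypothetical path; -A d prints decimal offsets, -a prints named characters,
# -j skips to just before the end of the expected data, -N limits the read.
od -A d -a -j 1999840 -N 320 /path/to/corrupt_file
```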
In examples 2-4, there is a sparse region between the expected and unexpected data, ending on a 1 MiB boundary (offset 1048576). Here is another od snippet illustrating the expected, sparse, and unexpected regions for example 2:
```
1008768 g nl . / t e s t d i r / c h m o
1008784 d s t g nl . / t e s t d i r / l
1008800 s t o r a g e nl . / t e s t f i
1008816 l e 3 nl . / z z j u n k nl nul nul nul
1008832 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
*
1048576 sp 5 9 9 nl B o x sp f r a c A A sp
1048592 2 0 0 sp 2 3 9 sp 2 0 0 sp 2 3 9 sp
1048608 5 8 0 sp 5 9 9 nl B o x sp f r a c
1048624 A A sp 2 4 0 sp 2 7 9 sp 2 0 0 sp 2
1048640 3 9 sp 5 8 0 sp 5 9 9 nl B o x sp f
1048656 r a c A A sp 2 8 0 sp 3 1 9 sp 2 0
1048672 0 sp 2 3 9 sp 5 8 0 sp 5 9 9 nl B o
1048688 x sp f r a c A A sp 3 2 0 sp 3 5 9
1048704 sp 2 0 0 sp 2 3 9 sp 5 8 0 sp 5 9 9
1048720 nl B o x sp f r a c A A sp 3 6 0 sp
1048736 3 9 9 sp 2 0 0 sp 2 3 9 sp 5 8 0 sp
1048752 5 9 9 nl B o x sp f r a c A A sp 4
```
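The `*` line in the od output marks repeated identical lines, i.e. the run of nul bytes between the end of the expected data and the 1 MiB boundary. A quick way to confirm that region is actually a hole (paths hypothetical, GNU stat/du assumed) is to compare apparent size with allocated blocks:

```sh
# %s = apparent size in bytes, %b = number of 512-byte blocks allocated;
# a sparse file allocates noticeably fewer blocks than its size implies.
stat -c '%s %b' /path/to/corrupt_file

# Same idea with du: apparent size vs. actual disk usage.
du --apparent-size -k /path/to/corrupt_file
du -k /path/to/corrupt_file
```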
In all 4 examples the corrupt data resides within the second OST object. In examples 2-4 the file should have only one OST object. This feels like some bug is causing the second OST object to be doubly linked, so that our users have partially overwritten another user's data.
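As a quick sanity check (not part of the original report), assuming the default 1 MiB stripe size so that byte range [1048576, 2097152) of a file maps to its second OST object, the unexpected data in every reported case falls inside that second object:

```sh
# Rough arithmetic only, using the sizes from the table above and an
# assumed 1 MiB stripe size, so bytes [1048576, 2097152) land in the
# second OST object.
STRIPE=$((1024 * 1024))
printf '%-8s %-10s %-10s %s\n' example expected actual "unexpected region"
while read -r ex expected actual; do
    # The unexpected data starts where the expected data ends, or at the
    # 1 MiB boundary after the hole, whichever is larger.
    start=$expected
    [ "$start" -lt "$STRIPE" ] && start=$STRIPE
    printf '%-8s %-10s %-10s [%d, %d)\n' "$ex" "$expected" "$actual" "$start" "$actual"
done <<'EOF'
1 2000000 2097152
2 1008829 2097152
3 36473 1053224
4 1008829 1441432
EOF
```

On the real files, `lfs getstripe -v` run against the original and corrupt copies would show which OST objects back each file, which could help confirm whether an object ended up shared between two files.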
Issue Links
- is related to: LU-15009 precreate should cleanup orphans upon error (Resolved)
Well, it seems I have a brute-force reproducer that I believe hits the same problem you saw. It also seems to confirm that my reasoning about the cause of the bug is correct.
To cause the bug I had this patch applied (this is against your chaos tree, but it will probably work with master too, except a different ofd file will need to be patched):
After that, run e.g.
It'll fail with some nonsensical error message about "first df failed" that I don't want to dig into. The most important part is that when you do ls -l /mnt/lustre/dir1234-2, you'll see that every other file has a non-zero size! (Why every other? I have no idea at this point, as both OSTs should have entered this condition.)
When catting the files we also see stale garbage there.
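For anyone reproducing this, the check described above amounts to roughly the following (the directory name comes from the comment; everything else is illustrative):

```sh
# With the bug triggered, every other file unexpectedly shows a non-zero size.
ls -l /mnt/lustre/dir1234-2

# Dump the start of any non-empty file; the contents are stale garbage.
for f in /mnt/lustre/dir1234-2/*; do
    [ -s "$f" ] && od -A d -a "$f" | head
done
```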