[LU-5239] Recovery of small files with corrupt objects Created: 20/Jun/14 Updated: 18/Sep/14 Resolved: 18/Sep/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Blake Caldwell | Assignee: | Zhenyu Xu |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
RHEL6.4/distro IB kernel 2.6.32-358.18.1.el6 |
||
| Severity: | 3 |
| Rank (Obsolete): | 14611 |
| Description |
|
We had a backend storage issue on 5/30 that corrupted a number of blocks on the filesystem across different OSTs. Since then we were able to recover all filesystem structures with e2fsck and identify what we thought were all the affected files. Just recently, we discovered a new scenario where inodes were corrupted and subsequently cleared by e2fsck. We have identified 665 such files, and an ls or stat on them returns "Cannot allocate memory". Syslog has the error: Jun 20 20:53:15 f1-oss1d5 kernel: [853846.084587] LustreError: 14391:0:(ldlm_resource.c:1165:ldlm_resource_get()) f1-OST00bc: lvbo_init failed for resource 0xd4805:0x0: rc = -2 This is expected because object 0xd4805 on f1-OST00bc is invalid (its inode on f1-OST00bc was cleared by e2fsck). We would like to attempt recovery of small files under 3MB (stripe count 4), where the layout might position the missing object after EOF. We thought a "dd if=bad.file of=good.file" would return success if EOF was reached before the missing object. However, this method fails with "Cannot allocate memory" even for small files, with dd reporting that only some number of kB were copied. What is causing dd to fail to read even files smaller than 1MB whose bad object is the 3rd object? The resulting good.out is incomplete when opened. Is there an alternative method to successfully read to EOF for small files? This is not causing downtime, but we would like to recover these files as quickly as reasonably possible. |
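The per-file recovery attempt described above can be sketched as a small shell loop. This is a hedged sketch, not from the ticket: the list file and output directory names are assumptions, and a local demo file stands in for the corrupted Lustre paths. A bs=1 copy preserves every byte readable before the first missing object, even when the copy as a whole fails.

```shell
#!/bin/sh
# Sketch of a recovery pass over the affected files. Assumptions (not
# from the ticket): paths are listed one per line in bad_files.txt, and
# partial copies land under recovered/. A local demo file stands in for
# the real corrupted Lustre files so the sketch is self-contained.
workdir=$(mktemp -d)
printf 'readable bytes' > "$workdir/sample.txt"
echo "$workdir/sample.txt" > "$workdir/bad_files.txt"

mkdir -p "$workdir/recovered"
while IFS= read -r f; do
    out="$workdir/recovered/$(basename "$f")"
    # bs=1 copies byte-by-byte: everything before the missing object
    # survives even if dd ultimately exits non-zero.
    if dd if="$f" of="$out" bs=1 2>/dev/null; then
        echo "recovered to EOF: $f"
    else
        echo "partial copy only: $f"
    fi
done < "$workdir/bad_files.txt"
```

On the real filesystem, a non-zero dd exit would flag the files whose missing object lies before EOF, so they can be triaged separately.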
| Comments |
| Comment by Peter Jones [ 21/Jun/14 ] |
|
Bobijam Could you please advise with this one? Thanks Peter |
| Comment by Zhenyu Xu [ 23/Jun/14 ] |
|
How can you be sure that the file's EOF is before 3MB? If you are sure of that, would "dd if=file.F90 of=/tmp/good.out bs=1M count=2" work for it? |
| Comment by Blake Caldwell [ 23/Jun/14 ] |
|
That causes the same error, "Cannot allocate memory". By setting bs=1, the whole file (17333 bytes) can be recovered to EOF. With a block size of 4, I could reproduce the error without reading the whole file. Is there an optimization where the client tries to read the next object even if EOF is reached on the first object? gaea9:/tmp # dd if=file.F90 of=/tmp/good.out bs=1 gaea9:/tmp # dd if=file.F90 of=/tmp/good.out bs=4 count=4333 |
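The block-size effect can be reproduced on an ordinary 17333-byte file, no Lustre needed. This sketch only shows the byte accounting; on a healthy file the final short read succeeds rather than failing:

```shell
# Create a regular file the same size as file.F90 in the ticket
# (17333 bytes) to show how much data each dd invocation covers.
tmp=$(mktemp -d)
head -c 17333 /dev/zero > "$tmp/file.bin"

# bs=4 count=4333 copies exactly 4333 * 4 = 17332 bytes, one byte short
# of EOF; on the corrupted Lustre file the next read hit the missing object.
dd if="$tmp/file.bin" of="$tmp/out4" bs=4 count=4333 2>/dev/null
wc -c < "$tmp/out4"        # 17332

# bs=1 copies byte-by-byte, so the last read lands exactly on EOF.
dd if="$tmp/file.bin" of="$tmp/out1" bs=1 2>/dev/null
wc -c < "$tmp/out1"        # 17333
```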
| Comment by Zhenyu Xu [ 23/Jun/14 ] |
|
From what you described, file.F90 only has 17333 bytes available to be recovered. When the block size is set to 4, dd reads 4 bytes at a time, so it can only complete 4333 reads, covering 4333 * 4 = 17332 bytes; the last read reaches the missing object on OST00bc and fails. This also explains the dd command without the bs parameter, whose default value is 512: in that case it reads 512 * 33 = 16896 bytes and fails to read another 512 bytes, which again reaches the missing object on OST00bc. |
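The byte accounting above can be checked with shell arithmetic for both block sizes discussed (4 and dd's default of 512):

```shell
# Complete reads before the short/failing one, for a 17333-byte file.
size=17333
for bs in 4 512; do
    full=$((size / bs))
    echo "bs=$bs: $full full reads cover $((full * bs)) bytes"
done
# bs=4:   4333 full reads cover 17332 bytes
# bs=512: 33 full reads cover 16896 bytes
```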
| Comment by Blake Caldwell [ 03/Jul/14 ] |
|
While we were able to complete recovery of the files with bs=1, we weren't completely clear why reading 17336 bytes (4 * 4334) would return an error when reading 17332 bytes is fine. Lustre would have to know that the first object is 17332 bytes long and that it needs to read 4 more bytes from the 2nd object. Why would it prefetch the 2nd object in the 17336-byte case and not in the 17332-byte case? |
| Comment by Zhenyu Xu [ 04/Jul/14 ] |
|
I suspect it involves the dd implementation. I don't know the details, but I guess dd does not check whether EOF falls within the last 4-byte read request; it simply asks for 4 bytes, Lustre reaches the unavailable region, and it returns ENOENT for the request. |
| Comment by Blake Caldwell [ 18/Sep/14 ] |
|
This can be resolved. There is no practical reason to investigate this further; it could well be in the dd implementation, and using conv=sync could have helped with the investigation. We were able to recover about half of the files using this technique (with bs=1) because the cleared block was after EOF. |
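For reference, the conv=sync idea mentioned above is usually paired with conv=noerror: noerror tells dd to keep going after a read error, and sync pads each short input block with NULs so output offsets stay aligned. Whether this helps on these Lustre files is uncertain, since the failure here surfaces as an error at the missing object rather than an ordinary media read error; the sketch below runs on a regular file only to show the mechanics:

```shell
# Sketch of the conv=noerror,sync approach, demonstrated on a regular
# file (the real error path on the corrupted Lustre objects may still
# abort the copy differently).
tmp=$(mktemp -d)
printf 'some recoverable data' > "$tmp/bad.file"

# conv=noerror: continue past read errors instead of stopping.
# conv=sync:    pad each short input block with NULs to a full bs.
dd if="$tmp/bad.file" of="$tmp/good.file" bs=4k conv=noerror,sync 2>/dev/null

# The 21-byte input is padded to one full 4096-byte block.
wc -c < "$tmp/good.file"   # 4096
```

Note the trade-off: conv=sync changes the output size, so the recovered copy needs to be truncated back to the original length if that length is known.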