[LU-4523] Need explanation for FS corruption - ldiskfs_mb_free_metadata: Double free of blocks Created: 21/Jan/14 Updated: 11/Feb/14 Resolved: 11/Feb/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oz Rentas | Assignee: | Hongchao Zhang |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 12367 |
| Description |
|
The customer, Yale, encountered file system corruption on one of their OST devices, dm-20, which is "scratch-OST0028". The customer ran e2fsck on that device, which fixed the corruption, but they would now like an RCA to prevent it from happening in the future.
The corruption was first reported on Jan 11, but there were no irregular events on the storage side that would have caused it, which could indicate that the corruption occurred earlier and was only reported on the 11th.
Jan 11 11:54:33 oss7 kernel: Lustre: 2916:0:(o2iblnd_cb.c:2249:kiblnd_passive_connect()) Conn stale 10.191.133.6@o2ib [old ver: 12, new ver: 12]
FSCK output: Running additional passes to resolve blocks claimed by more than one inode...
What other debugging or data can be pulled to explain the problem? Thanks, |
| Comments |
| Comment by Peter Jones [ 21/Jan/14 ] |
|
Oz, we'll certainly need details about which Lustre version is in use and some logs: dmesg from this node (or syslog) as a start, covering, say, the previous 24 hours. Thanks, Peter |
| Comment by Oz Rentas [ 21/Jan/14 ] |
|
Ah, yes, of course. Sorry about that. I've attached the missing files. Lustre: 1.8.9 |
| Comment by Peter Jones [ 22/Jan/14 ] |
|
Hongchao has been looking at this information |
| Comment by Peter Jones [ 24/Jan/14 ] |
|
Hongchao, as per our recent discussion on this topic, I understand that you believe this issue to be a duplicate of Do I have this right? Is there anything to add or correct? Thanks, Peter |
| Comment by Hongchao Zhang [ 26/Jan/14 ] |
|
Hi Peter, Yes, it could be a problem related to ext4 (patched a little by Lustre and renamed to ldiskfs). There are some similar tickets ( By the way, a similar issue has been reported to Red Hat: https://access.redhat.com/site/solutions/157393 Thanks |
| Comment by Oz Rentas [ 11/Feb/14 ] |
|
Thank you for this very useful information. It has been passed on to the customer. |
| Comment by Peter Jones [ 11/Feb/14 ] |
|
ok thanks Oz |