Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Major
Description
There are two separate changes that could be made in the future to improve this situation, so that the MDT does not have to be taken offline while waiting for a full OI rebuild to finish:
- Handle this IAM error more gracefully by resetting the IAM block that has the corrupt magic, and possibly scanning the rest of the IAM file to recover any unlinked IAM blocks (though this may be no better than rebuilding the whole IAM file, combined with the next option), then triggering an internal OI Scrub to re-insert any missing FIDs into the existing OI file. This should be done under LU-12265.
- Have the "resetoi" code save a backup of the OI files (e.g. oi.16.N.bak) and use it for FID->inode lookups that are missing from the new OI files while those are being rebuilt. This would allow most FID lookups to complete using the old OI during the rebuild (though not all, if the old OI has errors). The OI backups would be deleted once the OI Scrub has finished.
Once these functions are implemented separately, it should be possible to combine them and add an osd-ldiskfs.*.resetoi=N parameter that triggers "rename oi.16.N to oi.16.N.bak and rebuild" transparently to the running system.
Issue Links
- is related to LU-12265 LustreError: 141027:0:(osd_iam_lfix.c:188:iam_lfix_init()) Bad magic in node 1861726 #34: 0xcc != 0x1976 or bad cnt: 0 170: rc = -5 (Reopened)
- is related to LU-12268 LDISKFS-fs error: ldiskfs_find_dest_de:2066: bad entry in directory: rec_len is smaller than minimal - offset=0( 0), inode=201, rec_len=0, name_len=0 (Resolved)