Lustre / LU-15016

OI Scrub backup and rebuild

Details

    • Improvement
    • Resolution: Unresolved
    • Major

    Description

      There are two separate changes that could be made to improve this situation in the future, instead of taking the MDT offline and waiting for a full OI rebuild to finish:

      • Handle this IAM error more gracefully by resetting the IAM block with the corrupt magic, and possibly scanning the rest of the IAM file to recover any unlinked IAM blocks (though this may not be better than just rebuilding the whole IAM file, together with the next option). Then trigger an internal OI Scrub to re-insert any missing FIDs into the existing OI file. That should be done under LU-12265.
      • Have the "resetoi" code save a backup of the OI files (e.g. oi.16.N.bak) and use it for FID->inode lookups that are missing from the new OI file while the new OI files are being rebuilt. That would allow most FID lookups to be served from the old OI during the rebuild (though not all, if the old OI had some error). The OI backups would be deleted after the OI Scrub is finished.
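
The lookup order described in the second option can be sketched as follows. This is a toy Python model, not Lustre code: the class and method names are hypothetical, the OI files are modelled as plain dictionaries, and the FID strings are made-up examples.

```python
from typing import Dict, Optional


class OIFallbackLookup:
    """Toy model of FID->inode lookup while an OI file is being rebuilt.

    All names here are hypothetical; the real logic would live in osd-ldiskfs.
    """

    def __init__(self, backup_oi: Dict[str, int]) -> None:
        # The new OI starts out empty after the reset.
        self.new_oi: Dict[str, int] = {}
        # The old OI contents, kept as the backup (oi.16.N.bak in the text).
        self.backup_oi: Optional[Dict[str, int]] = dict(backup_oi)

    def lookup(self, fid: str) -> Optional[int]:
        # 1. Prefer the new OI: entries there were re-inserted by the scrub.
        ino = self.new_oi.get(fid)
        if ino is not None:
            return ino
        # 2. Fall back to the backup while the rebuild is still running.
        if self.backup_oi is not None:
            return self.backup_oi.get(fid)
        # 3. No mapping available from either file.
        return None

    def scrub_insert(self, fid: str, ino: int) -> None:
        # Called as the OI Scrub walks inodes and repopulates the new OI.
        self.new_oi[fid] = ino

    def scrub_finished(self) -> None:
        # The backup is deleted once the OI Scrub has completed.
        self.backup_oi = None
```

Since the old OI may itself contain errors, a real implementation would presumably need to validate a mapping returned from the backup before trusting it; that check is left out of this sketch.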

      Once these functions are implemented separately, it should be possible to combine them and add an osd-ldiskfs.*.resetoi=N parameter that triggers "rename oi.16.N to oi.16.N.bak and rebuild" transparently to the running system.
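
As a rough illustration of the "rename and rebuild" lifecycle, the sketch below uses ordinary files in a scratch directory. It is only a userspace analogy under stated assumptions: the function names `reset_oi` and `finish_scrub` are hypothetical, N is assumed to select a single OI file, and the real operation would happen inside osd-ldiskfs on the backing filesystem rather than through Python.

```python
import tempfile
from pathlib import Path


def reset_oi(oi_dir: Path, n: int) -> Path:
    """Rename oi.16.N to oi.16.N.bak and create an empty replacement.

    Hypothetical helper; returns the path of the backup file.
    """
    current = oi_dir / f"oi.16.{n}"
    backup = oi_dir / f"oi.16.{n}.bak"
    if backup.exists():
        # A leftover backup suggests an earlier rebuild has not finished.
        raise FileExistsError(f"{backup.name} already exists")
    current.rename(backup)  # keep the old OI for fallback lookups
    current.touch()         # new, empty OI file to be filled by the scrub
    return backup


def finish_scrub(backup: Path) -> None:
    """Delete the backup once the OI Scrub has finished."""
    backup.unlink()
```

A possible usage, again purely illustrative: call `reset_oi(path, 3)` to start the rebuild of `oi.16.3`, serve lookups from `oi.16.3.bak` while the scrub runs, then call `finish_scrub()` on the returned path.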

            People

              hongchao.zhang Hongchao Zhang
              adilger Andreas Dilger
