Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • None

    Description

      There are two separate changes that could be made in the future to improve this situation, instead of taking the MDT offline and waiting for a full OI rebuild to finish:

      • Handle this IAM error more gracefully by resetting the IAM block with the corrupt magic, and possibly scanning the rest of the IAM file to recover any unlinked IAM blocks (though this may be no better than rebuilding the whole IAM file, combined with the next option). Then trigger an internal OI Scrub to re-insert any missing FIDs into the existing OI file. This should be done under LU-12265.
      • Have the "resetoi" code save a backup of the OI files (e.g. oi.16.N.bak) and use it for FID->inode lookups that miss in the new OI file while the new OI files are being rebuilt. That would allow most FID lookups to be satisfied from the old OI during the rebuild (though not all, if the old OI file itself had errors). The OI backups would be deleted once the OI Scrub has finished.

      Once these functions are implemented separately, it should be possible to combine them and add an osd-ldiskfs.*.resetoi=N parameter that triggers "rename oi.16.N to oi.16.N.bak and rebuild" transparently to the running system.


          Activity

            [LU-15016] OI Scrub backup and rebuild
            Andreas Dilger (adilger) added a comment:

            "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45071
            Subject: LU-15016 osd: fix corrupted OI file online
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c0b2d11c325e042f724447ee45bc1ca1d2ff5379

            adilger Andreas Dilger added a comment - - edited "Hongchao Zhang <hongchao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45071 Subject: LU-15016 osd: fix corrupted OI file online Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: c0b2d11c325e042f724447ee45bc1ca1d2ff5379

            People

              Hongchao Zhang (hongchao.zhang)
              Andreas Dilger (adilger)
              Votes: 0
              Watchers: 3