
debug and cleanup of corrupted PFID, unmatched MDT-object and OST-object pairs

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • None
    • Lustre 2.7.0, Lustre 2.9.0
    • None
    • servers: lustre-2.7.3-1nasS_mofed33v3g_2.6.32_642.15.1.el6.20170609.x86_64.lustre273.x86_64
      clients: lustre-client-2.9.0-2.3nasC_mofed34v1_4.4.74_92.32.1.20170808_nasa.x86_64
    • 3
    • 9223372036854775807

    Description

      This ticket was created to handle NASA-specific debugging of the corrupted PFIDs discussed in LU-10248, to ensure that a fix for lfsck handling of repaired_unmatched_pair is ported, and to cover any related questions about running lfsck properly and cleaning up the filesystem.

      servers: lustre-2.7.3-1nasS_mofed33v3g_2.6.32_642.15.1.el6.20170609.x86_64.lustre273.x86_64 (basically the old FE branch, plus several cherry-picked patches)

      clients: lustre-client-2.9.0-2.3nasC_mofed34v1_4.4.74_92.32.1.20170808_nasa.x86_64

      We hope to upgrade both to a 2.10.2-based build in the near future.

          Activity

            [LU-10349] debug and cleanup of corrupted PFID, unmatched MDT-object and OST-object pairs
            pjones Peter Jones added a comment -

            AFAICT this is now resolved with the LU-10422 fix landed to b2_10

            pjones Peter Jones added a comment -

            Jay

            That is being tracked under LU-10422 and it will land as soon as the reviews have completed

            Peter


            jaylan Jay Lan (Inactive) added a comment -

            The b2_10 port of the above patch #30612 is at
            https://review.whamcloud.com/#/c/30628/1
            Could you land this?

            yong.fan nasf (Inactive) added a comment -

            ndauchy,

            The patch https://review.whamcloud.com/#/c/30612/, which fixes the unexpected inconsistent owner issue, has already landed on master. It has also been ported to the b2_7_fe branch via the patch https://review.whamcloud.com/30613. You can use that port to resolve the problem on your system. Please let me know what else you need.

            yong.fan nasf (Inactive) added a comment -

            The patch to repair the unexpected inconsistent owner on b2_7_fe:
            https://review.whamcloud.com/30613

            ndauchy Nathan Dauchy (Inactive) added a comment -

            "So I would suggest to keep the system unchanged since it is without influence now."

            OK, we will proceed with removing the additional OSTs from the file system, wait for the patched lfsck to be installed on the 2.7.3 servers, and perform the cleanup later. Thanks!

            yong.fan nasf (Inactive) added a comment -

            "Regardless, it will be a while before we can complete the server rebuild and take a downtime to apply it. In the meantime I need to move forward with removing additional OSTs from the file system to free up the hardware for spares. Do you recommend we run the current lfsck in non-dry-run mode, or just go ahead and remove the OST now and wait for the lfsck updates?"

            The layout LFSCK reports two kinds of inconsistency. One is inconsistent owner information, which is caused by a known layout LFSCK issue and can be ignored. The other is unmatched MDT-object and OST-object pairs. Such an inconsistency will NOT affect normal system access unless I/O verification (disabled by default) is explicitly enabled. So I would suggest keeping the system unchanged, since the inconsistency has no effect now.

            "Review #16135 is in 'Need Code-Review' state for more than 1 year. Is it OK to cherry-pick as it is, Fan?"

            Yes, I think so. That patch has already landed on b2_8_fe and master. It is probably not on b2_7_fe because b2_7 was already somewhat old at that time and the issue is not very serious.
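
A minimal sketch of how the two kinds of inconsistency described above could be tallied from layout LFSCK status text. It assumes the status is a list of `name: value` lines. The counter name repaired_unmatched_pair comes from this ticket's description; repaired_inconsistent_owner is an assumed name for the owner counter, and the sample text is hypothetical, not output from any real system.

```python
import re

# repaired_unmatched_pair is named in this ticket's description;
# repaired_inconsistent_owner is an ASSUMED name for the owner counter.
COUNTERS = ("repaired_inconsistent_owner", "repaired_unmatched_pair")


def parse_counters(status_text, names=COUNTERS):
    """Return {name: int} for each requested counter found in 'name: value' lines."""
    found = {}
    for line in status_text.splitlines():
        # Only lines of the form "<word>: <integer>" are considered.
        match = re.match(r"\s*(\w+):\s*(\d+)\s*$", line)
        if match and match.group(1) in names:
            found[match.group(1)] = int(match.group(2))
    return found


# HYPOTHETICAL sample status text, made up for illustration only.
SAMPLE = """\
name: lfsck_layout
status: completed
repaired_inconsistent_owner: 12
repaired_unmatched_pair: 3
"""

if __name__ == "__main__":
    for name, value in parse_counters(SAMPLE).items():
        print(f"{name}: {value}")
```

Keeping the two counters separate mirrors the distinction made in the comment: the owner count may reflect the known LFSCK issue, while the unmatched-pair count is the one relevant to later cleanup.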

            jaylan Jay Lan (Inactive) added a comment -

            We do not have either.

            Review #16135 has been in the 'Need Code-Review' state for more than 1 year. Is it OK to cherry-pick it as it is, Fan?

            People

              Assignee: yong.fan nasf (Inactive)
              Reporter: ndauchy Nathan Dauchy (Inactive)