
LFSCK Phase 1.5 for FID-in-dirent and linkEA consistency

Details

    • New Feature
    • Resolution: Fixed
    • Critical
    • Lustre 2.4.0
    • Lustre 2.4.0
    • 55
    • 6640

    Description

      In Lustre 2.x, when a file is created, its FID (File IDentifier) is stored as part of the name entry in the parent directory; this is called FID-in-dirent. With FID-in-dirent, readdir on the MDT can fetch the FID directly from the directory page instead of reading it from the object's LMA (Lustre Metadata Attributes) extended attribute stored on the inode. As a result, traversing a directory (for example with "ls") is much faster with FID-in-dirent than when the FID must be read from the LMA. Also at file creation time, the FID of the parent directory and the name of the file are stored in the linkEA extended attribute on the inode. With the linkEA, any given FID can be resolved back to a full path from the root directory to the target file. This is useful for ChangeLog-based applications such as "lustre_rsync", for generating error messages, and for POSIX-style pathname permission checks. Creating a hard link to a regular file stores the same FID-in-dirent and linkEA information for the new name.
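
      The relationship between the two structures can be illustrated with a toy in-memory model. This is only a sketch: the Inode class, the helper names and the FID strings are placeholders invented for illustration, and nothing here reflects the real on-disk layout or any Lustre API.

```python
from dataclasses import dataclass, field

# Toy model only: class, helpers and FID strings are hypothetical placeholders.
ROOT_FID = "fid-root"
DIR_FID = "fid-dir"
FILE_FID = "fid-file"

@dataclass
class Inode:
    lma_fid: str                                   # FID kept in the LMA xattr
    link_ea: list = field(default_factory=list)    # (parent FID, name) pairs
    dirents: dict = field(default_factory=dict)    # name -> child FID (directories only)

def create(objects, parent_fid, name, fid):
    """Create an object, recording both the FID-in-dirent and the linkEA."""
    objects[fid] = Inode(lma_fid=fid, link_ea=[(parent_fid, name)])
    objects[parent_fid].dirents[name] = fid

def readdir(objects, dir_fid):
    """Return (name, FID) pairs from the directory alone, without reading child inodes."""
    return sorted(objects[dir_fid].dirents.items())

def fid2path(objects, fid):
    """Follow linkEA entries upward from the object until the root is reached."""
    parts = []
    while fid != ROOT_FID:
        parent_fid, name = objects[fid].link_ea[0]
        parts.append(name)
        fid = parent_fid
    return "/" + "/".join(reversed(parts))

objects = {ROOT_FID: Inode(lma_fid=ROOT_FID)}
create(objects, ROOT_FID, "dir", DIR_FID)
create(objects, DIR_FID, "file", FILE_FID)
print(fid2path(objects, FILE_FID))  # /dir/file
```

      In this model readdir never touches the child objects, which is the source of the speedup described above, and fid2path needs only the linkEA of each object along the way.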

      Over the lifetime of an active filesystem, some FID-in-dirent and linkEA may become inconsistent or invalid as the result of on-disk corruption, after restoring from MDT file-level backup, or if the MDT filesystem was originally formatted under Lustre 1.8. Currently, if the MDT is upgraded from Lustre 1.8 or after the MDT is restored from a file-level backup, the MDT will be missing all the FID-in-dirent data, which will reduce the performance of readdir(3) on the MDT. Additionally, for an MDT upgraded from Lustre 1.8 the linkEA is also unavailable and the 2.x "lctl fid2path" functionality will not be available for those files.

      In LFSCK Phase 1.5 we will implement verification and rebuilding of FID-in-dirent and linkEA for the single-MDT case. These additional operations are performed while the MDT is iterating over the object table for OI Scrub. LFSCK checks whether the FID-in-dirent in each name entry is consistent with the FID in the object LMA, repairs it if the two do not match, and rebuilds it if the FID-in-dirent is missing. It also verifies that the name entry is correctly referenced by the object linkEA and that the object linkEA points back to a valid name entry. Unmatched or redundant linkEA entries will be removed, and missing linkEA entries will be added. For Lustre 1.8 inodes with IGIF FIDs after an upgrade, it will store the IGIF FID in the LMA xattr on the inode, and then in the FID-in-dirent and linkEA as it would for any 2.x FID.
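
      The per-entry decisions described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the LFSCK implementation: the function names and repair labels are hypothetical, FIDs are plain strings, and the FID in the LMA is assumed to be the authoritative value.

```python
def check_name_entry(parent_fid, name, dirent_fid, lma_fid, link_ea):
    """Check one name entry against its object (hypothetical helper).

    dirent_fid: FID stored in the name entry, or None if it is missing
    lma_fid:    FID stored in the object's LMA (assumed authoritative here)
    link_ea:    list of (parent FID, name) pairs stored on the object
    Returns the possibly repaired (dirent_fid, link_ea) and a list of repairs.
    """
    repairs = []
    if dirent_fid is None:
        dirent_fid = lma_fid
        repairs.append("rebuild FID-in-dirent")
    elif dirent_fid != lma_fid:
        dirent_fid = lma_fid
        repairs.append("repair FID-in-dirent")
    if (parent_fid, name) not in link_ea:
        link_ea = link_ea + [(parent_fid, name)]
        repairs.append("add linkEA entry")
    return dirent_fid, link_ea, repairs

def prune_link_ea(link_ea, name_entries):
    """Drop linkEA entries that match no real name entry, and drop duplicates.

    name_entries: set of (parent FID, name) pairs that actually reference the object
    """
    kept = []
    for entry in link_ea:
        if entry in name_entries and entry not in kept:
            kept.append(entry)
    return kept

# A name entry restored from a file-level backup: the FID-in-dirent is missing.
print(check_name_entry("fid-parent", "a", None, "fid-obj", [("fid-parent", "a")]))
```

      The first helper covers the "repair or rebuild" and "add missing linkEA" cases; the second covers removal of unmatched or redundant linkEA entries once the set of valid name entries for the object is known.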

      The LFSCK Phase III project was to handle the FID-in-dirent and linkEA verification. This included both local-MDT references and cross-MDT cases where the directory entry and the object are located on different MDTs. The LFSCK Phase 1.5 implementation of FID-in-dirent and linkEA consistency check/repair contains a significant part of the LFSCK Phase III work.

      Currently, the DNE project is underway. To make the LFSCK project less dependent on the DNE project, we prefer to split LFSCK Phase III into two parts: one for DNE cases and one for non-DNE cases. The non-DNE part will be handled in LFSCK Phase 1.5, and the DNE part will be handled after the DNE project is completed. Completing the LFSCK Phase 1.5 work earlier also benefits sites upgrading from Lustre 1.8.

          Activity

            [LU-1866] LFSCK Phase 1.5 for FID-in-dirent and linkEA consistency

            yong.fan nasf (Inactive) added a comment - All the patches for LFSCK 1.5 have been landed to Lustre-2.5

            yong.fan nasf (Inactive) added a comment - Current master plus the patch http://review.whamcloud.com/#change,6078 supports upgrading from Lustre-1.8.x to Lustre-2.4.

            If "lfsck -t namespace" is run on a Lustre-1.8.x device without "dirdata" enabled, the directory structure is left unchanged; the other parts, such as IGIF-in-LMA, IGIF-in-OI, and linkEA for IGIF objects, are generated just as in the "dirdata" case.

            paf Patrick Farrell (Inactive) added a comment - What happens if LFSCK -t namespace is done and dirdata has not been enabled on the MDS?

            paf Patrick Farrell (Inactive) added a comment - nasf, I'm trying to make sure I understand the current status of this, as we at Cray are looking at starting some testing of upgrades from 1.8.x to 2.4. Do the current patches fully cover the intended functionality of Phase 1.5 and just need more testing? If not, what functionality is still missing?
            yong.fan nasf (Inactive) added a comment - - edited

            LFSCK 1.5 test plan (ldiskfs only)
            ****************

            1. Correctness
            ----------------

            1.1) sanity-lfsck on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-lfsck". All test cases should pass.

            1.2) sanity-scrub on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-scrub". All test cases should pass.

            1.3) normal acc-sm tests on Maloo. All test cases should pass except for some known master failures.

            2. Performance
            ----------------

            The file set to be tested should be generated with the following conditions:

            A) Create a single test root directory.

            B) Create N sub-directories under the test root directory.

            C) Under each sub-directory, create 100K objects, including 1K multiple-linked objects, 1K symlink objects, and 8K empty directories; the others are normal files.

            2.1) lfsck against a healthy 2.x MDT device, as a routine consistency check.
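
            Steps A) to C) could be scripted along the following lines. This is a sketch only: the function names and the f/h/s/e naming scheme are made up for illustration, and it assumes that each extra hard-link name counts toward the 100K total, which the plan does not state.

```python
import os

def make_subdir(path, total=100_000, hardlinks=1_000, symlinks=1_000, empty_dirs=8_000):
    """Populate one sub-directory with the object mix from item C (hypothetical helper)."""
    regular = total - hardlinks - symlinks - empty_dirs
    if regular < max(hardlinks, symlinks):
        raise ValueError("not enough regular files to link to")
    os.makedirs(path)
    for i in range(regular):                 # normal files
        open(os.path.join(path, f"f{i}"), "w").close()
    for i in range(hardlinks):               # multiple-linked objects
        os.link(os.path.join(path, f"f{i}"), os.path.join(path, f"h{i}"))
    for i in range(symlinks):                # symlinks with a relative target
        os.symlink(f"f{i}", os.path.join(path, f"s{i}"))
    for i in range(empty_dirs):              # empty directories
        os.mkdir(os.path.join(path, f"e{i}"))

def make_fileset(root, n, **mix):
    """Create n sub-directories (item B) under an existing test root (item A)."""
    for k in range(n):
        make_subdir(os.path.join(root, f"d{k}"), **mix)
```

            For example, make_fileset("/mnt/lustre/testroot", 100) would build the N = 100 set with the default mix, where the mount point is a placeholder; passing smaller total, hardlinks, symlinks and empty_dirs values gives a quick dry run.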

            2.1.1) Create above test file set with Lustre-2.4.

            2.1.2) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 100, 200, 400, 800, 1600

            2.2) lfsck against 2.x MDT device which is restored from file-level backup.

            2.2.1) Create above test file set with Lustre-2.4.

            2.2.2) Perform MDT file-level backup/restore.

            2.2.3) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 10, 20, 40, 80, 160

            2.3) lfsck against an MDT device that has been upgraded from 1.8.

            2.3.1) Create above test file set with Lustre-1.8

            2.3.2) Update the system to 2.4. Use "tunefs --dirdata" to enable FID-in-dirent on MDT.

            2.3.3) Test the highest lfsck speeds (full speed, without other work load) under different file sets: N = 100, 200, 400, 800, 1600

            3. Impact of lfsck on create performance
            ----------------

            Measure how much the routine lfsck will affect normal create performance. Generate test file set as described in section 2 with N = 400.

            3.1) Run lfsck at full speed on the file set. At the same time, use C threads to create 10M files (mds-survey) in parallel. Each thread creates 10M / C files under its own private directory.

            3.2) Measure the create performance with different lfsck speed limits. The 3.1) result gives the highest lfsck speed under a create workload; call it S. Then repeat the test with lfsck speed limit = (1/4)S, (1/2)S, (3/4)S.

            4. Scale test
            ----------------

            Run mdtest on Hyperion while the routine lfsck runs repeatedly in the background. We can inject some known failures by setting fail_loc on the MDS, such as OBD_FAIL_FID_INDIR, OBD_FAIL_FID_INLMA, OBD_FAIL_LFSCK_LINKEA_MORE, OBD_FAIL_LFSCK_LINKEA_LESS, and so on, so that lfsck has something to repair during the check. There should be no failures reported.

            5. DNE support
            ----------------

            LFSCK correctness verification under DNE mode; depends on LU-2646.

            5.1) Set up the DNE environment with 2 MDSes.

            5.2) Generate the file set, including remote objects.

            5.3) Run lfsck on each MDS in parallel to check whether there are failures.

            6. Resource requirement.
            ----------------

            6.1) Test 1 can be done locally and on Maloo.

            6.2) Test 2 and 3 can be done on Toro with 1 fat node.

            6.3) Test 4 needs to be tested on Hyperion. It is better if some guys can help to do that.

            6.4) Test 5 can be done on Toro with 4 nodes.


            yong.fan nasf (Inactive) added a comment - LFSCK 1.5 functionality test results
            yong.fan nasf (Inactive) added a comment - - edited

            These are the patches to be reviewed:

            http://review.whamcloud.com/5046
            http://review.whamcloud.com/4901
            http://review.whamcloud.com/4902
            http://review.whamcloud.com/4903
            http://review.whamcloud.com/4904
            http://review.whamcloud.com/4906
            http://review.whamcloud.com/4907
            http://review.whamcloud.com/4908
            http://review.whamcloud.com/4909
            http://review.whamcloud.com/4910
            http://review.whamcloud.com/4911
            http://review.whamcloud.com/4912
            http://review.whamcloud.com/4913
            http://review.whamcloud.com/4914
            yong.fan nasf (Inactive) added a comment - - edited

            Currently, IDIF objects have no FID-in-LMA, since they are not in OI files.


            tappro Mikhail Pershin added a comment - could you describe what was changed to pass tests? I see that not all files have dirdata now, right?
            yong.fan nasf (Inactive) added a comment - Pass tests on Maloo (set 8) https://maloo.whamcloud.com/test_sessions/d9c100a0-4df6-11e2-9dc7-52540035b04c

            People

              yong.fan nasf (Inactive)
              yong.fan nasf (Inactive)