[LU-1866] LFSCK Phase 1.5 for FID-in-dirent and linkEA consistency Created: 08/Sep/12  Updated: 18/Jun/14  Resolved: 20/Jul/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: New Feature Priority: Critical
Reporter: nasf (Inactive) Assignee: nasf (Inactive)
Resolution: Fixed Votes: 0
Labels: LFSCK

Attachments: PDF File LFSCK15_Demonstration_Milestone_Completion_r2.pdf     PDF File LFSCK15_Implementation_Milestone_Completion.pdf     Text File sanity-lfsck_20121229.log    
Issue Links:
Related
is related to LU-591 1.8->2.0 Lustre filesystem upgrade tool Resolved
is related to LU-2646 add special flag in the lma of the ag... Resolved
is related to LUDOC-85 LFSCK Phase 1.5 Doc Changes Closed
Sub-Tasks:
Key | Summary | Type | Status | Assignee
LU-2742 | add OBD_COMPAT_NOIGIF to block client... | Technical task | Resolved | nasf
Story Points: 55
Rank (Obsolete): 6640

 Description   

In Lustre-2.x, when a file is created, its FID (File IDentifier) is stored as part of the name entry in the parent directory; this is called FID-in-dirent. With FID-in-dirent, readdir on the MDT can fetch the FID directly from the directory page instead of having to get it from the LMA (Lustre Metadata Attributes) extended attribute stored on the object's inode. As a result, traversing a directory (such as with "ls") is much faster with FID-in-dirent than when the FID must be read from the LMA. Also at file creation time, the FID of the parent directory and the name of the file are stored in the linkEA extended attribute on the inode. With the linkEA, any given FID can be resolved back to a full path from the root directory to the target file. This is useful for ChangeLog-based applications such as "lustre_rsync", for generating error messages, and for POSIX-style pathname permission checks. Creating a hard link to a regular file likewise stores a FID-in-dirent in the new name entry and adds a corresponding entry to the file's linkEA.
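To make the linkEA idea concrete, here is a minimal C sketch of resolving a FID back to a path by following linkEA (parent FID, name) pairs up to the root. The structures are hypothetical and simplified: the FID is modeled loosely on Lustre's `struct lu_fid` (sequence, object ID, version), each object keeps only one link, and a linear table scan stands in for the OI lookup. It is not the on-disk linkEA format or the real `lctl fid2path` code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical FID, loosely modeled on struct lu_fid. */
struct fid {
    uint64_t seq; /* sequence */
    uint32_t oid; /* object ID within the sequence */
    uint32_t ver; /* version */
};

/* Hypothetical linkEA entry: the parent directory's FID and this object's name in it. */
struct link_entry {
    struct fid parent;
    const char *name;
};

/* Hypothetical object: its own FID (as stored in the LMA) and one linkEA entry. */
struct object {
    struct fid fid;
    struct link_entry link;
};

static int fid_eq(const struct fid *a, const struct fid *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Linear scan standing in for an OI (FID -> object) lookup. */
static const struct object *lookup(const struct object *tbl, size_t n,
                                   const struct fid *f)
{
    for (size_t i = 0; i < n; i++)
        if (fid_eq(&tbl[i].fid, f))
            return &tbl[i];
    return NULL;
}

/*
 * Build "/a/b/c" for 'target' by following linkEA parent FIDs up to 'root'.
 * Returns 0 on success, -1 on a missing object/link, a loop, or a short buffer.
 */
static int fid2path(const struct object *tbl, size_t n, const struct fid *root,
                    const struct fid *target, char *buf, size_t len)
{
    char tmp[256];
    struct fid cur = *target;
    int depth = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    while (!fid_eq(&cur, root)) {
        const struct object *o = lookup(tbl, n, &cur);

        if (o == NULL || o->link.name == NULL || ++depth > 64)
            return -1;
        /* Prepend "/name" to the path built so far. */
        int w = snprintf(tmp, sizeof(tmp), "/%s%s", o->link.name, buf);
        if (w < 0 || (size_t)w >= sizeof(tmp) || (size_t)w >= len)
            return -1;
        memcpy(buf, tmp, (size_t)w + 1);
        cur = o->link.parent;
    }
    if (buf[0] == '\0') { /* the target is the root itself */
        if (len < 2)
            return -1;
        strcpy(buf, "/");
    }
    return 0;
}
```

For example, with a root FID, a directory "dir" under the root, and a file "file" inside "dir", calling `fid2path()` on the file's FID yields "/dir/file". A real implementation must also handle objects with several hard links, i.e. several linkEA entries.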

Over the lifetime of an active filesystem, some FID-in-dirent and linkEA entries may become inconsistent or invalid as the result of on-disk corruption, after a restore from an MDT file-level backup, or if the MDT filesystem was originally formatted under Lustre 1.8. Currently, if the MDT is upgraded from Lustre 1.8 or restored from a file-level backup, all of its FID-in-dirent data will be missing, which reduces the performance of readdir(3) on the MDT. Additionally, for an MDT upgraded from Lustre 1.8 the linkEA is also unavailable, so the 2.x "lctl fid2path" functionality will not work for those files.
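The readdir cost of missing FID-in-dirent can be sketched as a fast path and a slow path. In this hypothetical C model, a directory entry may or may not carry a FID; when it does not, the FID has to come from the target inode's LMA, and the `lma_reads` counter stands in for the extra inode read that this costs. The types and the counter are illustrative assumptions, not Lustre or ldiskfs interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

/* Hypothetical directory entry: the FID is optional (absent after a 1.8 upgrade
 * or a file-level restore), while the inode number is always present. */
struct dirent_rec {
    const char *name;
    unsigned long ino;
    bool has_fid;   /* is a FID-in-dirent stored? */
    struct fid fid;
};

/* Hypothetical inode table: inode number -> FID stored in the LMA xattr. */
struct inode_rec { unsigned long ino; struct fid lma_fid; };

/* Counts slow-path lookups, i.e. inode loads needed to read the LMA. */
static unsigned long lma_reads;

static int read_lma(const struct inode_rec *itab, size_t n, unsigned long ino,
                    struct fid *out)
{
    lma_reads++; /* stands in for loading the inode and reading its LMA xattr */
    for (size_t i = 0; i < n; i++)
        if (itab[i].ino == ino) {
            *out = itab[i].lma_fid;
            return 0;
        }
    return -1;
}

/* FID for one entry: fast path from the dirent, slow path from the LMA. */
static int entry_fid(const struct dirent_rec *de, const struct inode_rec *itab,
                     size_t n, struct fid *out)
{
    assert(de != NULL && out != NULL);
    if (de->has_fid) {
        *out = de->fid;
        return 0;
    }
    return read_lma(itab, n, de->ino, out);
}
```

Over a large directory with no FID-in-dirent, every entry takes the slow path, which is the readdir slowdown described above.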

In LFSCK Phase 1.5 we will implement verification and rebuilding of FID-in-dirent and linkEA for the single-MDT case. These additional operations will be done while the MDT iterates over the object table for OI Scrub. LFSCK will check whether the FID-in-dirent of each name entry is consistent with the FID in the object's LMA, repair it if they do not match, and rebuild it if the FID-in-dirent is missing. It also verifies that each name entry is referenced by the object's linkEA and that each linkEA entry points back to a valid name entry. Unmatched or redundant linkEA entries will be removed, and missing linkEA entries will be added. For Lustre 1.8 inodes with IGIF FIDs after an upgrade, it will store the IGIF FID in the LMA xattr on the inode, and then in the FID-in-dirent and linkEA as it would for any 2.x FID.
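The checks described above can be sketched as follows. This is a simplified, hypothetical two-pass model in C, not the LFSCK implementation: pass one walks name entries, rebuilding or repairing FID-in-dirent from the LMA and adding any missing linkEA entry; pass two removes linkEA entries that no name entry backs, or that duplicate an earlier entry. The IGIF encoding (sequence = inode number, object ID = inode generation, version = 0) follows Lustre's IGIF convention; the fixed-size arrays, field names, and helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINKS 4

struct fid { uint64_t seq; uint32_t oid; uint32_t ver; };

/* Hypothetical linkEA entry: (parent FID, name). */
struct link_entry { struct fid parent; char name[32]; };

/* Hypothetical object: inode identity, optional LMA FID, and its linkEA. */
struct object {
    uint64_t ino;
    uint32_t gen;
    bool has_lma;          /* false for a 1.8 inode not yet given an LMA */
    struct fid lma;
    int nlinks;
    struct link_entry links[MAX_LINKS];
};

/* Hypothetical name entry in a parent directory. */
struct name_entry {
    struct fid parent;
    char name[32];
    bool has_fid;          /* is a FID-in-dirent stored? */
    struct fid fid;
    struct object *target; /* object the name refers to (found via inode number) */
};

static bool fid_eq(const struct fid *a, const struct fid *b)
{
    return a->seq == b->seq && a->oid == b->oid && a->ver == b->ver;
}

/* Give a 1.8 inode an IGIF FID in its LMA: seq = inode number, oid = generation. */
static void ensure_lma(struct object *o)
{
    if (!o->has_lma) {
        o->lma = (struct fid){ .seq = o->ino, .oid = o->gen, .ver = 0 };
        o->has_lma = true;
    }
}

/* Index of the linkEA entry matching (parent, name), or -1. */
static int find_link(const struct object *o, const struct fid *parent, const char *name)
{
    for (int i = 0; i < o->nlinks; i++)
        if (fid_eq(&o->links[i].parent, parent) && strcmp(o->links[i].name, name) == 0)
            return i;
    return -1;
}

/* Pass 1: check one name entry; returns the number of dirent/linkEA repairs. */
static int check_name_entry(struct name_entry *ne)
{
    struct object *o = ne->target;
    int repairs = 0;

    assert(o != NULL);
    ensure_lma(o);
    if (!ne->has_fid || !fid_eq(&ne->fid, &o->lma)) {
        ne->fid = o->lma;  /* rebuild a missing, or repair a mismatched, FID-in-dirent */
        ne->has_fid = true;
        repairs++;
    }
    if (find_link(o, &ne->parent, ne->name) < 0 && o->nlinks < MAX_LINKS) {
        struct link_entry *le = &o->links[o->nlinks++];
        le->parent = ne->parent;  /* add the missing linkEA entry */
        snprintf(le->name, sizeof(le->name), "%s", ne->name);
        repairs++;
    }
    return repairs;
}

/* Pass 2: drop unbacked or duplicate linkEA entries; returns how many were removed. */
static int prune_links(struct object *o, const struct name_entry *names, size_t n)
{
    int removed = 0;

    for (int i = 0; i < o->nlinks;) {
        const struct link_entry *le = &o->links[i];
        bool backed = false;

        for (size_t j = 0; j < n && !backed; j++)
            backed = names[j].target == o && fid_eq(&names[j].parent, &le->parent) &&
                     strcmp(names[j].name, le->name) == 0;
        bool dup = find_link(o, &le->parent, le->name) < i; /* an earlier copy exists */
        if (!backed || dup) {
            memmove(&o->links[i], &o->links[i + 1],
                    (size_t)(o->nlinks - i - 1) * sizeof(o->links[0]));
            o->nlinks--;
            removed++;
        } else {
            i++;
        }
    }
    return removed;
}
```

In LFSCK itself this work is folded into the OI Scrub object iteration rather than done as separate passes over in-memory tables.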

The LFSCK Phase III project was intended to handle FID-in-dirent and linkEA verification, covering both local-MDT references and cross-MDT cases where the directory entry and the object are located on different MDTs. The LFSCK Phase 1.5 implementation of the FID-in-dirent and linkEA consistency check/repair covers a significant part of the LFSCK Phase III work.

Currently, the DNE project is underway. To make the LFSCK project less dependent on the DNE project, we prefer to split LFSCK Phase III into two parts: DNE cases and non-DNE cases. The non-DNE part will be done in LFSCK Phase 1.5, and the DNE part will be done after the DNE project is completed. Completing the LFSCK Phase 1.5 work earlier also benefits sites upgrading from Lustre 1.8.



 Comments   
Comment by nasf (Inactive) [ 12/Dec/12 ]

First version of the LFSCK 1.5 implementation, without test cases yet:

http://review.whamcloud.com/#change,4807

Comment by Mikhail Pershin [ 20/Dec/12 ]

FYI, the 27z test failure with your patch:

sanity test_27z: @@@@@@ FAIL: O/0/d16/240: no filter_fid info

This is because you've added an extra EA to all files, at least to the OST objects. As a result, the filter_fid EA no longer fits in the inode body, and debugfs can't find it. I saw the same issue in LU-838 when adding LMA to all files. We discussed this a bit with Andreas and Alex; I'll forward the thread to you.

I also tend to think that having filter_fid outside the inode body is still OK, so maybe we need to check this EA in some way other than debugfs?

Comment by nasf (Inactive) [ 21/Dec/12 ]

In the long term, it is quite possible that more EAs will be added to the inode, so it would be better if we can find another suitable way to check the EAs.

Comment by nasf (Inactive) [ 23/Dec/12 ]

This patch passes most sanity test cases, and both sanity-scrub.sh and sanity-lfsck.sh work well against this version:

http://review.whamcloud.com/#change,4807,set7

Comment by nasf (Inactive) [ 23/Dec/12 ]

LFSCK 1.5 functionality test results.

Comment by nasf (Inactive) [ 24/Dec/12 ]

Except for some known conf-sanity issues, all other sanity tests run without problems.

http://review.whamcloud.com/#change,4807,set8

Comment by nasf (Inactive) [ 24/Dec/12 ]

Passed tests on Maloo (patch set 8):

https://maloo.whamcloud.com/test_sessions/d9c100a0-4df6-11e2-9dc7-52540035b04c

Comment by Mikhail Pershin [ 25/Dec/12 ]

Could you describe what was changed to make the tests pass? I see that not all files have dirdata now, right?

Comment by nasf (Inactive) [ 25/Dec/12 ]

Currently, IDIF objects have no FID-in-LMA, since they are not in OI files.

Comment by nasf (Inactive) [ 28/Dec/12 ]

These are the patches to be reviewed:

http://review.whamcloud.com/5046
http://review.whamcloud.com/4901
http://review.whamcloud.com/4902
http://review.whamcloud.com/4903
http://review.whamcloud.com/4904
http://review.whamcloud.com/4906
http://review.whamcloud.com/4907
http://review.whamcloud.com/4908
http://review.whamcloud.com/4909
http://review.whamcloud.com/4910
http://review.whamcloud.com/4911
http://review.whamcloud.com/4912
http://review.whamcloud.com/4913
http://review.whamcloud.com/4914

Comment by nasf (Inactive) [ 28/Dec/12 ]

LFSCK 1.5 functionality test results

Comment by nasf (Inactive) [ 24/Jan/13 ]

LFSCK 1.5 test plan (ldiskfs only)
****************

1. Correctness
----------------

1.1) sanity-lfsck on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-lfsck". All test cases should pass.

1.2) sanity-scrub on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-scrub". All test cases should pass.

1.3) normal acc-sm tests on Maloo. All test cases should pass except for some known master failures.

2. Performance
----------------

The file set to be tested should be generated with the following conditions:

A) Create a single test root directory.

B) Create N sub-directories under the test root directory.

C) Under each sub-directory, create 100K objects: 1K multiply-linked objects, 1K symlinks, 8K empty directories, and the remaining 90K as regular files.
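The file set described in A) to C) can be generated with a small program along these lines. It is a sketch only: the naming scheme (dN, mlN, slN, eN, fN), the choice of giving each multiply-linked file exactly one extra hard link, and the symlink target are assumptions, not part of the plan.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Object mix per sub-directory; the plan uses {100000, 1000, 1000, 8000}. */
struct fileset_mix {
    long objects;     /* total objects per sub-directory */
    long multilinked; /* regular files that get one extra hard link (assumption) */
    long symlinks;
    long empty_dirs;
};

/* "The others are normal files": what remains after the special objects. */
static long regular_files(const struct fileset_mix *m)
{
    return m->objects - m->multilinked - m->symlinks - m->empty_dirs;
}

static int make_file(const char *path)
{
    int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);

    if (fd < 0)
        return -1;
    return close(fd);
}

/* Create root/d0 .. root/d{n-1}, each filled with the given mix. Returns 0 or -1. */
static int create_fileset(const char *root, int n, const struct fileset_mix *m)
{
    char dir[1024], path[2048], path2[2048];

    assert(regular_files(m) >= 0);
    if (mkdir(root, 0755) != 0 && errno != EEXIST)
        return -1;
    for (int d = 0; d < n; d++) {
        snprintf(dir, sizeof(dir), "%s/d%d", root, d);
        if (mkdir(dir, 0755) != 0)
            return -1;
        for (long i = 0; i < m->multilinked; i++) {
            snprintf(path, sizeof(path), "%s/ml%ld", dir, i);
            snprintf(path2, sizeof(path2), "%s/ml%ld.link", dir, i);
            if (make_file(path) != 0 || link(path, path2) != 0)
                return -1;
        }
        for (long i = 0; i < m->symlinks; i++) {
            snprintf(path, sizeof(path), "%s/sl%ld", dir, i);
            if (symlink("f0", path) != 0) /* relative target; may dangle */
                return -1;
        }
        for (long i = 0; i < m->empty_dirs; i++) {
            snprintf(path, sizeof(path), "%s/e%ld", dir, i);
            if (mkdir(path, 0755) != 0)
                return -1;
        }
        for (long i = 0; i < regular_files(m); i++) {
            snprintf(path, sizeof(path), "%s/f%ld", dir, i);
            if (make_file(path) != 0)
                return -1;
        }
    }
    return 0;
}
```

For example, `create_fileset("/mnt/lustre/lfsck_test", 100, &(struct fileset_mix){100000, 1000, 1000, 8000})` would build the N = 100 set; the mount point there is only an example path.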

2.1) lfsck against a healthy 2.x MDT device (routine consistency check).

2.1.1) Create the above test file set with Lustre-2.4.

2.1.2) Measure the highest lfsck speed (full speed, no other workload) with different file sets: N = 100, 200, 400, 800, 1600

2.2) lfsck against a 2.x MDT device restored from a file-level backup.

2.2.1) Create the above test file set with Lustre-2.4.

2.2.2) Perform an MDT file-level backup/restore.

2.2.3) Measure the highest lfsck speed (full speed, no other workload) with different file sets: N = 10, 20, 40, 80, 160

2.3) lfsck against an MDT device upgraded from 1.8.

2.3.1) Create the above test file set with Lustre-1.8.

2.3.2) Upgrade the system to 2.4. Use "tunefs --dirdata" to enable FID-in-dirent on the MDT.

2.3.3) Measure the highest lfsck speed (full speed, no other workload) with different file sets: N = 100, 200, 400, 800, 1600
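For a sense of scale: each sub-directory holds 100K objects, so a file set with N sub-directories contains O(N) = N × 100K objects (extra hard-link names not counted). The sizes used in sections 2 and 3 work out as:

```latex
\begin{aligned}
O(N)   &= N \times 10^{5} \\
O(100) &= 1.0 \times 10^{7}, \quad O(1600) = 1.6 \times 10^{8} && \text{(tests 2.1 and 2.3)} \\
O(10)  &= 1.0 \times 10^{6}, \quad O(160) = 1.6 \times 10^{7} && \text{(test 2.2)} \\
O(400) &= 4.0 \times 10^{7} && \text{(section 3)}
\end{aligned}
```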

3. Impact of lfsck on create performance
----------------

Measure how much routine lfsck affects normal create performance. Generate the test file set as described in section 2 with N = 400.

3.1) Run lfsck at full speed on the file set. At the same time, use C threads to create 10M files in parallel (mds-survey). Each thread creates 10M / C files under its own private directory.

3.2) Measure the create performance with different lfsck speed limits. The result of 3.1) gives the highest lfsck speed under the create workload; call it S. Then repeat the test with the lfsck speed limit set to (1/4)S, (1/2)S, and (3/4)S.
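The arithmetic in section 3 can be captured in a few helper functions. This C sketch is illustrative only: the function names are hypothetical, the speed limits are in whatever unit S is measured in, spreading the remainder of 10M / C over the first threads is an added assumption, and the slowdown metric assumes a baseline create run without lfsck, which the plan implies but does not spell out.

```c
#include <assert.h>

/* 3.2): the three limited runs use (1/4)S, (1/2)S and (3/4)S. */
static void speed_limits(double s, double out[3])
{
    assert(s > 0);
    out[0] = s / 4;
    out[1] = s / 2;
    out[2] = 3 * s / 4;
}

/* 3.1): thread t of c creates total / c files; any remainder goes to the first threads. */
static long files_for_thread(long total, int c, int t)
{
    assert(c > 0 && t >= 0 && t < c);
    return total / c + (t < total % c ? 1 : 0);
}

/* Fraction of create throughput lost while lfsck runs (assumes a no-lfsck baseline run). */
static double create_slowdown(double baseline_rate, double rate_with_lfsck)
{
    assert(baseline_rate > 0);
    return (baseline_rate - rate_with_lfsck) / baseline_rate;
}
```

For example, with S = 8000 the three limited runs use 2000, 4000 and 6000; with C = 3 threads, one thread creates 3,333,334 files and the other two create 3,333,333 each.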

4. Scale test
----------------

Run mdtest on Hyperion while routine lfsck runs repeatedly in the background. We can inject some known failures by setting fail_loc on the MDS, using stubs such as OBD_FAIL_FID_INDIR, OBD_FAIL_FID_INLMA, OBD_FAIL_LFSCK_LINKEA_MORE, OBD_FAIL_LFSCK_LINKEA_LESS, and so on, so that lfsck has something to repair during the check. No failures should be reported.

5. DNE support
----------------

LFSCK correctness verification in DNE mode; depends on LU-2646.

5.1) Set up the DNE environment with 2 MDSes.

5.2) Generate a file set that includes remote objects.

5.3) Run lfsck on each MDS in parallel and check whether any failures occur.

6. Resource requirements
----------------

6.1) Test 1 can be done locally and on Maloo.

6.2) Tests 2 and 3 can be done on Toro with one fat node.

6.3) Test 4 needs to be run on Hyperion. It would be better if someone could help with that.

6.4) Test 5 can be done on Toro with 4 nodes.

Comment by Patrick Farrell (Inactive) [ 08/Apr/13 ]

nasf,
I'm trying to make sure I understand the current status of this, as we at Cray are looking at starting some testing of upgrades from 1.8.x to 2.4.

Do the current patches fully cover the intended functionality of Phase 1.5 and just need more testing? If not, what functionality is still missing?

Comment by Patrick Farrell (Inactive) [ 24/Apr/13 ]

What happens if LFSCK -t namespace is run and dirdata has not been enabled on the MDS?

Comment by nasf (Inactive) [ 25/Apr/13 ]

Current master plus the patch http://review.whamcloud.com/#change,6078 supports upgrading from Lustre-1.8.x to Lustre-2.4 well.

If "lfsck -t namespace" is run on a Lustre-1.8.x device without "dirdata" enabled, the directory structure is left unchanged; the other parts, such as IGIF-in-LMA, IGIF-in-OI, and the linkEA for IGIF objects, are generated just as they are when "dirdata" is enabled.
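The answer above amounts to a simple rule, sketched here in C with hypothetical names: for an IGIF object, the namespace check always generates IGIF-in-LMA, IGIF-in-OI and the linkEA, and adds the FID to the directory entry only when dirdata is enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* What the namespace check updates for one 1.8-era (IGIF) object. */
struct igif_actions {
    bool igif_in_lma;
    bool igif_in_oi;
    bool linkea;
    bool fid_in_dirent;
};

/* Without dirdata the directory entries are left alone; everything else is generated. */
static struct igif_actions plan_igif_repair(bool dirdata_enabled)
{
    struct igif_actions a = {
        .igif_in_lma = true,
        .igif_in_oi = true,
        .linkea = true,
        .fid_in_dirent = dirdata_enabled,
    };
    return a;
}
```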

Comment by nasf (Inactive) [ 20/Jul/13 ]

All the patches for LFSCK 1.5 have landed in Lustre-2.5.

Generated at Sat Feb 10 01:20:22 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.