LFSCK 2 test plan (ldiskfs only)
****************
1. Correctness
----------------
1.1) sanity-lfsck on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes mdtcount=2 testlist=sanity-lfsck". All test cases should pass.
1.2) sanity-scrub on Maloo with commit message "Test-Parameters: envdefinitions=ENABLE_QUOTA=yes testlist=sanity-scrub". All test cases should pass.
1.3) normal acc-sm tests on Maloo. All test cases should pass except for some known master failures.
2. Performance
----------------
The file set to be tested should be generated as follows:
A) Create 'L' test root directories, where 'L' is the number of MDTs; test root directory X is located on MDT-X.
B) Set the default stripe size to 64KB and the default stripe count to 'M'.
C) Create 'N' sub-directories under each test root directory.
D) Under each sub-directory, generate 100K normal files, each containing 64 * 'M' KB of data.
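The size of a generated file set follows directly from 'L', 'N', and 'M'. The sketch below works through that arithmetic so the result can be compared against the storage listed in section 5. It assumes "100K files" means 100,000 files and that 1KB is 1024 bytes; the function name and defaults are illustrative only and are not part of any Lustre tool.

```python
def file_set_size(l_mdts, n_subdirs, m_stripes,
                  files_per_subdir=100_000, stripe_kb=64):
    """Return (file_count, total_bytes) for one generated file set.

    Assumptions: "100K" is read as 100,000 files and 1KB as 1024 bytes.
    """
    # One test root per MDT, 'N' sub-directories per root.
    file_count = l_mdts * n_subdirs * files_per_subdir
    # Each file holds 64 * 'M' KB, i.e. one full stripe on each of 'M' OSTs.
    bytes_per_file = stripe_kb * m_stripes * 1024
    return file_count, file_count * bytes_per_file


if __name__ == "__main__":
    # Largest configuration used in section 2: L = 4, N = 16, M = 4.
    files, total = file_set_size(4, 16, 4)
    print(f"{files} files, {total / 2**40:.2f} TiB of file data")
```

For L = 4, N = 16, M = 4 this gives 6,400,000 files and about 1.53 TiB of file data, before any file system overhead.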
2.1) LFSCK against a healthy Lustre 2.x system, as a routine consistency check.
2.1.1) Create the test file sets described above with Lustre-2.6.
2.1.2) Measure the highest LFSCK speed (full speed, with no other workload) for different file sets: 'N' = 2, 4, 8, 16; with different stripe counts: 'M' = 1, 2, 4; and with different MDT counts: 'L' = 1, 2, 4.
2.2) LFSCK against a Lustre 2.x system with inconsistent layout OST-objects.
2.2.1) On the OSS, set fail_loc to skip setting XATTR_NAME_FID, to simulate MDT-OST inconsistency.
2.2.2) Create the test file sets described above with Lustre-2.6.
2.2.3) Measure the highest LFSCK speed (full speed, with no other workload) for different file sets: 'N' = 2, 4, 8, 16; with different stripe counts: 'M' = 1, 2, 4; and with different MDT counts: 'L' = 1, 2, 4.
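The parameter sweep in 2.1.2 and 2.2.3 covers every combination of 'N', 'M', and 'L', which is 4 * 3 * 3 = 36 runs per scenario. A minimal sketch for enumerating those runs is shown here; the constant and function names are hypothetical, and the ordering (fewest MDTs first) is only one reasonable choice.

```python
from itertools import product

# Values taken from 2.1.2 / 2.2.3.
N_VALUES = (2, 4, 8, 16)   # sub-directories per test root
M_VALUES = (1, 2, 4)       # default stripe count
L_VALUES = (1, 2, 4)       # number of MDTs


def test_matrix():
    """Return one dict per run, covering every (L, M, N) combination."""
    return [
        {"L": l, "M": m, "N": n}
        for l, m, n in product(L_VALUES, M_VALUES, N_VALUES)
    ]


if __name__ == "__main__":
    for run in test_matrix():
        print("L={L} M={M} N={N}".format(**run))
```

Iterating with 'L' as the outermost loop keeps all runs for one MDT count together, which avoids reformatting the MDTs more often than needed.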
3. Impact of LFSCK on small file create performance
----------------
Measure how much a routine LFSCK run affects normal small file create performance. Generate the test file set as described in section 2 with N = 16, M = 4, L = 1.
3.1) Run LFSCK at full speed on the file set. At the same time, use 'C' threads to create 512K small files in parallel (or 256K files if the LFSCK run finishes too quickly); each file is 64KB with a single stripe. Each thread creates 512K / 'C' files under its own private directory.
3.2) Measure the create performance under different LFSCK speed limits. The 3.1) result gives the highest LFSCK speed under the small file create workload; call it 'S'. Then repeat the test with the LFSCK speed limit set to (1/4)'S', (1/2)'S', and (3/4)'S'.
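The two calculations in 3.1) and 3.2), the per-thread file count and the three speed limits derived from 'S', can be sketched as follows. This assumes "512K" means 512 * 1024 files and that 'S' is an integer rate such as objects per second; both function names are hypothetical helpers, not part of the LFSCK tooling.

```python
def files_per_thread(total_files, threads):
    """Split total_files across threads, spreading any remainder evenly."""
    base, extra = divmod(total_files, threads)
    # The first 'extra' threads create one additional file each.
    return [base + 1 if i < extra else base for i in range(threads)]


def speed_limits(full_speed):
    """Return the (1/4)S, (1/2)S and (3/4)S limits, rounded down."""
    return [full_speed * k // 4 for k in (1, 2, 3)]


if __name__ == "__main__":
    total = 512 * 1024          # assumed reading of "512K" files
    print(files_per_thread(total, 8))
    print(speed_limits(1000))   # 1000 is a placeholder for the measured S
```

Rounding down keeps each limit at or below the intended fraction of 'S'; any remainder in the file split goes to the first threads so the total still matches the target count.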
4. Scale test
----------------
Run LFSCK on more MDTs ('L' = 16) and OSTs ('M' = 16) for MDT-OST consistency verification.
4.1) Verify whether there are correctness issues at this scale.
4.2) Verify whether the LFSCK mechanism remains usable at large scale, for example whether it becomes very slow.
5. Resource requirements
----------------
5.1) Test 1 can be done locally and on Maloo.
5.2) Test 2/3 need at least 4 MDS nodes, 2 OSS nodes, and 1 client.
5.3) Test 4 can use the same hardware as Test 2/3, but it is better to use more real servers.
5.4) Each OSS node needs at least 1TB storage.
With recent patch landings to LFSCK, there are a few more options to choose from. Should we incorporate some of these into the existing test plan?
All of the new options need to be tested, but for the existing tests that revolve around performance, do we want to:
Create lost OST-objects (-c)?
Handle orphan objects (-o)?
Which type should we run: namespace, layout, or both?
Since XATTR_NAME_FID does not exist, is setting fail_loc to OBD_LFSCK_UNMATCHED_PAIR* or OBD_LFSCK_INVALID_PFID in test 2.2 just as good, or will repairing different failures produce dramatically different performance results? Is LFSCK_DANGLING preferred?