[LU-14659] sanity test_413a: subdirs shouldn't be evenly distributed Created: 30/Apr/21  Updated: 21/Oct/23  Resolved: 17/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-14762 qos subdirectory creation stay on par... Resolved
Related
is related to LU-12831 sanity test_413b timed out Open
is related to LU-14824 sanity test_413a: timeout Resolved
is related to LU-14898 sanity test_413a: (max - min) * 100 /... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Vitaly Fertman <vitaly_fertman@xyratex.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9b91c699-3875-4fd0-a21b-f78bb5335464

test_413a failed with the following error:

subdirs shouldn't be evenly distributed

It seems LU-12495 has reappeared.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity test_413a - subdirs shouldn't be evenly distributed



 Comments   
Comment by Andreas Dilger [ 30/Apr/21 ]

Vitaly,
the test failure you linked here is on an early version of my patch https://review.whamcloud.com/#/c/43445/3 "LU-13439 lmv: qos stay on current MDT if less full", which modifies the DNE MDT selection algorithm and had a bug in it. However, that bug was fixed, and the more recent versions of that patch have been passing test_413[abc], so it isn't clear why you filed this issue.

Comment by Chris Horn [ 26/May/21 ]

+1 on master https://testing.whamcloud.com/test_sets/857c693d-64af-48d6-a029-b641395d9b1a

Comment by Andreas Dilger [ 06/Jul/21 ]

This should be fixed by patch https://review.whamcloud.com/43997 "LU-14762 lmv: compare space to mkdir on parent MDT"

Comment by Andreas Dilger [ 11/Aug/21 ]

Lai, it looks like this is still being hit on master. Looking at a recent failure, I think there are two issues:
https://testing.whamcloud.com/test_sets/3d05b60b-1939-4440-859d-55b5a2bec858

Check for uneven MDTs: 
weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 0
weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 1
weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 2
MDT filesfree available: 834976 834538 835384 835497
MDT blocks available: 416644 1045656 1044312 1042392
weight diff=150%
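For reference, the final "weight diff=150%" is consistent with an integer `(max - min) * 100 / min` computed over the "MDT blocks available" values. That formula is an assumption based on the truncated LU-14898 title, not copied from sanity.sh, and the real test may weight inodes as well. Applying the same formula to the inode counts gives 0%, i.e. the MDTs are barely imbalanced by inodes:

```python
from typing import List


def weight_diff_pct(values: List[int]) -> int:
    """(max - min) * 100 / min with integer division, as in shell arithmetic.

    Assumed formula, reconstructed from the LU-14898 title; not sanity.sh code.
    """
    lo, hi = min(values), max(values)
    return (hi - lo) * 100 // lo  # ZeroDivisionError once an MDT reports 0 free


files_free = [834976, 834538, 835384, 835497]       # "MDT filesfree available"
blocks_avail = [416644, 1045656, 1044312, 1042392]  # "MDT blocks available"

print(f"inodes: {weight_diff_pct(files_free)}%")    # inodes: 0%
print(f"blocks: {weight_diff_pct(blocks_avail)}%")  # blocks: 150%
```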

It looks like statfs is being cached inside the loop, so the loop is running more times than needed. This is probably also causing LU-14898 (divide by zero error), because the loop doesn't stop until the MDT has no free space left. Also, since inodes are weighted higher than space, maybe we need to re-add some amount of regular file creation into the loop (perhaps creating only 64KB DoM files instead of 1MB) so that the MDTs also become imbalanced by inodes?
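A minimal sketch of the loop structure described above, written in Python rather than the shell of sanity.sh. `fill_until_uneven`, `read_avail`, `fill_batch` and the toy numbers are hypothetical illustrations, not the actual `generate_uneven_mdts()` code. The sketch re-reads free space on every iteration, stops as soon as the imbalance exceeds the threshold, and bails out instead of dividing by zero when an MDT is full (the LU-14898 symptom):

```python
from typing import Callable, List, Tuple


def weight_diff_pct(avail: List[int]) -> int:
    """Assumed imbalance metric: (max - min) * 100 // min."""
    lo, hi = min(avail), max(avail)
    if lo == 0:
        # An MDT is already full: stop instead of dividing by zero.
        raise RuntimeError("an MDT has no free space left")
    return (hi - lo) * 100 // lo


def fill_until_uneven(read_avail: Callable[[], List[int]],
                      fill_batch: Callable[[int], None],
                      threshold_pct: int,
                      max_loops: int) -> Tuple[int, int]:
    """Fill one MDT in batches until the imbalance exceeds threshold_pct.

    read_avail must return *fresh* statfs values on every call; if it
    returns a cached result, the loop keeps filling after the target
    imbalance has already been reached.
    """
    for loop in range(max_loops):
        diff = weight_diff_pct(read_avail())
        if diff > threshold_pct:
            return loop, diff
        fill_batch(loop)
    raise RuntimeError(f"still balanced after {max_loops} loops")


# Toy model (hypothetical): two MDTs, each batch consumes 200 units on MDT0.
avail = [1000, 1000]


def fresh_read() -> List[int]:
    return list(avail)


def fill_mdt0(loop: int) -> None:
    avail[0] -= 200


print(fill_until_uneven(fresh_read, fill_mdt0, threshold_pct=100, max_loops=10))
# (3, 150)
```

Bounding the loop with `max_loops` and checking for a full MDT before dividing means a stale statfs shows up as an explicit error rather than as a divide by zero.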

The second issue is that the "imbalance check" appears to be using the minimum imbalance instead of the maximum imbalance?

179 directories created on MDT0
188 directories created on MDT1
222 directories created on MDT2
211 directories created on MDT3
 sanity test_413a: @@@@@@ FAIL: subdirs shouldn't be evenly distributed: 188 - 179 < 20

The 188 directories on MDT1 are only 9 higher than MDT0, but the other MDTs are higher by 43 and 32. Since the MDT selection is random, maybe this should check that the average difference is > 20 and that no single MDT is less than 5 higher?
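The proposed check can be sketched as follows, using the directory counts from the failed run and treating MDT0 as the MDT that was filled. The function names and the 20/5 thresholds are taken from or modeled on the suggestion above; this is an illustration, not the check that eventually landed. `min_gap_check` mirrors the comparison that failed in this run (188 - 179 < 20):

```python
from typing import List


def gaps_over_filled(counts: List[int], filled_idx: int) -> List[int]:
    """How many more subdirs each other MDT received than the filled MDT."""
    base = counts[filled_idx]
    return [c - base for i, c in enumerate(counts) if i != filled_idx]


def min_gap_check(counts: List[int], filled_idx: int, min_gap: int = 20) -> bool:
    # Mirrors the failing comparison: only the smallest gap is considered.
    return min(gaps_over_filled(counts, filled_idx)) >= min_gap


def proposed_check(counts: List[int], filled_idx: int,
                   min_avg_gap: int = 20, min_each_gap: int = 5) -> bool:
    # Suggested: the average gap exceeds min_avg_gap, and no MDT is
    # fewer than min_each_gap directories above the filled one.
    gaps = gaps_over_filled(counts, filled_idx)
    return sum(gaps) / len(gaps) > min_avg_gap and min(gaps) >= min_each_gap


counts = [179, 188, 222, 211]  # subdirs per MDT from the failed run
print(gaps_over_filled(counts, 0))  # [9, 43, 32]
print(min_gap_check(counts, 0))     # False
print(proposed_check(counts, 0))    # True
```

Averaging the gaps (28 here) tolerates one MDT that randomly received fewer directories, while the per-MDT floor still catches a genuinely even distribution.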

Comment by Lai Siyao [ 12/Aug/21 ]

Mmm, this can be improved. The "imbalance check" appears to use the minimum imbalance because the emptiest MDT is chosen before IO. Due to the cached statfs result and the randomness in MDT selection, that MDT may not be the one on which most directories are created. The check should be done after creation instead.
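One reading of this explanation, sketched in Python: if the reference MDT is taken as the one with the most free blocks in the pre-IO (possibly cached) statfs, the gap is measured against MDT1. If it is chosen after creation, as the MDT that actually received the most subdirs, the gap is measured against MDT2. This assumes the statfs values and the directory counts quoted earlier come from the same run; `gap_vs_filled` and the selection rules are hypothetical, not the sanity.sh logic:

```python
from typing import List


def gap_vs_filled(counts: List[int], ref_idx: int, filled_idx: int) -> int:
    """Subdirs on the reference MDT minus subdirs on the filled MDT."""
    return counts[ref_idx] - counts[filled_idx]


blocks_before = [416644, 1045656, 1044312, 1042392]  # statfs before mkdir
counts = [179, 188, 222, 211]                        # subdirs created per MDT
filled = 0                                           # MDT0 was filled

# Before IO: reference = MDT with the most free blocks in the statfs snapshot.
ref_before = max(range(len(blocks_before)), key=blocks_before.__getitem__)
# After creation: reference = MDT that actually received the most subdirs.
ref_after = max(range(len(counts)), key=counts.__getitem__)

print(ref_before, gap_vs_filled(counts, ref_before, filled))  # 1 9
print(ref_after, gap_vs_filled(counts, ref_after, filled))    # 2 43
```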

Comment by Gerrit Updater [ 12/Aug/21 ]

"Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44649
Subject: LU-14659 test: improve generate_uneven_mdts() in sanity.sh
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 074354d0a3282a72957eef178338d1e8cd14af7b

Comment by Gerrit Updater [ 17/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44649/
Subject: LU-14659 test: improve generate_uneven_mdts() in sanity.sh
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d45be79a069f527657c1ce91630183031ea42b27

Comment by Peter Jones [ 17/Oct/21 ]

Landed for 2.15

Generated at Sat Feb 10 03:11:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.