[LU-14659] sanity test_413a: subdirs shouldn't be evenly distributed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.15.0
    • Affects Version/s: None
    • Labels: None
    • Severity: 3

    Description

      This issue was created by maloo for Vitaly Fertman <vitaly_fertman@xyratex.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/9b91c699-3875-4fd0-a21b-f78bb5335464

      test_413a failed with the following error:

      subdirs shouldn't be evenly distributed
      

      It seems LU-12495 has reappeared.

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_413a - subdirs shouldn't be evenly distributed

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.15

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44649/
            Subject: LU-14659 test: improve generate_uneven_mdts() in sanity.sh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: d45be79a069f527657c1ce91630183031ea42b27

            gerrit Gerrit Updater added a comment -

            "Lai Siyao <lai.siyao@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44649
            Subject: LU-14659 test: improve generate_uneven_mdts() in sanity.sh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 074354d0a3282a72957eef178338d1e8cd14af7b

            laisiyao Lai Siyao added a comment -

            Mmm, this can be improved. The "imbalance check" appears to use the minimum imbalance because the emptiest MDT is determined before the directories are created. Because of the cached statfs result and the randomness in MDT selection, that may not be the MDT on which most directories end up being created. The check should be changed to run after creation.
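
            The difference Lai describes can be illustrated with a minimal Python sketch; the real check is shell code in sanity.sh and is not reproduced here. Both function names are hypothetical. imbalance_pre_chosen() measures against an MDT selected before creation, while imbalance_after_creation() uses the per-MDT directory counts after creation and compares the filled MDT with the MDT that actually received the most directories. The counts are from the failing run reported in this issue, where the failure message compared MDT1 (188) with MDT0 (179).

```python
from typing import Sequence


def imbalance_pre_chosen(counts: Sequence[int], full_idx: int, empty_idx: int) -> int:
    """Imbalance against an MDT picked *before* the directories were created.

    empty_idx stands for whichever MDT looked emptiest in the (possibly
    cached) statfs data read before IO; most directories may not land there.
    """
    return counts[empty_idx] - counts[full_idx]


def imbalance_after_creation(counts: Sequence[int], full_idx: int) -> int:
    """Imbalance against the MDT that actually received the most directories."""
    return max(counts) - counts[full_idx]


if __name__ == "__main__":
    # Directories created per MDT (MDT0..MDT3) in the failing run quoted in this issue.
    counts = [179, 188, 222, 211]
    full_idx = 0  # MDT0 is the MDT that was filled to make the MDTs uneven
    print(imbalance_pre_chosen(counts, full_idx, empty_idx=1))  # 9
    print(imbalance_after_creation(counts, full_idx))  # 43
```

            With the counts from that run, the pre-chosen comparison gives 9 (below the threshold of 20), while the post-creation comparison gives 43.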

            adilger Andreas Dilger added a comment -

            Lai, it looks like this is still being hit on master. Looking at a recent failure, I think there are two issues:
            https://testing.whamcloud.com/test_sets/3d05b60b-1939-4440-859d-55b5a2bec858

            Check for uneven MDTs: 
            weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 0
            weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 1
            weight diff=0% must be > 100% ...Fill MDT0 with 100 files: loop 2
            MDT filesfree available: 834976 834538 835384 835497
            MDT blocks available: 416644 1045656 1044312 1042392
            weight diff=150%
            

            It looks like the statfs results are being cached inside the loop, so the loop runs more times than needed. This is probably also the cause of LU-14898 (divide-by-zero error), because the loop doesn't stop until the MDT has no free space left. Also, since inodes are weighted more heavily than space, maybe we need to re-add some regular file creation to the loop (perhaps creating only 64KB DoM files instead of 1MB) so that the MDTs are also more imbalanced by inodes?

            The second issue is that the "imbalance check" appears to be using the minimum imbalance instead of the maximum imbalance?

            179 directories created on MDT0
            188 directories created on MDT1
            222 directories created on MDT2
            211 directories created on MDT3
             sanity test_413a: @@@@@@ FAIL: subdirs shouldn't be evenly distributed: 188 - 179 < 20
            

            The 188 directories on MDT1 are only 9 higher than on MDT0, but the other MDTs are higher by 43 and 32. Since MDT selection is random, maybe this should check that the average difference is > 20, and that no single MDT is less than 5 higher?
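
            A minimal Python sketch of the check suggested above, assuming the counts are compared against the MDT that was deliberately filled (MDT0 in this run). The function and parameter names are hypothetical; the thresholds 20 and 5 are the values proposed in the comment, and sanity.sh would implement this in shell.

```python
from statistics import mean
from typing import Sequence


def unevenly_distributed(counts: Sequence[int], full_idx: int,
                         min_avg_diff: float = 20, min_each_diff: int = 5) -> bool:
    """Return True if the filled MDT received noticeably fewer directories.

    Each other MDT's directory count is compared with the filled MDT's count:
    the average difference must exceed min_avg_diff, and every individual
    difference must be at least min_each_diff.
    """
    base = counts[full_idx]
    diffs = [c - base for i, c in enumerate(counts) if i != full_idx]
    if not diffs:
        raise ValueError("need at least two MDTs to compare")
    return mean(diffs) > min_avg_diff and min(diffs) >= min_each_diff


if __name__ == "__main__":
    # Counts from the failing run: differences vs. MDT0 are 9, 43 and 32 (average 28).
    print(unevenly_distributed([179, 188, 222, 211], full_idx=0))  # True
    # A perfectly even spread fails the check.
    print(unevenly_distributed([200, 200, 200, 200], full_idx=0))  # False
```

            Averaging over all MDTs smooths out the randomness in MDT selection that made the single pairwise comparison fail, while the per-MDT minimum still rejects a run in which one MDT received about as many directories as the filled one.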

            adilger Andreas Dilger added a comment -

            This should be fixed by patch https://review.whamcloud.com/43997 "LU-14762 lmv: compare space to mkdir on parent MDT".

            hornc Chris Horn added a comment -

            +1 on master https://testing.whamcloud.com/test_sets/857c693d-64af-48d6-a029-b641395d9b1a

            adilger Andreas Dilger added a comment -

            Vitaly,
            the test failure you linked here is on an early version of my patch https://review.whamcloud.com/#/c/43445/3 "LU-13439 lmv: qos stay on current MDT if less full", which modifies the DNE MDT selection algorithm and had a bug in it. However, that bug was fixed, and more recent versions of that patch have been passing test_413[abc], so it isn't clear why you filed this issue.

            People

              Assignee: laisiyao Lai Siyao
              Reporter: maloo Maloo
              Votes: 0
              Watchers: 7
