
sanity test_33hh: MDT index match 49/250 times

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/4a579469-93b2-4c09-9114-b0f1258c2fc9

      test_33hh failed with the following error:

      MDT index match 49/250 times
      

      Test log is:

      == sanity test 33hh: temp file is located on the same MDT as target (crush2) ========================================================== 16:36:28 (1664469388)
      MDS1_VERSION=34550793 version_code=34537472
      striped dir -i1 -c4 -H crush2 /mnt/lustre/d33hh.sanity
      pattern .f33hh.sanity.XXXXXX
      /mnt/lustre/d33hh.sanity/.f33hh.sanity.NPJRGZ MDT index mismatch 0 != 2
      pattern f33hh.sanity.XXXXXXXX
      1/250 MDT index mismatches, expect ~2-4
      pattern .f33hh.sanity.XXXXXX
      pattern f33hh.sanity.XXXXXXXX
      52/250 matches, expect ~62 for crush2
      pattern=.f33hh.sanity....XXX
      pattern=f33hh.sanity....XXXXX
      49/250 matches, expect ~62 for crush2
       sanity test_33hh: @@@@@@ FAIL: MDT index match 49/250 times
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_33hh - MDT index match 49/250 times

          Activity

            [LU-16198] sanity test_33hh: MDT index match 49/250 times

            adilger Andreas Dilger added a comment - Landed for 2.16

            gerrit Gerrit Updater added a comment:

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/48713/
            Subject: LU-16198 tests: increase margin for sanity/33hh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e17471792388e59f44040d48dd8138ec865663af

            gerrit Gerrit Updater added a comment:

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/48713
            Subject: LU-16198 tests: increase margin for sanity/33hh
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0efdd531c2896812c430e0ca623ef67fb2002ca1

            adilger Andreas Dilger added a comment:

            This test failure is mostly caused by random chance: the filenames created by "mktemp()" sometimes consist entirely of digits, of uppercase letters, or of lowercase letters. Getting many such names in one run should be statistically rare, but the subtest is run so often (421 runs in the past week) that it will occasionally fail at random (2 of 421 runs, about 0.5%).

            Options for fixing it would include:

            • Increase the margin of error for allowing this to pass. Currently the threshold is 80% of the expected number of files per MDT: 250 files / 4 MDTs = 62, so the lower bound is 62 * 4/5 = 49 files and the upper bound is 62 * 5/4 = 77 files. In the past 3 months the subtest has failed 20 times, almost all of them with 46/250 or more. One failure is 45/250 and one is 80/250, so using 5/7 = 71% instead (62 * 5/7 = 44 files, or 62 * 7/5 = 86 files) should avoid virtually all random errors.
            • Automatically re-run the subtest if it fails once. This would reduce the chance of randomly reporting an error from about 0.5% (2/421) to roughly 1/40000, or less than once every two years at the current rate of ~1500 runs per month.
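
The margin and re-run arithmetic in the comment above can be checked with a short Python sketch. It is illustrative only and is not the code in sanity.sh: the helper names match_bounds and retry_failure_rate are hypothetical, integer division is assumed because the quoted figures (62, 49, 77, 44, 86) are all rounded down, and the re-run estimate assumes that two consecutive runs fail independently.

```python
from fractions import Fraction


def match_bounds(total_files, mdt_count, num, den):
    """Return (expected, low, high) match counts for a num/den margin.

    Integer division is an assumption, chosen because it reproduces
    the rounded-down figures quoted in the comment.
    """
    expected = total_files // mdt_count
    low = expected * num // den
    high = expected * den // num
    return expected, low, high


def retry_failure_rate(failures, runs):
    """Chance that a run and one automatic re-run both fail.

    Assumes the two runs fail independently with the same observed rate.
    """
    p = Fraction(failures, runs)
    return p * p


if __name__ == "__main__":
    print(match_bounds(250, 4, 4, 5))  # current 4/5 margin: (62, 49, 77)
    print(match_bounds(250, 4, 5, 7))  # proposed 5/7 margin: (62, 44, 86)

    both_fail = retry_failure_rate(2, 421)
    runs_per_report = 1 / both_fail
    # Months between spurious reports at ~1500 runs per month.
    print(float(runs_per_report), float(runs_per_report) / 1500)
```

With the observed 2/421 rate, two failures in a row come out at about 1 in 44,000 runs, close to the 1/40000 quoted above. At ~1500 runs per month that is roughly one spurious report every 29 months.
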
            qian_wc Qian Yingjin added a comment - +1 on master: https://testing.whamcloud.com/test_sets/3fe086d9-2c34-49c2-9413-42383acff1c8

            People

              adilger Andreas Dilger
              maloo Maloo