Lustre / LU-17990

sanity test_33hh: FAIL: MDT index match 43/250 times

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.16.0
    • 3

    Description

      This issue was created by maloo for jianyu <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/43d05b9e-201f-4055-8216-f6ac8ae326ac

      test_33hh failed with the following error:

      == sanity test 33hh: temp file is located on the same MDT as target (crush2) ========================================================== 06:35:41 (1717742141)
      MDS1_VERSION=34553726 version_code=34537472
      striped dir -i1 -c4 -H crush2 /mnt/lustre/d33hh.sanity
      pattern .f33hh.sanity.XXXXXX
      /mnt/lustre/d33hh.sanity/.f33hh.sanity.oqpfrq MDT index mismatch 0 != 3
      /mnt/lustre/d33hh.sanity/.f33hh.sanity.baeiur MDT index mismatch 0 != 1
      /mnt/lustre/d33hh.sanity/.f33hh.sanity.paekox MDT index mismatch 0 != 2
      pattern f33hh.sanity.XXXXXXXX
      3/250 MDT index mismatches, expect ~2-4
      pattern .f33hh.sanity.XXXXXX
      pattern f33hh.sanity.XXXXXXXX
      52/250 matches, expect ~62 for crush2
      pattern=.f33hh.sanity....XXX
      pattern=f33hh.sanity....XXXXX
      43/250 matches, expect ~62 for crush2
       sanity test_33hh: @@@@@@ FAIL: MDT index match 43/250 times
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-master/4535 - 4.18.0-513.5.1.el8_9.x86_64
      servers: https://build.whamcloud.com/job/lustre-master/4535 - 4.18.0-513.18.1.el8_lustre.x86_64

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity test_33hh - MDT index match 43/250 times

          Activity

            pjones Peter Jones added a comment -

            Merged for 2.16


            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/55611/
            Subject: LU-17990 tests: sanity 33hh MDT index match often
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 243d272ed38fa4b3fd8f0cfb7aab62410628c36a

            gerrit Gerrit Updater added a comment -

            "Frederick Dilger <fdilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/55611
            Subject: LU-17990 tests: sanity 33hh MDT index match often
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: e5a81b57bbe1371f2fab24670b8e9eb89a85b67e

            adilger Andreas Dilger added a comment -

            Something like:

            +       local tries=3
            +       for (( try = 0; try < tries; try++ )); do
                            # crush2 doesn't put all-numeric suffixes on the same MDT,
                            # filename like $tfile.12345678 should *not* be considered temp
                            for pattern in ${patterns[*]}; do
                                    :
                            done
                            # the number of "bad" hashes is random, as it depends on the random
                            # filenames generated by "mktemp".  Allow some margin in the results.
                            echo "$((same/${#patterns[*]}))/$count matches, expect ~$expect for $1"
                            (( same / ${#patterns[*]} <= expect * 9 / 7 &&
                               same / ${#patterns[*]} > expect * 5 / 7 )) && break
            +               log "MDT index match $((same / ${#patterns[*]}))/$count times, try $try"
            +               # try counts from 0, so the last attempt is tries - 1
            +               (( try < tries - 1 )) ||
                                    error "MDT index match $((same / ${#patterns[*]}))/$count times after $tries tries"
                            same=0
            +       done
                    :
            +       for (( try = 0; try < tries; try++ )); do
                            # crush2 doesn't put suffixes with special characters on the same MDT
                            # filename like $tfile.txt.1234 should *not* be considered temp
                            for pattern in ${patterns[*]}; do
                                    :
                            done
                            # the number of "bad" hashes is random, as it depends on the random
                            # filenames generated by "mktemp".  Allow some margin in the results.
                            echo "$((same/${#patterns[*]}))/$count matches, expect ~$expect for $1"
                            (( same / ${#patterns[*]} <= expect * 9 / 7 &&
                               same / ${#patterns[*]} > expect * 5 / 7 )) && break
            +               log "MDT index match $((same / ${#patterns[*]}))/$count times, try $try"
            +               # try counts from 0, so the last attempt is tries - 1
            +               (( try < tries - 1 )) ||
                                    error "MDT index match $((same / ${#patterns[*]}))/$count times after $tries tries"
                            same=0
            +       done
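To make the margin concrete, here is a minimal sketch that applies the same integer arithmetic as the check quoted above to the two counts from the failure log (52 and 43 out of 250). The helper name `in_margin` is hypothetical and not part of sanity.sh, and `expect=62` is copied from the log rather than derived the way the test derives it.

```shell
#!/bin/bash
# Hypothetical helper (not from sanity.sh): returns 0 when "matches" falls
# inside the margin used by the quoted check, i.e.
#   expect * 5 / 7 < matches <= expect * 9 / 7   (bash integer division)
in_margin() {
	local matches=$1
	local expect=$2

	(( matches <= expect * 9 / 7 && matches > expect * 5 / 7 ))
}

expect=62	# value printed in the failure log ("expect ~62 for crush2")
echo "accepted range: $((expect * 5 / 7 + 1))..$((expect * 9 / 7))"

for matches in 52 43; do	# the two counts reported in the log
	if in_margin "$matches" "$expect"; then
		echo "$matches/250: within margin"
	else
		echo "$matches/250: outside margin"
	fi
done
```

With `expect=62` the accepted range works out to 45..79, so 52 passes while 43 falls just below the lower bound, which is consistent with the reported failure.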

            adilger Andreas Dilger added a comment -

            This is almost certainly the same as LU-16198, which is purely a matter of random chance: occasionally the generated names contain only numbers or only letters.

            We could probably fix this by retrying the check internally when it fails, up to a maximum of 3 times. If it still fails after 3 runs, then something is seriously wrong and the test should still fail in that case.
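The retry idea can also be sketched as a small standalone helper. This is only an illustration under stated assumptions: `retry_check` and `flaky_check` are hypothetical names that do not exist in sanity.sh, and `flaky_check` merely stands in for the randomized MDT index comparison.

```shell
#!/bin/bash
# Hypothetical helper (not from sanity.sh): run a check up to "tries" times.
# Succeeds on the first passing attempt, fails only after the last attempt.
retry_check() {
	local tries=$1
	local try
	shift

	for (( try = 1; try <= tries; try++ )); do
		"$@" && return 0
		echo "attempt $try/$tries failed" >&2
	done
	return 1
}

# Stand-in for the randomized check: fails twice, then passes.
attempts=0
flaky_check() {
	(( ++attempts >= 3 ))
}

if retry_check 3 flaky_check; then
	echo "passed after $attempts attempts"
else
	echo "still failing after $attempts attempts"
fi
```

A check that keeps failing on every attempt still makes `retry_check` return non-zero, so a persistent problem is reported rather than hidden.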
            yujian Jian Yu added a comment - +1 on master branch: https://testing.whamcloud.com/test_sets/259fca17-589d-482d-9889-b80ecbcc6aef

            People

              fdilger Fred Dilger
              maloo Maloo