
[LU-9069] ZFS sanity: FAIL: test_255a Speedup with willread is less than XX%, got YY%

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.10.0
    • Environment: onyx-64 thru 67, Full Group test, RHEL7.3/ZFS, master branch, b3499
    • Severity: 3

    Description

      sanity: FAIL: test_255a Speedup with willread is less than XX%, got YY%

The test expects willread-hinted reads to be significantly faster than uncached reads, approaching the speedup of cached reads, but the willread speedup is far lower (see the sketch of the measurement below).
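
A minimal sketch of the comparison the test makes, assuming a Lustre file $FILE and an OSS node $OSS (illustrative shell, not the exact sanity.sh code; the dd-output parsing is an assumption):

      # Illustrative sketch of the test_255a comparison, not the exact
      # sanity.sh code.  $FILE is a file on Lustre, $OSS is the OSS node.
      drop_server_cache() { ssh $OSS 'echo 3 > /proc/sys/vm/drop_caches'; }
      read_rate() {   # transfer rate from dd's summary line, e.g. "223 MB/s"
              dd if=$FILE of=/dev/null bs=1M 2>&1 | awk -F, 'END { print $NF }'
      }
      drop_server_cache
      base=$(read_rate)      # 1. uncached baseline
      cached=$(read_rate)    # 2. same read again, server cache now warm
      drop_server_cache
      lfs ladvise -a willread -s 0 -e $(stat -c %s $FILE) $FILE
      hinted=$(read_rate)    # 3. should benefit from server-side prefetch
      # cache speedup = cached/base - 1;  ladvise speedup = hinted/base - 1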

      From test_log:

      Iter 1/10: cache speedup: 228%
      Iter 1/10: ladvise speedup: 24%
      
      Iter 2/10: cache speedup: 147%
      Iter 2/10: ladvise speedup: 20%
      
      Iter 3/10: cache speedup: 170%
      Iter 3/10: ladvise speedup: 10%
      

Related to LU-8565, but that issue was attributed to VM slowness, whereas this one is also seen on physical machines.

      Info required for matching: sanity 255a

    Attachments

    Issue Links

    Activity


            Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24907/
            Subject: LU-9069 tests: improve output of sanity test_255a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: bdd470ff9765cb81d94b888ee1067e1c420303bc


            Andreas Dilger added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/24907
            Subject: LU-9069 tests: improve output of sanity test_255a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 5
            Commit: 99c7ca0110affd39c33afda073d09e6c1335ac81


            Andreas Dilger added a comment -

            It turns out that all of these real-world test failures occur only with ZFS backends. That implies some problem with ladvise not prefetching the pages properly for ZFS, since the cached reads run much faster but the ladvise reads sometimes do not run faster at all.
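
            One rough way to check whether the willread hint reaches the ZFS backend at all is to watch the ARC prefetch counters on the OSS around the hint (assuming the standard ZFS-on-Linux /proc layout; the file path is illustrative):

            # On the OSS: snapshot the ARC prefetch counters.
            grep -E 'prefetch_(data|metadata)_(hits|misses)' /proc/spl/kstat/zfs/arcstats
            # On the client: issue the hint for the whole file.
            lfs ladvise -a willread -s 0 -e $(stat -c %s /mnt/lustre/f255a) /mnt/lustre/f255a
            # Re-read the counters on the OSS; if they do not move, the hint is
            # not being translated into ZFS prefetch on the server.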

            Looking at the output more closely, it appears that the cases where ZFS passes do so only because the cached speedup is below the 20% threshold, which causes the willread check to be skipped entirely:
            https://testing.hpdd.intel.com/sub_tests/64172ba6-f9df-11e6-8b38-5254006e85c2

            Iter 1/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 1/10: Reading again without willread hint
            Iter 1/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 1/10: cache speedup:  6.79256/6.53006=4%
            Iter 1/10: ladvise speedup:  6.59651/6.53006=1%
            Iter 2/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 2/10: Reading again without willread hint
            Iter 2/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 2/10: cache speedup:  7.24601/6.77954=6%
            Iter 2/10: ladvise speedup:  6.651/6.77954=-1%
            Iter 3/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 3/10: Reading again without willread hint
            Iter 3/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 3/10: cache speedup:  6.82836/6.4925=5%
            Iter 3/10: ladvise speedup:  6.39501/6.4925=-1%
            Iter 4/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 4/10: Reading again without willread hint
            Iter 4/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 4/10: cache speedup:  6.80656/6.68438=1%
            Iter 4/10: ladvise speedup:  6.52108/6.68438=-2%
            Iter 5/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 5/10: Reading again without willread hint
            Iter 5/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 5/10: cache speedup:  7.23196/6.44395=12%
            Iter 5/10: ladvise speedup:  6.67935/6.44395=3%
            Iter 6/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 6/10: Reading again without willread hint
            Iter 6/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 6/10: cache speedup:  6.90188/6.73113=2%
            Iter 6/10: ladvise speedup:  6.61056/6.73113=-1%
            Iter 7/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 7/10: Reading again without willread hint
            Iter 7/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 7/10: cache speedup:  6.93889/6.67545=3%
            Iter 7/10: ladvise speedup:  6.4859/6.67545=-2%
            Iter 8/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 8/10: Reading again without willread hint
            Iter 8/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 8/10: cache speedup:  7.03663/6.86692=2%
            Iter 8/10: ladvise speedup:  6.60757/6.86692=-3%
            Iter 9/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 9/10: Reading again without willread hint
            Iter 9/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 9/10: cache speedup:  7.18227/6.64533=8%
            Iter 9/10: ladvise speedup:  6.56245/6.64533=-1%
            Iter 10/10: reading without willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 10/10: Reading again without willread hint
            Iter 10/10: reading with willread hint
            CMD: trevis-34vm4 echo 3 > /proc/sys/vm/drop_caches
            Iter 10/10: cache speedup:  6.96441/6.69985=3%
            Iter 10/10: ladvise speedup:  6.46659/6.69985=-3%
            Speedup with cache is less than 20% (4%), skipping check of speedup with willread:  -1%
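
            (In this output each percentage appears to be the measured read rate divided by the uncached baseline rate, minus one: e.g. 6.79256/6.53006 ≈ 1.04, i.e. a 4% cache speedup; a rate below the baseline shows up as a negative "speedup".)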
            

            So it looks like the real problem is that "echo 3 > /proc/sys/vm/drop_caches" doesn't actually flush the cache for ZFS servers (the ARC is not part of the Linux page cache), so the "uncached speed" baseline is invalid.
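
            A quick way to confirm this on the OSS (assuming the standard ZFS-on-Linux /proc interface; the pool name is illustrative):

            # The ARC size is accounted separately from the Linux page cache:
            awk '$1 == "size"' /proc/spl/kstat/zfs/arcstats
            echo 3 > /proc/sys/vm/drop_caches
            awk '$1 == "size"' /proc/spl/kstat/zfs/arcstats  # essentially unchanged
            # Exporting and re-importing the pool is one heavyweight way to
            # actually empty the ARC between runs:
            zpool export lustre-ost1 && zpool import lustre-ost1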

            What is needed is to implement the "dontneed" ladvise advice for ZFS so that the test can use it to flush the server-side ARC for this file and run a valid comparison.
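
            Hypothetically, once such an advice exists the test could invalidate the server-side cache per file instead of relying on drop_caches (the advice name and semantics here are the proposal above, not shipped behavior):

            # Hypothetical test usage once a "dontneed" advice is implemented:
            # ask the OSTs to drop cached/ARC data for this file's range.
            lfs ladvise -a dontneed -s 0 -e $(stat -c %s $FILE) $FILE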


            People

              Assignee: WC Triage
              Reporter: James Casper (Inactive)
              Votes: 0
              Watchers: 8
