
sanity test 255a fails with ‘Speedup with willread is less than X%, got Y%’

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.9.0
    • Lustre 2.9.0
    • None
    • autotest
    • 3
    • 9223372036854775807

    Description

      sanity test 255a fails with

      sanity test_255a: @@@@@@ FAIL: Speedup with willread is less than 88%, got 7% 
      

      where the percentage values differ for each failure.

      From one of the test failures, we can see that the WILLREAD advice does little to speed up reads and, in this test, comes nowhere near the benefit of reading from cache:

      Iter 1/10: cache speedup: 98%
      Iter 1/10: ladvise speedup: -7%
      
      Iter 3/10: cache speedup: 131%
      Iter 3/10: ladvise speedup: 8%
      
      Iter 8/10: cache speedup: 108%
      Iter 8/10: ladvise speedup: 0%
      

      This test compares the speedup gained with the WILLREAD advice against half the speedup gained by reading from cache:

      local lowest_speedup=$((average_cache / 2))
      [ $average_ladvise -gt $lowest_speedup ] ||
      	error "Speedup with willread is less than $lowest_speedup%,"\
      		"got $average_ladvise%"
      

      So the speedup from the willread advice passed to ladvise is not as large as the test expects.
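As a rough illustration of the check quoted above, the following sketch averages the three iterations listed in this description and applies the same "half the cache speedup" threshold. The helper names `average` and `check_willread_speedup` are hypothetical and not taken from sanity.sh, and averaging per-iteration percentages is an assumption about how the real test aggregates its results. Because only three of the ten iterations are quoted here, the numbers differ from those in the reported failure message.

```shell
# Hypothetical helper: integer average of its arguments.
# Shell arithmetic truncates toward zero.
average() {
	local sum=0 v
	for v in "$@"; do
		sum=$((sum + v))
	done
	echo $((sum / $#))
}

# Hypothetical helper: returns 0 if the average ladvise speedup exceeds
# half the average cache speedup, mirroring the check quoted above.
check_willread_speedup() {
	local average_cache=$1
	local average_ladvise=$2
	local lowest_speedup=$((average_cache / 2))

	if [ "$average_ladvise" -gt "$lowest_speedup" ]; then
		return 0
	fi
	echo "Speedup with willread is less than $lowest_speedup%, got $average_ladvise%"
	return 1
}

# Speedups (in percent) from the three iterations quoted in the description.
average_cache=$(average 98 131 108)
average_ladvise=$(average -7 8 0)
check_willread_speedup "$average_cache" "$average_ladvise" || true
```

With these three samples the ladvise average falls well below the threshold, which matches the shape of the reported failure.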

      The WILLREAD hint was added with patch http://review.whamcloud.com/#/c/12458

      Logs for recent failures are at
      https://testing.hpdd.intel.com/test_sets/3ed61acc-6d90-11e6-b058-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/5eaddd70-6bca-11e6-8a8c-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/f10db2be-69cf-11e6-9258-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/e2c962c6-696f-11e6-9258-5254006e85c2

      Info required for matching: sanity 255a

          Activity

            adilger Andreas Dilger added a comment -

            LU-9069 is for test failures on real hardware.

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/24907
            Subject: LU-8565 tests: improve output of test_255a
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 34fe68f93d9926e4261126b77988b4d9ff872839

            adilger Andreas Dilger added a comment -

            Patch landed for 2.9.0. Other patches would need to rebase to get this fix, but it is also unlikely that they would hit the same failure twice if retesting. There may still be a few failures in the next few days before patches are updated to include this fix.

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/22375/
            Subject: LU-8565 test: change sanity 255a to not fail for performance when running in VM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 09b0ef0c85ad73f06e517632f1b4ffa3a7174c2b

            gerrit Gerrit Updater added a comment -

            Gu Zheng (gzheng@ddn.com) uploaded a new patch: http://review.whamcloud.com/22375
            Subject: LU-8565 test: change sanity 255a to not fail for performance when running in VM
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5df2d8671e0ebe74a47376bbe765b26d5607e1ee

            adilger Andreas Dilger added a comment -

            Li Xi, can you please submit a patch to change test_255a to not fail if the performance isn't as expected when running in a VM, like test_248:

            test_248() {
                    local my_error=error

                    :
                    :
                    # This test case is time sensitive and Maloo uses KVM to run autotest.
                    # Therefore the complete time of I/O task is unreliable and depends on
                    # the workload on the host machine when the task is running.
                    local virt=$(running_in_vm)
                    [ -n "$virt" ] && echo "running in VM '$virt', ignore error" &&
                              my_error="error_ignore env=$virt"
                    :
                    :
                    # verify that fast read is 4 times faster for cache read
                    [ $(bc <<< "4 * $t_fast < $t_slow") -eq 1 ] ||
                            $my_error "fast read was not 4 times faster: $t_fast vs $t_slow"

            This is causing failures about once a day on other patches.
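For illustration only, the test_248 pattern requested above could be applied to the test_255a threshold check roughly as follows. This is a sketch, not the patch that landed. `running_in_vm`, `error` and `error_ignore` are replaced here by simplified stand-ins so the snippet runs on its own (the real test-framework helpers behave differently, for example the real `error` aborts the test), and `FAKE_VIRT` and `check_willread_speedup_vm_aware` are hypothetical names introduced for this example.

```shell
# Simplified stand-ins for the Lustre test-framework helpers used by test_248.
# FAKE_VIRT is a hypothetical variable that simulates the detected VM type.
running_in_vm() { echo "${FAKE_VIRT:-}"; }
error() { echo "FAIL: $*"; return 1; }
error_ignore() {
	local tag=$1
	shift
	echo "IGNORE ($tag): $*"
	return 0
}

# Hypothetical wrapper: same threshold as the test_255a check, but a miss is
# only reported, not treated as a failure, when running inside a VM.
check_willread_speedup_vm_aware() {
	local average_cache=$1
	local average_ladvise=$2
	local my_error=error
	local virt

	virt="$(running_in_vm)"
	[ -n "$virt" ] && echo "running in VM '$virt', ignore error" &&
		my_error="error_ignore env=$virt"

	local lowest_speedup=$((average_cache / 2))
	[ "$average_ladvise" -gt "$lowest_speedup" ] ||
		$my_error "Speedup with willread is less than $lowest_speedup%," \
			"got $average_ladvise%"
}
```

Calling `check_willread_speedup_vm_aware 112 0` with `FAKE_VIRT=kvm` prints the message and returns success, while the same call with `FAKE_VIRT` unset returns failure.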

            lixi Li Xi (Inactive) added a comment -

            I agree that the test should be skipped for now. It seems the performance improvement from "ladvise willread" is not as high as expected in some environments.
            green Oleg Drokin added a comment -

            So it's clear the test needs to be revisited.
            Considering that we run our testing in VMs alongside other parallel workloads, and that our VMs are short on memory, the effects could be less pronounced and more erratic. The test needs to take that into account so that it does not fail unnecessarily and frustrate people.

            cheneva1 Evan D. Chen (Inactive) added a comment -

            Li Xi, can you please take a look at this ticket?

            People

              lixi Li Xi (Inactive)
              jamesanunez James Nunez (Inactive)