[LU-8565] sanity test 255a fails with ‘Speedup with willread is less than X%, got Y%’ Created: 29/Aug/16  Updated: 09/Feb/17  Resolved: 08/Sep/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Li Xi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

autotest


Issue Links:
Duplicate
is duplicated by LU-8564 sanity test_255a: Error: 'Speedup wit... Resolved
Related
is related to LU-9069 ZFS sanity: FAIL: test_255a Speedup w... Open
is related to LU-4931 New feature of giving server/storage ... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test 255a fails with

sanity test_255a: @@@@@@ FAIL: Speedup with willread is less than 88%, got 7% 

where the percentage values differ for each failure.

From one of the test failures, we can see that the WILLREAD advice does not help reads and, for this test, does not outperform the benefit of reading from cache:

Iter 1/10: cache speedup: 98%
Iter 1/10: ladvise speedup: -7%

Iter 3/10: cache speedup: 131%
Iter 3/10: ladvise speedup: 8%

Iter 8/10: cache speedup: 108%
Iter 8/10: ladvise speedup: 0%

For this test, we are comparing the speed up gained with the WILLREAD advice versus (half) the benefit of reading from cache:

	local lowest_speedup=$((average_cache / 2))
	[ $average_ladvise -gt $lowest_speedup ] ||
		error "Speedup with willread is less than $lowest_speedup%,"\
			"got $average_ladvise%"

So the speedup gained from the willread advice is not as great as the test expects.
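To make the arithmetic of that check concrete, here is a minimal sketch that applies it to only the three iterations quoted above (not a full 10-iteration run, so the numbers differ from the failure message). The `average` helper and the array names are hypothetical and are not taken from sanity.sh; the real test may compute its averages differently.

```shell
#!/bin/bash
# Illustrative only: the three iterations quoted in this ticket.
cache_speedups=(98 131 108)
ladvise_speedups=(-7 8 0)

# Integer average of the arguments (hypothetical helper, not from sanity.sh).
average() {
	local sum=0 v
	for v in "$@"; do
		sum=$((sum + v))
	done
	echo $((sum / $#))
}

average_cache=$(average "${cache_speedups[@]}")
average_ladvise=$(average "${ladvise_speedups[@]}")

# Same threshold as the test: half of the average cache speedup.
lowest_speedup=$((average_cache / 2))

if [ "$average_ladvise" -gt "$lowest_speedup" ]; then
	result="PASS"
else
	result="FAIL: Speedup with willread is less than $lowest_speedup%, got $average_ladvise%"
fi
echo "$result"
```

With these three samples the ladvise average is far below half of the cache average, so the check fails by a wide margin rather than by noise.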

The WILLREAD hint was added with patch http://review.whamcloud.com/#/c/12458

Logs for recent failures are at
https://testing.hpdd.intel.com/test_sets/3ed61acc-6d90-11e6-b058-5254006e85c2
https://testing.hpdd.intel.com/test_sets/5eaddd70-6bca-11e6-8a8c-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f10db2be-69cf-11e6-9258-5254006e85c2
https://testing.hpdd.intel.com/test_sets/e2c962c6-696f-11e6-9258-5254006e85c2

Info required for matching: sanity 255a



 Comments   
Comment by Evan D. Chen (Inactive) [ 30/Aug/16 ]

Li Xi, can you help take a look at this ticket?

Comment by Oleg Drokin [ 30/Aug/16 ]

So it's clear the test needs to be revisited.
Considering that we run our testing in VMs alongside other parallel workloads, and that our VMs are short on memory, the effects could be less pronounced and more erratic. The test needs to take that into account so that it does not fail unnecessarily and frustrate people.

Comment by Li Xi (Inactive) [ 07/Sep/16 ]

I agree that the test should be skipped right now. It seems the performance improvement of "ladvise willread" is not as high as expected in some environments.

Comment by Andreas Dilger [ 07/Sep/16 ]

Li Xi, can you please submit a patch to change test_255a to not fail if the performance isn't as expected when running in a VM, like test_248:

test_248() {
        local my_error=error

        :
        :
        # This test case is time sensitive and Maloo uses KVM to run autotest.
        # Therefore the complete time of I/O task is unreliable and depends on
        # the workload on the host machine when the task is running.
        local virt=$(running_in_vm)
        [ -n "$virt" ] && echo "running in VM '$virt', ignore error" &&
                  my_error="error_ignore env=$virt"
        :
        :
        # verify that fast read is 4 times faster for cache read
        [ $(bc <<< "4 * $t_fast < $t_slow") -eq 1 ] ||
                $my_error "fast read was not 4 times faster: $t_fast vs $t_slow"

This is causing failures about once a day on other patches.
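As a rough sketch of how the test_248 pattern could be applied to the test_255a check (this is not the landed patch): the three helper functions below are hypothetical stand-ins so the snippet runs on its own. The real `running_in_vm`, `error`, and `error_ignore` come from the Lustre test framework and behave differently (for example, the real `error` aborts the test). `FAKE_VIRT` and `check_willread_speedup` are also assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical stubs standing in for the Lustre test-framework helpers.
running_in_vm() { echo "${FAKE_VIRT:-}"; }
error() { echo "FAIL: $*"; return 1; }
error_ignore() { echo "IGNORE ($1): ${*:2}"; return 0; }

# Hypothetical wrapper around the 255a threshold check.
check_willread_speedup() {
	local average_cache=$1
	local average_ladvise=$2
	local my_error=error

	# Timing is unreliable inside a VM, so downgrade the failure there.
	local virt=$(running_in_vm)
	[ -n "$virt" ] && echo "running in VM '$virt', ignore error" &&
		my_error="error_ignore env=$virt"

	local lowest_speedup=$((average_cache / 2))
	[ "$average_ladvise" -gt "$lowest_speedup" ] ||
		$my_error "Speedup with willread is less than $lowest_speedup%," \
			"got $average_ladvise%"
}

# Example: pretend we are running under KVM; the shortfall is reported
# but does not count as a failure.
FAKE_VIRT=kvm
check_willread_speedup 112 0
```

The design choice is the same as in test_248: the measurement still runs and is logged, but only a run on real hardware can turn a shortfall into a hard failure.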

Comment by Gerrit Updater [ 08/Sep/16 ]

Gu Zheng (gzheng@ddn.com) uploaded a new patch: http://review.whamcloud.com/22375
Subject: LU-8565 test: change sanity 255a to not fail for performance when running in VM
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 5df2d8671e0ebe74a47376bbe765b26d5607e1ee

Comment by Gerrit Updater [ 08/Sep/16 ]

Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/22375/
Subject: LU-8565 test: change sanity 255a to not fail for performance when running in VM
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 09b0ef0c85ad73f06e517632f1b4ffa3a7174c2b

Comment by Andreas Dilger [ 08/Sep/16 ]

Patch landed for 2.9.0. Other patches would need to rebase to get this fix, but it is also unlikely that they would hit the same failure twice if retesting. There may still be a few failures in the next few days before patches are updated to include this fix.

Comment by Gerrit Updater [ 16/Jan/17 ]

Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: https://review.whamcloud.com/24907
Subject: LU-8565 tests: improve output of test_255a
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 34fe68f93d9926e4261126b77988b4d9ff872839

Comment by Andreas Dilger [ 09/Feb/17 ]

LU-9069 is for test failures on real hardware.

Generated at Sat Feb 10 02:18:40 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.