Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4
-
Ubuntu or PPC clients
-
3
-
9223372036854775807
Description
sanity test_101d fails with 'readahead 283s > no-readahead 134s 500M'. The first time this test failed with this error was on 25 FEB 2019 while testing a patch for LU-11157 with the first failure at https://testing.whamcloud.com/test_sets/9377a2ee-3967-11e9-b88b-52540065bddc. These failures for this patch are a mix of RHEL and other client testing and look like they are different from the branch testing failures.
The first time this test failed with this error in non-patch review testing was on 02 MARCH 2019 for Ubuntu 18.04 client testing at https://testing.whamcloud.com/test_sets/02946220-3e27-11e9-a256-52540065bddc. And again for Ubuntu 18.04 client testing on 05 MAY 2019 at https://testing.whamcloud.com/test_sets/9cbca45c-705e-11e9-bd0e-52540065bddc. Since that time, we’ve seen about 20 failures in branch testing for Ubuntu and PPC testing.
There is one case where this test fails for RHEL clients in branch testing where the dd time output does not match what the test is reporting; see the suite_log for the failure at https://testing.whamcloud.com/test_sets/55424ea8-34d8-11ea-80b4-52540065bddc. The test reports that reading the 500M file with read ahead turned off took about 134 seconds and the dd output confirms this
Disable read-ahead Reading the test file /mnt/lustre/f101d.sanity with read-ahead disabled 500+0 records in 500+0 records out 524288000 bytes (524 MB) copied, 133.321 s, 3.9 MB/s Cancel LRU locks on lustre client to flush the client cache
Next we turn on read ahead and read the same file. In this case, dd reports that the read took about 17 seconds, but the test reports that the read took 283 seconds
Enable read-ahead with 40MB 0 Reading the test file /mnt/lustre/f101d.sanity with read-ahead enabled 500+0 records in 500+0 records out 524288000 bytes (524 MB) copied, 16.516 s, 31.7 MB/s read-ahead disabled time read 134 read-ahead enabled time read 283
Looking at do_and_time() in test-framework.sh, we see
6834 do_and_time () { 6835 local cmd=$1 6836 local rc 6837 6838 SECONDS=0 6839 eval '$cmd' 6840 6841 [ ${PIPESTATUS[0]} -eq 0 ] || rc=1 6842 6843 echo $SECONDS 6844 return $rc 6845 }
The way this function is used, the return code of do_and_time() is saved as the time dd takes. Although I don’t think this is the issue, an improvement would be for do_and_time() to only echo/return one value.