Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13138

sanity test 101d fails with 'readahead 283s > no-readahead 134s 500M'

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4
    • Ubuntu or PPC clients
    • 3
    • 9223372036854775807

    Description

      sanity test_101d fails with 'readahead 283s > no-readahead 134s 500M'. The first time this test failed with this error was on 25 FEB 2019 while testing a patch for LU-11157 with the first failure at https://testing.whamcloud.com/test_sets/9377a2ee-3967-11e9-b88b-52540065bddc. These failures for this patch are a mix of RHEL and other client testing and look like they are different from the branch testing failures.

      The first time this test failed with this error in non-patch review testing was on 02 MARCH 2019 for Ubuntu 18.04 client testing at https://testing.whamcloud.com/test_sets/02946220-3e27-11e9-a256-52540065bddc. And again for Ubuntu 18.04 client testing on 05 MAY 2019 at https://testing.whamcloud.com/test_sets/9cbca45c-705e-11e9-bd0e-52540065bddc. Since that time, we’ve seen about 20 failures in branch testing for Ubuntu and PPC testing.

      There is one case where this test fails for RHEL clients in branch testing where the dd time output does not match what the test is reporting; see the suite_log for the failure at https://testing.whamcloud.com/test_sets/55424ea8-34d8-11ea-80b4-52540065bddc. The test reports that reading the 500M file with read ahead turned off took about 134 seconds and the dd output confirms this

      Disable read-ahead
      Reading the test file /mnt/lustre/f101d.sanity with read-ahead disabled
      500+0 records in
      500+0 records out
      524288000 bytes (524 MB) copied, 133.321 s, 3.9 MB/s
      Cancel LRU locks on lustre client to flush the client cache
      

      Next we turn on read ahead and read the same file. In this case, dd reports that the read took about 17 seconds, but the test reports that the read took 283 seconds

      Enable read-ahead with 40MB
      0
      Reading the test file /mnt/lustre/f101d.sanity with read-ahead enabled
      500+0 records in
      500+0 records out
      524288000 bytes (524 MB) copied, 16.516 s, 31.7 MB/s
      read-ahead disabled time read 134
      read-ahead enabled  time read 283
      

      Looking at do_and_time() in test-framework.sh, we see

      6834 do_and_time () {
      6835     local cmd=$1
      6836     local rc
      6837 
      6838     SECONDS=0
      6839     eval '$cmd'
      6840 
      6841     [ ${PIPESTATUS[0]} -eq 0 ] || rc=1
      6842 
      6843     echo $SECONDS
      6844     return $rc
      6845 }
      

      The way this function is used, the return code of do_and_time() is saved as the time dd takes. Although I don’t think this is the issue, an improvement would be for do_and_time() to only echo/return one value.

      Attachments

        Issue Links

          Activity

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: