Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-13138

sanity test 101d fails with 'readahead 283s > no-readahead 134s 500M'

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • Lustre 2.13.0, Lustre 2.14.0, Lustre 2.12.4
    • Ubuntu or PPC clients
    • 3
    • 9223372036854775807

    Description

      sanity test_101d fails with 'readahead 283s > no-readahead 134s 500M'. The first time this test failed with this error was on 25 FEB 2019 while testing a patch for LU-11157 with the first failure at https://testing.whamcloud.com/test_sets/9377a2ee-3967-11e9-b88b-52540065bddc. These failures for this patch are a mix of RHEL and other client testing and look like they are different from the branch testing failures.

      The first time this test failed with this error in non-patch review testing was on 02 MARCH 2019 for Ubuntu 18.04 client testing at https://testing.whamcloud.com/test_sets/02946220-3e27-11e9-a256-52540065bddc. And again for Ubuntu 18.04 client testing on 05 MAY 2019 at https://testing.whamcloud.com/test_sets/9cbca45c-705e-11e9-bd0e-52540065bddc. Since that time, we’ve seen about 20 failures in branch testing for Ubuntu and PPC testing.

      There is one case where this test fails for RHEL clients in branch testing where the dd time output does not match what the test is reporting; see the suite_log for the failure at https://testing.whamcloud.com/test_sets/55424ea8-34d8-11ea-80b4-52540065bddc. The test reports that reading the 500M file with read ahead turned off took about 134 seconds and the dd output confirms this

      Disable read-ahead
      Reading the test file /mnt/lustre/f101d.sanity with read-ahead disabled
      500+0 records in
      500+0 records out
      524288000 bytes (524 MB) copied, 133.321 s, 3.9 MB/s
      Cancel LRU locks on lustre client to flush the client cache
      

      Next we turn on read ahead and read the same file. In this case, dd reports that the read took about 17 seconds, but the test reports that the read took 283 seconds

      Enable read-ahead with 40MB
      0
      Reading the test file /mnt/lustre/f101d.sanity with read-ahead enabled
      500+0 records in
      500+0 records out
      524288000 bytes (524 MB) copied, 16.516 s, 31.7 MB/s
      read-ahead disabled time read 134
      read-ahead enabled  time read 283
      

      Looking at do_and_time() in test-framework.sh, we see

      6834 do_and_time () {
      6835     local cmd=$1
      6836     local rc
      6837 
      6838     SECONDS=0
      6839     eval '$cmd'
      6840 
      6841     [ ${PIPESTATUS[0]} -eq 0 ] || rc=1
      6842 
      6843     echo $SECONDS
      6844     return $rc
      6845 }
      

      The way this function is used, the return code of do_and_time() is saved as the time dd takes. Although I don’t think this is the issue, an improvement would be for do_and_time() to only echo/return one value.

      Attachments

        Issue Links

          Activity

            [LU-13138] sanity test 101d fails with 'readahead 283s > no-readahead 134s 500M'

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53188/
            Subject: LU-13138 tests: improve sanity/101d 'dd' parsing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 2d9b57dfc45d4ee247363f89e69021827916edee

            gerrit Gerrit Updater added a comment - "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/53188/ Subject: LU-13138 tests: improve sanity/101d 'dd' parsing Project: fs/lustre-release Branch: master Current Patch Set: Commit: 2d9b57dfc45d4ee247363f89e69021827916edee

            "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53188
            Subject: LU-13138 tests: improve sanity/101d 'dd' parsing
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b32d4a2900e3b7ca7366f8b0f4583e4440152612

            gerrit Gerrit Updater added a comment - "Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/53188 Subject: LU-13138 tests: improve sanity/101d 'dd' parsing Project: fs/lustre-release Branch: master Current Patch Set: 1 Commit: b32d4a2900e3b7ca7366f8b0f4583e4440152612

            My testing on Ubuntu20.04 HWE shows double the performance when read ahead is enabled I see the same behaviour for RHEL7 as well. Thanks to the work of Wang Shilong. We can close this ticket but it can be reopened if someone reports this problem again.

            simmonsja James A Simmons added a comment - My testing on Ubuntu20.04 HWE shows double the performance when read ahead is enabled I see the same behaviour for RHEL7 as well. Thanks to the work of Wang Shilong. We can close this ticket but it can be reopened if someone reports this problem again.
            qian_wc Qian Yingjin added a comment - +1 on ES6 https://testing.whamcloud.com/test_sets/52982bf3-3774-4a63-a141-0ed0bfdb86e2
            lixi_wc Li Xi added a comment - +1 on ES5.2 https://testing.whamcloud.com/test_sets/59e0506b-8004-473f-aa44-753b27cc82f5
            jamesanunez James Nunez (Inactive) added a comment - RHEL8.3 server/client failure at https://testing.whamcloud.com/test_sets/0af23296-c754-4951-8e2e-40383febe41b
            adilger Andreas Dilger added a comment - +1 on Ubuntu https://testing.whamcloud.com/test_sets/93206725-9b65-4264-b6ae-87ac957f5c20
            adilger Andreas Dilger added a comment - +1 on Ubuntu: https://testing.whamcloud.com/test_sets/9e3458e2-41d5-49f5-b7ee-277b5132492d
            ys Yang Sheng added a comment - Other instance: https://testing.whamcloud.com/test_sessions/43fe4d5d-90df-4962-985e-604656b4bb1b

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38427/
            Subject: LU-13138 tests: measure 'dd' time more accurately
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 43ebfad490a02613ada66c9ac79e80d1d9feaac5

            gerrit Gerrit Updater added a comment - Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38427/ Subject: LU-13138 tests: measure 'dd' time more accurately Project: fs/lustre-release Branch: master Current Patch Set: Commit: 43ebfad490a02613ada66c9ac79e80d1d9feaac5

            People

              adilger Andreas Dilger
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: