
sanity test_255b: FAIL: Ladvise willread should use more memory than 76800 KiB

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Lustre 2.9.0
    • Lustre 2.9.0

    Description

      sanity test 255b failed as follows:

      == sanity test 255b: check 'lfs ladvise -a dontneed' ================================================= 17:21:04 (1476811264)
      100+0 records in
      100+0 records out
      104857600 bytes (105 MB) copied, 0.778124 s, 135 MB/s
      CMD: trevis-41vm4 cat /proc/meminfo | grep ^MemTotal:
      Total memory: 1923480 KiB
      CMD: trevis-41vm4 sync && echo 3 > /proc/sys/vm/drop_caches
      CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
      Cache used before read: 72592 KiB
      CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
      Cache used after read: 120972 KiB
      CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
      Cache used after dontneed ladvise: 18572 KiB
       sanity test_255b: @@@@@@ FAIL: Ladvise willread should use more memory than 76800 KiB
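      The numbers in the log above can be checked by hand. The following is a minimal sketch, assuming the test compares the growth in Cached across the read against the 76800 KiB threshold quoted in the failure message. The exact comparison in sanity.sh is not shown in this ticket, and the variable names are illustrative only.

```python
# Values copied from the test log above, all in KiB.
file_size_kib = 104857600 // 1024   # 100 MiB written by dd
cache_before_read = 72592
cache_after_read = 120972
threshold_kib = 76800               # taken from the failure message

# Assumption: the test expects Cached on the node serving ost1 to grow
# by at least the threshold once the file has been read.
cache_gain = cache_after_read - cache_before_read
threshold_pct = 100 * threshold_kib // file_size_kib

print(f"file size:  {file_size_kib} KiB")
print(f"threshold:  {threshold_kib} KiB ({threshold_pct}% of the file)")
print(f"cache gain: {cache_gain} KiB")
print("PASS" if cache_gain >= threshold_kib else "FAIL")
```

      Under that assumption the cache grew by 48380 KiB, which is below the threshold, so the run is reported as a failure.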
      

      Maloo reports:
      https://testing.hpdd.intel.com/test_sets/c98e8654-9d15-11e6-9f24-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/f1a59ea0-9605-11e6-a96f-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/ba662914-9588-11e6-9722-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/793fd440-95af-11e6-a96f-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/14b006f8-956e-11e6-9fed-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/7d75d880-8b9a-11e6-a8b7-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/abe41d90-895f-11e6-a8b7-5254006e85c2


        Activity

          [LU-8738] sanity test_255b: FAIL: Ladvise willread should use more memory than 76800 KiB
          pjones Peter Jones added a comment -

          Landed for 2.9


          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23867/
          Subject: LU-8738 tests: ladvise dontneed test write to single OST
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: c632d6094c5a228c86c898400192e81b03833fc5

          niu Niu Yawei (Inactive) added a comment -

          This looks like a must-fail test; we should try to land the fix ASAP.

          gerrit Gerrit Updater added a comment -

          James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/23867
          Subject: LU-8738 tests: ladvise dontneed test write to single OST
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 79e15a673e67d80fbf2d04db192a8e14b0802096

          jamesanunez James Nunez (Inactive) added a comment -

          This test still fails with sync after the write.

          jamesanunez James Nunez (Inactive) added a comment -

          John - I have two OSSs with two OSTs each.

          Since there is already a sync on ost1 after the write, I'll try the sync on the client.
          bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/5d201c00-ad94-11e6-8144-5254006e85c2
          jhammond John Hammond added a comment -

          James, are the 4 OSTs on the same OSS?

          If you have a setup handy that reproduces this issue, could you try adding a sync on the line after dd?

          jamesanunez James Nunez (Inactive) added a comment -

          Li Xi - For sanity test_255b, we only measure the total amount of cache and the amount of cache used on ost1. Yet the first thing the test does is stripe the file across all OSTs: 'lfs setstripe -c -1 -i 0 …'. To see the impact of caching on ost1 only, do we want to limit the file to a single OST, specifically ost1, meaning 'lfs setstripe -c 1 -i 0 ...'? A file striped across multiple OSTs (full striping) versus a file on a single OST could change the amount of cache used by the willread hint on ost1 when there is more than one OST.

          Could this explain why this test passes some of the time and fails some of the time?

          In my testing, using a single-striped file for this test succeeds every time, and striping the file over 4 OSTs fails every time.
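          The striping argument above can be illustrated with simple arithmetic. This is a rough sketch, assuming the file's data is spread evenly across the stripes and ignoring any other page cache activity on the server; the function name is hypothetical, and the threshold is the value from the failure message.

```python
def kib_per_ost(file_size_kib, stripe_count):
    """Approximate share of a fully written file held by each OST,
    assuming the data is distributed evenly across the stripes."""
    return file_size_kib // stripe_count

FILE_SIZE_KIB = 102400   # the 100 MiB test file
THRESHOLD_KIB = 76800    # from the failure message

for stripe_count in (1, 4):
    share = kib_per_ost(FILE_SIZE_KIB, stripe_count)
    verdict = "can reach" if share >= THRESHOLD_KIB else "cannot reach"
    print(f"stripe_count={stripe_count}: about {share} KiB on ost1, "
          f"{verdict} the threshold")
```

          With one stripe, all 102400 KiB land on ost1; with four stripes, ost1 holds only about 25600 KiB, which cannot satisfy a 76800 KiB threshold. Because the test reads Cached from /proc/meminfo on the node, OSTs that share an OSS with ost1 would add their share to the measurement, so the observed figure can be higher than this estimate.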

          adilger Andreas Dilger added a comment -

          This is causing a lot of test failures.

          Li Xi, could you please take a look?

          People

            jamesanunez James Nunez (Inactive)
            yujian Jian Yu