[LU-8738] sanity test_255b: FAIL: Ladvise willread should use more memory than 76800 KiB Created: 20/Oct/16  Updated: 21/Nov/16  Resolved: 21/Nov/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.9.0
Fix Version/s: Lustre 2.9.0

Type: Bug Priority: Critical
Reporter: Jian Yu Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3

 Description   

sanity test 255b failed as follows:

== sanity test 255b: check 'lfs ladvise -a dontneed' ================================================= 17:21:04 (1476811264)
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.778124 s, 135 MB/s
CMD: trevis-41vm4 cat /proc/meminfo | grep ^MemTotal:
Total memory: 1923480 KiB
CMD: trevis-41vm4 sync && echo 3 > /proc/sys/vm/drop_caches
CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
Cache used before read: 72592 KiB
CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
Cache used after read: 120972 KiB
CMD: trevis-41vm4 cat /proc/meminfo | grep ^Cached:
Cache used after dontneed ladvise: 18572 KiB
 sanity test_255b: @@@@@@ FAIL: Ladvise willread should use more memory than 76800 KiB
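
For readers following the numbers, here is a small sketch of the arithmetic behind the failure, using only values from the log above. It assumes the test compares the growth in Cached (after read minus before read) against the threshold; the observation that 76800 KiB is 75% of the 100 MiB file is an inference from the numbers, not something the log states.

```shell
#!/bin/bash
# Values copied from the log above, all in KiB.
before_read=72592     # "Cache used before read"
after_read=120972     # "Cache used after read"
threshold=76800       # from the FAIL message

# dd wrote 104857600 bytes; convert to KiB.
file_kib=$((104857600 / 1024))

# Cache growth attributed to the read (assumed to be what the test checks).
delta=$((after_read - before_read))

echo "file size:              ${file_kib} KiB"
echo "cache growth:           ${delta} KiB"
echo "threshold:              ${threshold} KiB"
echo "threshold as % of file: $((threshold * 100 / file_kib))"

if [ "$delta" -lt "$threshold" ]; then
    echo "result: FAIL (short by $((threshold - delta)) KiB)"
else
    echo "result: PASS"
fi
```

With these inputs the cache grew by 48380 KiB, a little under half of the 102400 KiB file and 28420 KiB short of the threshold.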

Maloo reports:
https://testing.hpdd.intel.com/test_sets/c98e8654-9d15-11e6-9f24-5254006e85c2
https://testing.hpdd.intel.com/test_sets/f1a59ea0-9605-11e6-a96f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/ba662914-9588-11e6-9722-5254006e85c2
https://testing.hpdd.intel.com/test_sets/793fd440-95af-11e6-a96f-5254006e85c2
https://testing.hpdd.intel.com/test_sets/14b006f8-956e-11e6-9fed-5254006e85c2
https://testing.hpdd.intel.com/test_sets/7d75d880-8b9a-11e6-a8b7-5254006e85c2
https://testing.hpdd.intel.com/test_sets/abe41d90-895f-11e6-a8b7-5254006e85c2



 Comments   
Comment by nasf (Inactive) [ 07/Nov/16 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/27db5292-a468-11e6-8a63-5254006e85c2

Comment by Bob Glossman (Inactive) [ 13/Nov/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/e7949930-a88d-11e6-882e-5254006e85c2

Comment by John Hammond [ 15/Nov/16 ]

On master:

https://testing.hpdd.intel.com/test_sets/37218620-ab04-11e6-a726-5254006e85c2

Note that ladvise willread used almost 76800 KiB.

Comment by Steve Guminski (Inactive) [ 16/Nov/16 ]

Again on master:

https://testing.hpdd.intel.com/test_sets/a6324fec-ab84-11e6-a76e-5254006e85c2

Comment by Jian Yu [ 16/Nov/16 ]

One more failure instance on master branch:
https://testing.hpdd.intel.com/test_sets/7e484224-aa8b-11e6-a095-5254006e85c2

Comment by Steve Guminski (Inactive) [ 16/Nov/16 ]

Another failure on master:

https://testing.hpdd.intel.com/test_sets/8e908c38-ac1e-11e6-9116-5254006e85c2

Comment by Andreas Dilger [ 17/Nov/16 ]

This is causing a lot of test failures.

Li Xi, could you please take a look.

Comment by James Nunez (Inactive) [ 17/Nov/16 ]

Li Xi - For sanity test_255b, we measure only the total amount of cache and the amount of cache used on ost1. Yet the first thing the test does is stripe the file across all OSTs: 'lfs setstripe -c -1 -i 0 …'. To see the impact of caching on ost1 only, should we limit the file to a single OST, in particular ost1, i.e. 'lfs setstripe -c 1 -i 0 ...'? When there is more than one OST, a fully striped file and a single-OST file could differ in the amount of cache the willread hint uses on ost1.

Could this explain why this test passes some of the time and fails some of the time?

In my testing, this test succeeds every time with a single-stripe file and fails every time with a file striped over 4 OSTs.
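
A rough sketch of why the stripe count could matter, under stated assumptions: data is spread evenly across the stripes, the read caches all of the data held on the measured node, and /proc/meminfo reflects every OST served by that node. The helper name `expected_cache_kib` and the example layouts are hypothetical, chosen only for illustration; the 76800 KiB threshold comes from the failure message.

```shell
#!/bin/bash
# Hypothetical helper: estimate how many KiB of a striped file land on
# the node hosting ost1, assuming data is spread evenly over all stripes.
#   $1 = file size in KiB
#   $2 = stripe count (number of OSTs the file is striped over)
#   $3 = how many of those OSTs are served by the measured node
expected_cache_kib() {
    local file_kib=$1 stripes=$2 local_osts=$3
    echo $((file_kib * local_osts / stripes))
}

file_kib=102400     # 100 MiB test file
threshold=76800     # from the FAIL message

# Illustrative layouts: "stripe_count local_osts"
for layout in "1 1" "4 2" "4 1"; do
    set -- $layout
    share=$(expected_cache_kib "$file_kib" "$1" "$2")
    if [ "$share" -ge "$threshold" ]; then
        verdict=pass
    else
        verdict=fail
    fi
    echo "stripes=$1 local_osts=$2 -> ${share} KiB (${verdict})"
done
```

Under these assumptions a single-stripe file can reach the full 102400 KiB on the measured node, while a file striped over 4 OSTs reaches at most 51200 KiB there when the node serves two of them, which is below the threshold.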

Comment by John Hammond [ 18/Nov/16 ]

James, are the 4 OSTs on the same OSS?

If you have a setup handy that reproduces this issue, could you try adding a sync on the line after dd?

Comment by Bob Glossman (Inactive) [ 18/Nov/16 ]

another on master:
https://testing.hpdd.intel.com/test_sets/5d201c00-ad94-11e6-8144-5254006e85c2

Comment by James Nunez (Inactive) [ 18/Nov/16 ]

John - I have two OSSs with two OSTs each.

Since there is already a sync on ost1 after the write, I'll try the sync on the client.

Comment by James Nunez (Inactive) [ 18/Nov/16 ]

This test still fails with sync after the write.

Comment by Gerrit Updater [ 18/Nov/16 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/23867
Subject: LU-8738 tests: ladvise dontneed test write to single OST
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 79e15a673e67d80fbf2d04db192a8e14b0802096

Comment by Niu Yawei (Inactive) [ 21/Nov/16 ]

This looks like a must-fail test; we should try to land the fix ASAP.

Comment by Gerrit Updater [ 21/Nov/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/23867/
Subject: LU-8738 tests: ladvise dontneed test write to single OST
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c632d6094c5a228c86c898400192e81b03833fc5

Comment by Peter Jones [ 21/Nov/16 ]

Landed for 2.9
