[LU-6951] sanity test_27m: @@@@@@ FAIL: OST0 was full but new created file still use it Created: 04/Aug/15  Updated: 29/May/19  Resolved: 04/May/19

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.8.0
Fix Version/s: Lustre 2.13.0

Type: Bug Priority: Minor
Reporter: Andrew Perepechko Assignee: Andrew Perepechko
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-3859 grant shrinker floods OST and produce... Resolved
Epic/Theme: test
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

sanity test_27m fails with "OST0 was full but new created file still use it"
when the test runs with more than one connected client. A simple reproducer
is "MOUNT_2=true REFORMAT=true ONLY=27m bash sanity.sh".

The cause appears to be grants. In my setup, every client initially
gets a 2 MB grant, which never shrinks to 0. When dd on the first
client receives ENOSPC, the OST is not actually full, because a
client is not allowed to use other clients' grants. When a new file
is created, the MDS still sees free space on OST0 equal to the amount
of unused grants and allocates new objects on OST0.
Eventually, the test reports failure.
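
The accounting described above can be illustrated with a toy model. This is only a sketch, not Lustre code: the function name free_after_enospc and the rule that every other client holds one untouched grant are assumptions made for illustration, and the 2048 KB figure is the 2 MB grant observed in this setup.

```shell
#!/bin/bash
# Toy model of grant accounting on one OST. Not Lustre code: the function
# name and the accounting rule are illustrative assumptions.

# free_after_enospc GRANT_KB CLIENTS
# Prints the space (in KB) still unused on the OST after one client has
# written until ENOSPC, assuming every other client sits on an untouched
# grant of GRANT_KB that the writer cannot consume.
free_after_enospc() {
	local grant_kb=$1
	local clients=$2
	echo $(( grant_kb * (clients - 1) ))
}

# Compare a single mount with the two-mount reproducer case.
for n in 1 2; do
	echo "clients=$n unused_on_ost0=$(free_after_enospc 2048 "$n") KB"
done
```

With one client nothing is left over, so the MDS stops using OST0. With two clients the idle client's grant remains visible as free space, which matches the failure seen here.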

It's not clear whether this is a defect in Lustre or a test defect,
in which case the test should be skipped when more than one client is
mounted. We would like to know your opinion. Thanks.



 Comments   
Comment by Alex Zhuravlev [ 04/Aug/15 ]

The test wasn't designed to run with several clients. That said, this problem with grants is known, and there was an attempt to solve it by returning granted space with ping (implying no client activity on this OST), but the functionality got disabled due to issues.

Comment by Andrew Perepechko [ 04/Aug/15 ]

Alex, do you think we should skip this test when running with more than one client, or temporarily unmount the other clients for the test run?

Comment by Andreas Dilger [ 04/Aug/15 ]

The best solution would be to resurrect and fix the grant shrinking functionality, so that the idle client would return the grant back to the OST when space is running low.

I see in LU-3859 that grant shrink was enabled in your code at one point, but I guess it still had some bugs at that time. The grant shrinking was disabled due to problems before 1.8.1 and 2.0.0 were released (https://bugzilla.lustre.org/show_bug.cgi?id=19507) and hasn't been fixed since then.

I don't have any strong preference about whether you unmount the other client or just skip the test, but it is probably less complex to just skip the test in case of multiple mounts. The sanity.sh tests all assume that there is only a single client mounted, since multi-mount tests are run in sanityn.sh. I suspect there may be other test failures hit if you are running sanity.sh with multiple mounts.
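
A minimal sketch of the "skip on multiple mounts" option follows. It is not the patch that later landed for this ticket. The helper name multi_mount_requested is hypothetical, and the sketch assumes that a non-empty MOUNT_2 (as in the reproducer in the description) is what signals a second mount.

```shell
#!/bin/bash
# Hypothetical guard, not taken from sanity.sh or test-framework.sh.

# Returns 0 when a second client mount was requested through MOUNT_2.
# Assumption: any non-empty value of MOUNT_2 means "second mount enabled".
multi_mount_requested() {
	[ -n "${MOUNT_2:-}" ]
}

if multi_mount_requested; then
	echo "SKIP: test_27m assumes a single mounted client"
else
	echo "RUN: test_27m"
fi
```

Checking only the variable keeps the guard cheap, but it does not detect clients mounted from other nodes. A real check would need to ask the servers how many clients are connected.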

Comment by Gerrit Updater [ 01/Nov/16 ]

Andrew Perepechko (andrew.perepechko@seagate.com) uploaded a new patch: http://review.whamcloud.com/23506
Subject: LU-6951 tests: sanity test_27m failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f192c6a38e4d429c9b52f10cf38c4a0519de270e

Comment by Gerrit Updater [ 04/May/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/23506/
Subject: LU-6951 tests: sanity test_27m failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fafb93b96179d5fd2c9bc83b19e512180a4f833e

Comment by Peter Jones [ 04/May/19 ]

Landed for 2.13

Comment by Colin Faber [X] (Inactive) [ 29/May/19 ]

should this be closed?

Comment by Andrew Perepechko [ 29/May/19 ]

No new patches are expected.

Generated at Sat Feb 10 02:04:42 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.