[LU-12807] runtests test 1 fails with ''Space not all freed: now 180500kB, was 180424kB.'' Created: 25/Sep/19  Updated: 01/Nov/22  Resolved: 17/Oct/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.14.0
Fix Version/s: Lustre 2.15.0

Type: Bug Priority: Major
Reporter: James Nunez (Inactive) Assignee: Andreas Dilger
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Related
is related to LU-15740 runtests test_1: 'Space not all freed Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

runtests test_1 fails for ldiskfs testing with "Space not all freed: now 180500kB, was 180424kB." The last lines in the client test_log are:

Waiting for local destroys to complete
 runtests test_1: @@@@@@ FAIL: Space not all freed: now 180500kB, was 180424kB. 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:6119:error()
  = /usr/lib64/lustre/tests/runtests:132:test_1()

In the past, we've had issues with this test failing in this way for ZFS; see LU-12579. The last ZFS failure we saw was on 12-SEPT-2019.
This issue started on 17-SEPT-2019 for ldiskfs. The test fails mostly in full and full-patchless testing and only rarely in review/patch testing.

Patch https://review.whamcloud.com/#/c/36011/ for LU-12579 landed about the time these tests started failing. The patch changes how much growth in used space runtests test 1 allows before calling the test a failure:

 131 if [ $(expr $NOWUSED - $USED) -gt $(fs_log_size) ]; then
 132         error "Space not all freed: now ${NOWUSED}kB, was ${USED}kB."
 133 else
 134         log "Space was freed: now ${NOWUSED}kB, was ${USED}kB."
 135 fi
 136 }

Prior to the patch, the allowed difference in used space was 1024 Kbytes. For ldiskfs, using the fs_log_size() function, it is 50 Kbytes. This value may be too restrictive for ldiskfs in full/full-patchless testing.
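
As a rough illustration of the arithmetic, the sketch below applies the same comparison to the numbers from the failure above. The space_check helper is hypothetical and not part of runtests; the limits are passed in as plain numbers (50 for the ldiskfs value quoted above, 1024 for the pre-patch value) rather than computed by fs_log_size().

```shell
#!/bin/bash
# Hypothetical helper, not part of runtests: reports whether the growth in
# used space (kB) between two measurements exceeds an allowed limit (kB).
space_check() {
    local nowused=$1 used=$2 limit=$3
    local delta=$((nowused - used))

    if [ "$delta" -gt "$limit" ]; then
        echo "FAIL: lost ${delta}kB (limit ${limit}kB)"
    else
        echo "PASS: lost ${delta}kB (limit ${limit}kB)"
    fi
}

# Values from the failure reported in this ticket.
space_check 180500 180424 50     # limit quoted above for ldiskfs
space_check 180500 180424 1024   # limit used before the patch
```

With these inputs the difference is 76kB, which exceeds the 50kB limit but would have passed under the earlier 1024kB limit.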

Logs for more of these failures are at
https://testing.whamcloud.com/test_sets/cba207a4-dc46-11e9-a197-52540065bddc
https://testing.whamcloud.com/test_sets/67b1c4ce-dc59-11e9-b62b-52540065bddc
https://testing.whamcloud.com/test_sets/b530f9e0-dcc7-11e9-b62b-52540065bddc
https://testing.whamcloud.com/test_sets/8afc04d6-d9e2-11e9-a2b6-52540065bddc



 Comments   
Comment by Jian Yu [ 21/Jan/20 ]

+1 on master branch: https://testing.whamcloud.com/test_sets/d13f6316-3b62-11ea-80b4-52540065bddc

Comment by Jian Yu [ 04/Feb/20 ]

The same failure occurred on SLES15 SP1 client with RHEL 7.7 server on master branch:
https://testing.whamcloud.com/test_sets/57cac7f6-471e-11ea-aeb7-52540065bddc

Comment by Jian Yu [ 13/Apr/20 ]

The failure occurred consistently on master branch:
https://testing.whamcloud.com/test_sets/8e929f8f-8c2c-459c-8386-84185a76f67a

Comment by Andreas Dilger [ 06/Feb/21 ]

+6 in the past week on master full testing (no review test failures):
https://testing.whamcloud.com/sub_tests/a2bf7bd9-fa82-4e92-8e79-82ed8ae7d8c9
https://testing.whamcloud.com/sub_tests/4c64ba72-0216-46cd-a062-718b590bf2b5
https://testing.whamcloud.com/sub_tests/83f33da3-4756-4b50-82de-d9ab064acad7
https://testing.whamcloud.com/sub_tests/ee7e58c4-7a7d-4efe-9e7a-12c52f13ef5f
https://testing.whamcloud.com/sub_tests/05b55136-0ce0-4964-b3b9-3673641addd8
https://testing.whamcloud.com/sub_tests/a63c2f58-1896-405b-8580-7edca929b867

Comment by Gerrit Updater [ 11/Aug/21 ]

"Andreas Dilger <adilger@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/44614
Subject: LU-12807 tests: fix intermittent runtests failure
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d9e46f31589db8d07c2e5e00817065c4a7314178

Comment by Andreas Dilger [ 11/Aug/21 ]

+5 failures in the past week on master full testing, mostly with "Space not all freed: now 8848kB, was 8784kB" (lost 64KB).

Since this is only happening on full testing, it may be that some feature is active from a previous test script that consumes a small amount of space (not Changelogs, but maybe there is a new unlink llog created in the middle of the test?). In any case, I don't think this is a serious issue with only 64KB lost during the test.

Comment by Gerrit Updater [ 17/Oct/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44614/
Subject: LU-12807 tests: fix intermittent runtests failure
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 14d07b623731233a62a8acd021c8ccdcb2705371

Generated at Sat Feb 10 02:55:49 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.