[LU-9872] parallel-scale-nfsv3 test_connectathon: connectathon failed: 1 Created: 11/Aug/17 Updated: 03/Nov/18 Resolved: 29/Nov/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | James Casper | Assignee: | Bob Glossman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Trevis2, full |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340

This issue looks like

From test_log:

write/read 30 MB file
Warning: can't complete test: can't sync bigfile21643: No space left on device
special tests failed
parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4980:error()
= /usr/lib64/lustre/tests/functions.sh:548:run_connectathon()
= /usr/lib64/lustre/tests/parallel-scale-nfs.sh:108:test_connectathon()
= /usr/lib64/lustre/tests/test-framework.sh:5256:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5295:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:5142:run_test()
= /usr/lib64/lustre/tests/parallel-scale-nfs.sh:110:main() |
| Comments |
| Comment by Peter Jones [ 30/Aug/17 ] |
|
Bob, could you please look into this one? It is new in the latest master tag and fails 80% of the time. Peter |
| Comment by Bob Glossman (Inactive) [ 30/Aug/17 ] |
|
James said:
But from the test log it looks like it is in fact failing with no space:
Warning: can't complete test: can't sync bigfile21643: No space left on device
special tests failed
Can't say why it is running out of space. |
| Comment by Bob Glossman (Inactive) [ 31/Aug/17 ] |
|
The most recent change I can find that seems like it might have an impact is
That landed 6/7/17 |
| Comment by Gerrit Updater [ 26/Oct/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/29786 |
| Comment by James Nunez (Inactive) [ 28/Oct/17 ] |
|
parallel-scale-nfsv3 test_connectathon started failing with the sync file error

write/read 30 MB file
Warning: can't complete test: can't sync bigfile7456: No space left on device
special tests failed
parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1

starting on August 1, 2017, with master tag 2.10.51 build #3620. Logs for the first failure are at

parallel-scale-nfsv4 started failing with this error on October 11, 2017, with logs at https://testing.hpdd.intel.com/sub_tests/e41720d8-af86-11e7-a26c-5254006e85c2

I’ve been running this test to figure out how much space test_connectathon needs to run, and to add code to the script to skip the test when the file system doesn’t have enough space. From what I can tell, the “bigfile” test writes a 30 MB file and the “bigfile2” test writes at the 2 and 4 GB boundaries, but I haven’t seen a 2 or 4 GB file created. So it looks like the largest file written is 30 MB. Yet this test still fails with ‘No space left on device’ when there is 177152 KB available. |
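The skip-when-short-on-space idea above can be sketched as a free-space guard. This is a minimal Python illustration only, not taken from the patch at https://review.whamcloud.com/29786 (the Lustre test scripts themselves are shell). The names `free_bytes` and `should_skip_connectathon`, the mount path, and the 200 MB threshold are all hypothetical. Note that the failures above show 30 MB (the largest file “bigfile” writes) is not a safe lower bound, so the required amount is left to the caller.

```python
import os


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the file system containing `path`."""
    st = os.statvfs(path)
    # f_bavail counts blocks available to non-root users; f_frsize is the fragment size.
    return st.f_bavail * st.f_frsize


def should_skip_connectathon(path: str, required_bytes: int) -> bool:
    """Hypothetical guard: return True when free space is below `required_bytes`.

    `required_bytes` is an assumption supplied by the caller, not a value
    derived from connectathon itself.
    """
    return free_bytes(path) < required_bytes


if __name__ == "__main__":
    mount = "."  # hypothetical: the NFS-mounted Lustre directory under test
    need = 200 * 1024 * 1024  # hypothetical threshold, not taken from the patch
    if should_skip_connectathon(mount, need):
        print(f"SKIP: only {free_bytes(mount)} bytes free, need {need}")
    else:
        print("enough space, running connectathon")
```

Using `f_bavail` rather than `f_bfree` matches what a non-root test process can actually write, which is closer to what `df` reports as available.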
| Comment by James Nunez (Inactive) [ 08/Nov/17 ] |
|
I was trying to find the maximum file system size at which connectathon fails with the ‘No space left on device’ error when running the “special” test type. Looking at the connectathon code, it looks like the “bigfile” test should take the most space on the file system by creating a 30 MB file. So I started running parallel-scale-nfsv3 on smaller and smaller file systems until I got the ‘no space’ error. With a ~96 MB (100925440 bytes) file system, I was able to get one run of parallel-scale-nfsv3 to fail with the no space error, but seven other runs with the same file system ran to completion.

So I printed out the amount of free space on the file system before and after each of the connectathon test types: basic (-b), general (-g), special (-s), and lock (-l). For a test run that succeeded, here’s the free space before and after 10 iterations of each test type:

Another run that succeeded:

For a test run that fails in the special test type with the no space error, we see |
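The before/after measurement described above can be sketched in Python as a small profiling helper. This is a hypothetical illustration, not the instrumentation actually used: `profile_free_space`, `report`, and the phase callables are invented names. In practice each phase would run one connectathon test type (-b, -g, -s, -l) some number of times; how connectathon is invoked is deliberately left out rather than guessed, and the demo below uses a stand-in phase that writes a 30 MB file into a temporary directory.

```python
import os
from typing import Callable, Dict, Tuple


def free_bytes(path: str) -> int:
    """Bytes available to unprivileged users on the file system containing `path`."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def profile_free_space(
    path: str, phases: Dict[str, Callable[[], None]]
) -> Dict[str, Tuple[int, int]]:
    """Run each phase in insertion order, recording (free_before, free_after) in bytes."""
    results: Dict[str, Tuple[int, int]] = {}
    for name, run in phases.items():
        before = free_bytes(path)
        run()
        after = free_bytes(path)
        results[name] = (before, after)
    return results


def report(results: Dict[str, Tuple[int, int]]) -> None:
    for name, (before, after) in results.items():
        # A positive value means the phase left less free space behind.
        print(f"{name:8s} before={before} after={after} consumed={before - after}")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        def write_30mb() -> None:
            # Hypothetical stand-in for a test phase: write and fsync a 30 MB file.
            with open(os.path.join(d, "bigfile"), "wb") as f:
                f.write(b"\0" * (30 * 1024 * 1024))
                f.flush()
                os.fsync(f.fileno())

        report(profile_free_space(d, {"basic": lambda: None, "special": write_30mb}))
```

Sampling free space immediately around each phase, rather than once per run, is what lets a leak or delayed space release be attributed to a specific test type.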
| Comment by Gerrit Updater [ 29/Nov/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29786/ |
| Comment by Peter Jones [ 29/Nov/17 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 04/Dec/17 ] |
|
Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30353 |
| Comment by Gerrit Updater [ 19/Dec/17 ] |
|
John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30353/ |