[LU-9872] parallel-scale-nfsv3 test_connectathon: connectathon failed: 1 Created: 11/Aug/17  Updated: 03/Nov/18  Resolved: 29/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3

Type: Bug Priority: Major
Reporter: James Casper Assignee: Bob Glossman (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Trevis2, full
server: RHEL 7.3, ldiskfs, branch master, v2.10.51, b3620
client: RHEL 7.3, branch master, v2.10.51, b3620


Issue Links:
Related
is related to LU-10689 parallel-scale-nfsv3 test_connectatho... Open
is related to LU-4905 2.1.6<->2.4.3 interop: parallel-scale... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340

This issue looks like LU-3801, but it failed with an I/O error rather than no space.

From test_log:

write/read 30 MB file
Warning: can't complete test: can't sync bigfile21643: No space left on device
special tests failed
 parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4980:error()
  = /usr/lib64/lustre/tests/functions.sh:548:run_connectathon()
  = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:108:test_connectathon()
  = /usr/lib64/lustre/tests/test-framework.sh:5256:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5295:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5142:run_test()
  = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:110:main()


 Comments   
Comment by Peter Jones [ 30/Aug/17 ]

Bob

Could you please look into this one? New in the latest master tag and fails 80% of the time.

Peter

Comment by Bob Glossman (Inactive) [ 30/Aug/17 ]

James said

This issue looks like LU-3801, but it failed with an I/O error rather than no space.

But from the test log it looks like it is in fact failing with no space:

Warning: can't complete test: can't sync bigfile21643: No space left on device
special tests failed

Can't say why it is running out of space.

Comment by Bob Glossman (Inactive) [ 31/Aug/17 ]

The most recent change I can find that seems like it might have an impact is

LU-6900 tests: parallel-scale-nfs improvement

That landed 6/7/17

Comment by Gerrit Updater [ 26/Oct/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/29786
Subject: LU-9872 tests: check space requirements for NFS test
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 67172edc5eb09bff9c3cc28e37188375d76af77f

Comment by James Nunez (Inactive) [ 28/Oct/17 ]

parallel-scale-nfsv3 test_connectathon started failing with the sync file error

write/read 30 MB file
Warning: can't complete test: can't sync bigfile7456: No space left on device
special tests failed
parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1

starting on August 1, 2017 with master tag 2.10.51 build #3620. Logs for the first failure are at
https://testing.hpdd.intel.com/test_sets/02ecbd6e-79a0-11e7-8e1f-5254006e85c2
Logs for more recent failures are at
https://testing.hpdd.intel.com/sub_tests/990dadc2-bb71-11e7-9abd-52540065bddc
https://testing.hpdd.intel.com/sub_tests/744189ca-ba98-11e7-8afb-52540065bddc

parallel-scale-nfsv4 started failing with this error on October 11, 2017 with logs at https://testing.hpdd.intel.com/sub_tests/e41720d8-af86-11e7-a26c-5254006e85c2

I’ve been running this test to figure out how much space test_connectathon needs, so that I can add code to the script to skip the test when the file system doesn’t have enough space. From what I can tell, the “bigfile” test writes a 30 MB file and the “bigfile2” test writes at the 2 GB and 4 GB boundaries, but I haven’t seen a 2 GB or 4 GB file created. So it looks like the largest file written is 30 MB. Yet this test still fails with ‘No space left on device’ when there is 177152 KB available.
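The skip-on-low-space idea described above could be sketched roughly as follows. This is an illustration only, not the code from the patch: the function names (avail_kb, space_sufficient, run_if_space) and the REQUIRED_KB threshold are hypothetical, and since failures were seen with 177152 KB available, the right threshold is not established here.

```shell
# Hypothetical sketch: skip a test when its directory is low on space.
# None of these helpers come from test-framework.sh; all names are assumed.

# Print the available space, in KB, of the file system holding directory $1.
avail_kb() {
	df -P -k "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed (return 0) when $1 KB available covers $2 KB required.
space_sufficient() {
	local avail=$1 required=$2
	[ "$avail" -ge "$required" ]
}

# Run the command given after the first two arguments only if directory $1
# has at least $2 KB available; otherwise print a SKIP message.
run_if_space() {
	local dir=$1 required=$2
	shift 2
	local avail
	avail=$(avail_kb "$dir")
	if ! space_sufficient "$avail" "$required"; then
		echo "SKIP: need ${required} KB, have ${avail} KB in $dir"
		return 0
	fi
	"$@"
}
```

For example, `run_if_space "$DIR" "$REQUIRED_KB" some_test_command` would run some_test_command only when enough space is reported; DIR, REQUIRED_KB, and some_test_command are placeholders.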

Comment by James Nunez (Inactive) [ 08/Nov/17 ]

I was trying to find the largest file system size at which connectathon fails with the ‘No space left on device’ error when running the “special” test type. Looking at the connectathon code, it looks like the “bigfile” test should take the most space on the file system by creating a 30 MB file. So I ran parallel-scale-nfsv3 on smaller and smaller file systems until I got the ‘no space’ error.

With a ~96 MB (100925440 bytes) file system, I was able to get one run of parallel-scale-nfsv3 to fail with the no space error, but seven other runs with the same file system ran to completion. So I printed out the amount of free space on the file system before and after each of the connectathon test types: basic (-b), general (-g), special (-s), and lock (-l).

For a test run that succeeded, here’s the free space before and after 10 iterations of each test type:
Before -b tests Free space: 100925440 bytes
After -b tests Free space: 96731136 bytes
Before -g tests Free space: 96731136 bytes
After -g tests Free space: 97255424 bytes
Before -s tests Free space: 97255424 bytes
After -s tests Free space: 99090432 bytes
Before -l tests Free space: 99090432 bytes
After -l tests Free space: 100401152 bytes

Another run that succeeded:
Before -b tests Free space: 100925440 bytes
After -b tests Free space: 94371840 bytes
Before -g tests Free space: 94371840 bytes
After -g tests Free space: 95158272 bytes
Before -s tests Free space: 95158272 bytes
After -s tests Free space: 49807360 bytes
Before -l tests Free space: 49807360 bytes
After -l tests Free space: 100401152 bytes

For a test run that fails in the special test type with the no space error, we see
Before -b tests Free space: 100925440 bytes
After -b tests Free space: 94371840 bytes
Before -g tests Free space: 94371840 bytes
After -g tests Free space: 97255424 bytes
Before -s tests Free space: 97255424 bytes
[Experienced 'no space' failure in -s tests]
After -s tests Free space: 81264640 bytes
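
The before/after measurements above could be collected with a wrapper along these lines. This is a minimal sketch, not the instrumentation actually used for these runs: free_bytes, space_delta, and log_space_around are hypothetical names, and free space is taken from df rather than from any Lustre-specific tool.

```shell
# Hypothetical helpers for logging free space around one test phase.

# Print the free space, in bytes, of the file system holding directory $1.
free_bytes() {
	local kb
	kb=$(df -P -k "$1" | awk 'NR == 2 { print $4 }')
	echo $((kb * 1024))
}

# Print the change in free space (after minus before), in bytes.
# $1 is the value before the phase, $2 the value after it.
space_delta() {
	echo $(( $2 - $1 ))
}

# Run the command given after the first two arguments, logging the free
# space in directory $1 before and after it, labelled with $2.
# The command's exit status is preserved.
log_space_around() {
	local dir=$1 label=$2 before after rc
	shift 2
	before=$(free_bytes "$dir")
	echo "Before $label tests Free space: $before bytes"
	"$@"
	rc=$?
	after=$(free_bytes "$dir")
	echo "After $label tests Free space: $after bytes"
	echo "Change during $label tests: $(space_delta "$before" "$after") bytes"
	return $rc
}
```

Applied to the numbers above, `space_delta 95158272 49807360` prints -45350912 for the -s phase of the second successful run, so about 43 MB less was free at the end of that phase, which is more than the 30 MB “bigfile”. For the failing run, `space_delta 97255424 81264640` prints -15990784.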

Comment by Gerrit Updater [ 29/Nov/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29786/
Subject: LU-9872 tests: modify check space requirements for NFS test
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 0b8e9558e88930814857c97dfe2394f8c8e24a9a

Comment by Peter Jones [ 29/Nov/17 ]

Landed for 2.11

Comment by Gerrit Updater [ 04/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30353
Subject: LU-9872 tests: modify check space requirements for NFS test
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: db79de77aec28a0e84f65dcc36b7b60895047887

Comment by Gerrit Updater [ 19/Dec/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30353/
Subject: LU-9872 tests: modify check space requirements for NFS test
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 03b330d95434337f3270960f59c62e07a56c5a43
