parallel-scale-nfsv3 test_connectathon: connectathon failed: 1

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.11.0, Lustre 2.10.3
    • Affects Version/s: Lustre 2.11.0
    • Labels: None
    • Environment: Trevis2, full
      server: RHEL 7.3, ldiskfs, branch master, v2.10.51, b3620
      client: RHEL 7.3, branch master, v2.10.51, b3620
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      https://testing.hpdd.intel.com/test_sessions/9b7c7e8e-7b5a-4f4d-af09-400c586a8340

      This issue looks like LU-3801, but it failed with an I/O error rather than no space.

      From test_log:

      write/read 30 MB file
      Warning: can't complete test: can't sync bigfile21643: No space left on device
      special tests failed
       parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4980:error()
        = /usr/lib64/lustre/tests/functions.sh:548:run_connectathon()
        = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:108:test_connectathon()
        = /usr/lib64/lustre/tests/test-framework.sh:5256:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:5295:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:5142:run_test()
        = /usr/lib64/lustre/tests/parallel-scale-nfs.sh:110:main()
      

          Activity

            [LU-9872] parallel-scale-nfsv3 test_connectathon: connectathon failed: 1

            gerrit Gerrit Updater added a comment -

            John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30353/
            Subject: LU-9872 tests: modify check space requirements for NFS test
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: 03b330d95434337f3270960f59c62e07a56c5a43

            gerrit Gerrit Updater added a comment -

            Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30353
            Subject: LU-9872 tests: modify check space requirements for NFS test
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: db79de77aec28a0e84f65dcc36b7b60895047887

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29786/
            Subject: LU-9872 tests: modify check space requirements for NFS test
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 0b8e9558e88930814857c97dfe2394f8c8e24a9a

            jamesanunez James Nunez (Inactive) added a comment -

            I was trying to find the largest file system size at which connectathon fails with the ‘No space left on device’ error when running the “special” test type. Looking at the connectathon code, the “bigfile” test should take the most space on the file system by creating a 30 MB file. So, I started running parallel-scale-nfsv3 on smaller and smaller file systems until I got the ‘no space’ error.

            With a ~96 MB (100925440 bytes) file system, I was able to get one run of parallel-scale-nfsv3 to fail with the no space error, but seven other runs with the same file system ran to completion. So I printed out the amount of free space on the file system before and after each of the connectathon test types: basic (-b), general (-g), special (-s), and lock (-l).

            For a test run that succeeded, here’s the free space before and after 10 iterations of each test type:
            Before -b tests Free space: 100925440 bytes
            After -b tests Free space: 96731136 bytes
            Before -g tests Free space: 96731136 bytes
            After -g tests Free space: 97255424 bytes
            Before -s tests Free space: 97255424 bytes
            After -s tests Free space: 99090432 bytes
            Before -l tests Free space: 99090432 bytes
            After -l tests Free space: 100401152 bytes

            Another run that succeeded:
            Before -b tests Free space: 100925440 bytes
            After -b tests Free space: 94371840 bytes
            Before -g tests Free space: 94371840 bytes
            After -g tests Free space: 95158272 bytes
            Before -s tests Free space: 95158272 bytes
            After -s tests Free space: 49807360 bytes
            Before -l tests Free space: 49807360 bytes
            After -l tests Free space: 100401152 bytes

            For a test run that failed in the special test type with the no space error, we see:
            Before -b tests Free space: 100925440 bytes
            After -b tests Free space: 94371840 bytes
            Before -g tests Free space: 94371840 bytes
            After -g tests Free space: 97255424 bytes
            Before -s tests Free space: 97255424 bytes
            [Experienced 'no space' failure in -s tests]
            After -s tests Free space: 81264640 bytes
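
The before-and-after free-space figures in the comment above could be gathered with a small wrapper along these lines. This is only a sketch, not the code actually used for these runs: `free_bytes`, `space_used`, and `log_space_around` are hypothetical names, and reading free space through `df -Pk` is an assumption.

```shell
#!/bin/bash

# Hypothetical helper: free bytes on the file system that holds directory $1.
# Assumes a POSIX df; -P forces one line per file system, -k reports KB.
free_bytes() {
    df -Pk "$1" | awk 'NR==2 { printf "%.0f\n", $4 * 1024 }'
}

# Hypothetical helper: bytes consumed between two free-space readings.
#   $1 = free bytes before, $2 = free bytes after
space_used() {
    echo $(( $1 - $2 ))
}

# Hypothetical wrapper: print free space before and after one test type.
#   $1 = directory on the file system under test
#   $2 = connectathon test-type flag (-b, -g, -s or -l)
#   remaining arguments = the command that runs that test type
log_space_around() {
    local dir=$1 flag=$2 rc=0
    shift 2
    echo "Before $flag tests Free space: $(free_bytes "$dir") bytes"
    "$@" || rc=$?
    echo "After $flag tests Free space: $(free_bytes "$dir") bytes"
    return "$rc"
}
```

Applied to the `-s` readings of the second successful run, `space_used 95158272 49807360` gives 45350912 bytes (43.25 MiB), noticeably more than the 30 MB file that “bigfile” writes.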

            jamesanunez James Nunez (Inactive) added a comment -

            parallel-scale-nfsv3 test_connectathon started failing with the following sync error on August 1, 2017, with master tag 2.10.51 build #3620:

            write/read 30 MB file
            Warning: can't complete test: can't sync bigfile7456: No space left on device
            special tests failed
            parallel-scale-nfsv3 test_connectathon: @@@@@@ FAIL: connectathon failed: 1

            Logs for the first failure are at
            https://testing.hpdd.intel.com/test_sets/02ecbd6e-79a0-11e7-8e1f-5254006e85c2
            Logs for more recent failures are at
            https://testing.hpdd.intel.com/sub_tests/990dadc2-bb71-11e7-9abd-52540065bddc
            https://testing.hpdd.intel.com/sub_tests/744189ca-ba98-11e7-8afb-52540065bddc

            parallel-scale-nfsv4 started failing with this error on October 11, 2017 with logs at https://testing.hpdd.intel.com/sub_tests/e41720d8-af86-11e7-a26c-5254006e85c2

            I’ve been running this test to figure out how much space test_connectathon needs, so that code can be added to the script to skip the test when the file system doesn’t have enough space. From what I can tell, the “bigfile” test writes a 30 MB file and the “bigfile2” test writes at the 2 GB and 4 GB boundaries, but I haven’t seen a 2 GB or 4 GB file created. So, it looks like the largest file written is 30 MB. Yet this test still fails with ‘No space left on device’ when there is 177152 KB (173 MB) available.
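
The idea of skipping test_connectathon when the file system is too small can be sketched as follows. This is not the check that the LU-9872 patches added; the helper names and the threshold passed in are assumptions for illustration, and the failure with 177152 KB available suggests the real requirement is well above the 30 MB that “bigfile” writes.

```shell
#!/bin/bash

# Hypothetical helper: free space, in KB, on the file system holding $1.
# Assumes a POSIX df; -P forces one line per file system, -k reports KB.
free_kb() {
    df -Pk "$1" | awk 'NR==2 { print $4 }'
}

# Hypothetical check: succeed only if at least $2 KB are free under $1.
# The minimum is supplied by the caller; no specific value is implied here.
have_space_for_connectathon() {
    local dir=$1 min_kb=$2 avail
    avail=$(free_kb "$dir")
    if [ "$avail" -lt "$min_kb" ]; then
        echo "SKIP: need ${min_kb} KB free on $dir, have ${avail} KB"
        return 1
    fi
    return 0
}
```

A test could call `have_space_for_connectathon "$dir" "$min_kb" || return 0` before starting connectathon, where `dir` and `min_kb` are placeholders for the NFS mount point and whatever minimum the measurements settle on.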

            gerrit Gerrit Updater added a comment -

            James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/29786
            Subject: LU-9872 tests: check space requirements for NFS test
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 67172edc5eb09bff9c3cc28e37188375d76af77f

            bogl Bob Glossman (Inactive) added a comment - edited

            The most recent change I can find that seems like it might have an impact is

            LU-6900 tests: parallel-scale-nfs improvement

            That landed 6/7/17

            bogl Bob Glossman (Inactive) added a comment - edited

            James said

            This issue looks like LU-3801, but it failed with an I/O error rather than no space.

            But from the test log, it looks like it is in fact failing with no space:

            Warning: can't complete test: can't sync bigfile21643: No space left on device
            special tests failed
            

            Can't say why it is running out of space.

            pjones Peter Jones added a comment -

            Bob

            Could you please look into this one? New in the latest master tag and fails 80% of the time.

            Peter


            People

              Assignee: bogl Bob Glossman (Inactive)
              Reporter: jcasper James Casper (Inactive)