Lustre / LU-4178

Test failure on test suite sanity-hsm, subtest test_200

Details

    • 3
    • 11309

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/4c3bcdec-4025-11e3-bfaf-52540035b04c.

      test_200 was only recently enabled by commit 38695729d61958ab10e9e108175298f8a7d40536. Before that, it was always skipped because it was listed in ALWAYS_EXCEPT. I'm wondering whether it was a mistake to turn this test on at all. Maloo reports:

      Failure Rate: 66.00% of last 100 executions [all branches]

      This failure does not appear to be related to the change under test, at least in this case.

      The sub-test test_200 failed with the following error:

      request on sanity-hsm is not @@@@@@

      Info required for matching: sanity-hsm 200

          Activity

            Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13173/
            Subject: LU-4178 tests: Wait requests to reach CDT before Cancel
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 6a31cf92555182a23f14d3385c8c14266887070a
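            The merged fix's subject says the test now waits for requests to reach the coordinator (CDT) before issuing the cancel. The Python sketch below illustrates that ordering under stated assumptions: FakeCoordinator and wait_until are hypothetical names (not part of sanity-hsm), a request is assumed to become visible only after a few lookups, and a cancel issued before that point is assumed to be a silent no-op.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float,
               interval_s: float = 0.1) -> bool:
    """Poll predicate until it returns True or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakeCoordinator:
    """Hypothetical stand-in for the CDT request list (assumption)."""

    def __init__(self, polls_until_registered: int) -> None:
        self._remaining = polls_until_registered
        self.cancelled = False

    def request_registered(self) -> bool:
        # Each lookup advances the simulated registration by one step.
        if self._remaining > 0:
            self._remaining -= 1
        return self._remaining == 0

    def cancel(self) -> None:
        # Assumed semantics: cancelling an unregistered request does nothing.
        if self._remaining == 0:
            self.cancelled = True


# Racy ordering: cancel is sent before the request reaches the coordinator.
racy = FakeCoordinator(polls_until_registered=3)
racy.cancel()

# Fixed ordering: wait for the request to be registered, then cancel.
fixed = FakeCoordinator(polls_until_registered=3)
if wait_until(fixed.request_registered, timeout_s=1.0, interval_s=0.01):
    fixed.cancel()
```

            Polling with a deadline, rather than sleeping a fixed amount, keeps the fast path fast while still bounding how long a test can hang.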

            James Nunez (Inactive) added a comment:
            Closing ticket because sanity-hsm tests 200, 201 and 202 are passing on master for the past month. If any more work needs to be done for this ticket, please open a new ticket and we'll track the work there.

            Gerrit Updater added a comment:
            James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13826
            Subject: LU-4178 tests: increase sanity-hsm wait_request_state tiemout
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 9c56a8f64d2ab4b8db8b3f38dff2d019b8cd3e40

            Gerrit Updater added a comment:
            James Nunez (james.a.nunez@intel.com) uploaded a new patch: http://review.whamcloud.com/13825
            Subject: LU-4178 tests: add messages to sanity-hsm
            Project: fs/lustre-release
            Branch: b2_5
            Current Patch Set: 1
            Commit: 75d6d35bcc48eefe490e8b4efd673c58b3373507

            James Nunez (Inactive) added a comment:
            Reopening ticket because there is one more patch for this ticket that has not landed. The patch is at: http://review.whamcloud.com/#/c/13173/

            Jodi Levi (Inactive) added a comment:
            Patches landed to Master.

            Gerrit Updater added a comment:
            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13206/
            Subject: LU-4178 tests: increase sanity-hsm wait_request_state tiemout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 4cb51c76ed2afa168f19e999190a315803580258

            John Hammond added a comment:

            The cancel action does not succeed until the copytool (CT) reports that the archive is complete. In test_200 we use make_large_for_cancel(), which gives us a 100MB file. Because of the 1MB/s bandwidth limit, the CT will take at least 100s to archive the file. Since wait_request_state() uses a 100s timeout, this makes for a very racy test. And since most of these tests still use NFS for the archive, there can be additional delays.

            I suggest that we double the timeout in wait_request_state(). Please see http://review.whamcloud.com/13206.
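            The timing argument in the comment reduces to simple arithmetic. This minimal Python sketch uses only the numbers from the comment (100MB file, 1MB/s limit, 100s timeout, doubled to 200s); archive_margin_s is a hypothetical helper, not a sanity-hsm function, and it ignores the extra NFS delays, so the real slack is smaller than what it reports.

```python
def archive_margin_s(file_mb: float, bandwidth_mb_s: float,
                     timeout_s: float) -> float:
    """Seconds of slack left after a bandwidth-limited archive completes.

    Zero or negative slack means the wait can expire before the copytool
    finishes, so the test outcome depends on scheduling noise.
    """
    min_archive_s = file_mb / bandwidth_mb_s  # lower bound only
    return timeout_s - min_archive_s


# 100MB file at 1MB/s against a 100s timeout: no slack at all.
old_margin = archive_margin_s(100, 1, 100)
# Doubling the timeout, as proposed, leaves about 100s for other delays.
new_margin = archive_margin_s(100, 1, 200)

if __name__ == "__main__":
    print(f"old timeout slack: {old_margin:.0f} s")  # old timeout slack: 0 s
    print(f"new timeout slack: {new_margin:.0f} s")  # new timeout slack: 100 s
```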

            Gerrit Updater added a comment:
            John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/13206
            Subject: LU-4178 tests: increase sanity-hsm wait_request_state tiemout
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 278c1fb845c2cdd7905f717435176e94a4ad7057

            jacques-charles lafoucriere added a comment:
            The analysis and patch look good, but I am surprised, because CDT command registration should be synchronous: once hsm_archive returns, the CDT entry should already be recorded.

            People

              Assignee: Bruno Faccini (Inactive)
              Reporter: Maloo
              Votes: 0
              Watchers: 14
