[LU-10347] sanity-hsm test_252: archive request fails rather than canceling out

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.11.0

    Description

      This issue was created by maloo for John Hammond <john.hammond@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/42d77f44-db21-11e7-a066-52540065bddc.

      The sub-test test_252 failed with the following error:

      request on 0x200000405:0x133:0x0 is not CANCELED on mds1
      

      Info required for matching: sanity-hsm 252

        Activity

          jgmitter Joseph Gmitter (Inactive) added a comment - Fix to suspend has landed for 2.11.0

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30492/
          Subject: LU-10347 tests: suspend the copytool in sanity-hsm/test_252
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 96fbf0935977a9669d2a3bb2612db8b7eba3e5a5

          gerrit Gerrit Updater added a comment -

          Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/30492
          Subject: LU-10347 tests: suspend the copytool in sanity-hsm/test_252
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 6ec2a51bffe5a9d5b7e0b5a8068a95fa2780a369

          bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/ac23e53a-dbd2-11e7-9840-52540065bddc
          bougetq Quentin Bouget (Inactive) added a comment - edited

          No, adding a delay is not reliable enough. The copytool just needs to be suspended until the request times out (although in that case you will hit LU-10302).
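The point about suspension versus a delay can be illustrated outside Lustre. The following is a minimal sketch, not the code from the patch: the background subshell is a hypothetical stand-in for the copytool, and `marker` is an invented file it creates when it "finishes". A process stopped with SIGSTOP makes no progress no matter how long the test waits, whereas a fixed delay only shifts the timing.

```shell
#!/bin/sh
# Sketch only: the subshell below is a hypothetical stand-in for the copytool.
tmpdir=$(mktemp -d)
marker="$tmpdir/done"        # created by the worker when it completes

# Left alone, the worker would finish after about one second.
( sleep 1; touch "$marker" ) &
worker=$!

kill -STOP "$worker"         # suspend the worker, as the test would suspend the copytool

sleep 2                      # wait longer than the worker needs
if [ -e "$marker" ]; then
    finished_while_stopped=yes
else
    finished_while_stopped=no
fi

kill -CONT "$worker"         # resume; the worker can now run to completion
wait "$worker"

if [ -e "$marker" ]; then
    finished_after_cont=yes
else
    finished_after_cont=no
fi

rm -rf "$tmpdir"
echo "while stopped: $finished_while_stopped, after resume: $finished_after_cont"
```

In the real test the copytool runs on an agent node, so the signal would have to be delivered there; how the test framework does that is not shown in this thread.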

          gerrit Gerrit Updater added a comment -

          John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30434
          Subject: LU-10347 test: give CT time to open file
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 2f3dda96a4d1e6480b5f909a0820ae5a5f720a13

          jhammond John Hammond added a comment -

          The copytool (CT) calls ct_begin() before opening the file to be archived, so there is a small race in this test:

          	$LFS hsm_archive --archive $HSM_ARCHIVE_NUMBER $f
          	wait_request_state $fid ARCHIVE STARTED
          	rm -f $f

          which causes the archive request to fail rather than be canceled.
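The ordering problem described above can be modeled with a small self-contained script. This is a sketch under stated assumptions, not Lustre code: `fake_copytool`, the `events` FIFO, and the `result_file` are all hypothetical names. The stand-in reports STARTED first (as ct_begin() would) and only later tries to open the file, so a test that removes the file as soon as it sees STARTED makes the open fail. The CANCELED path is not modeled.

```shell
#!/bin/sh
# Sketch only: fake_copytool is a hypothetical model of the ordering, not the real CT.
tmpdir=$(mktemp -d)
f="$tmpdir/file"                 # file to be "archived"
events="$tmpdir/events"          # FIFO used to report progress to the test
result_file="$tmpdir/result"     # final state written by the stand-in
echo data > "$f"
mkfifo "$events"

fake_copytool() {
    # Progress is reported first, analogous to calling ct_begin() ...
    echo STARTED > "$events"
    # ... and the file is opened only afterwards, leaving a window.
    sleep 1
    if [ -r "$f" ]; then
        echo SUCCEED > "$result_file"
    else
        echo FAILED > "$result_file"
    fi
}

fake_copytool &
ct=$!

# The test waits for STARTED, then removes the file inside the window.
read -r state < "$events"
rm -f "$f"

wait "$ct"
result=$(cat "$result_file")
rm -rf "$tmpdir"
echo "observed $state, final state: $result"
```

The one-second sleep only widens the window so the model behaves the same on every run; in the real test the window is small, which is why the failure is intermittent.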

          People

            wc-triage WC Triage
            maloo Maloo