[LU-10347] sanity-hsm test_252: archive request fails rather than canceling out
Created: 07/Dec/17  Updated: 28/Mar/18  Resolved: 28/Mar/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Fixed
Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for John Hammond <john.hammond@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/42d77f44-db21-11e7-a066-52540065bddc.

The sub-test test_252 failed with the following error:

request on 0x200000405:0x133:0x0 is not CANCELED on mds1

Info required for matching: sanity-hsm 252



 Comments   
Comment by John Hammond [ 07/Dec/17 ]

The copytool (CT) calls ct_begin() before opening the file to be archived, so there is a small race in this test:

	$LFS hsm_archive --archive $HSM_ARCHIVE_NUMBER $f
	wait_request_state $fid ARCHIVE STARTED
	rm -f $f

which causes the archive request to fail rather than be canceled.

Comment by Gerrit Updater [ 07/Dec/17 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: https://review.whamcloud.com/30434
Subject: LU-10347 test: give CT time to open file
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2f3dda96a4d1e6480b5f909a0820ae5a5f720a13

Comment by Quentin Bouget [ 07/Dec/17 ]

No, adding a delay is not reliable enough. The copytool just needs to be suspended until the request times out (although in that case you will hit LU-10302).
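
The suspend approach can be illustrated with a minimal, self-contained sketch. It is not the code from the patch: suspend_proc, resume_proc and proc_state are hypothetical helper names, a background `sleep` stands in for the copytool, and the state check assumes Linux (/proc). The point is only that a stopped process cannot make progress no matter how long the test takes, whereas a fixed delay only narrows the race window.

```shell
#!/bin/bash
# Sketch only: suspend_proc, resume_proc and proc_state are hypothetical
# helpers, not functions from sanity-hsm.sh. A background `sleep` stands in
# for the copytool. Linux-only: proc_state reads /proc/<pid>/status.

# Print the one-letter state of a PID (T = stopped, S = sleeping, ...).
proc_state() {
	local key val _
	while read -r key val _; do
		if [ "$key" = "State:" ]; then
			echo "$val"
			return 0
		fi
	done < "/proc/$1/status"
	return 1
}

# Send SIGSTOP, then poll (up to about 5s) until the process is reported stopped.
suspend_proc() {
	local pid=$1 i
	kill -STOP "$pid" || return 1
	for i in $(seq 1 50); do
		[ "$(proc_state "$pid")" = "T" ] && return 0
		sleep 0.1
	done
	return 1
}

# Send SIGCONT so the process can run again.
resume_proc() {
	kill -CONT "$1"
}

# Demo with a stand-in process.
sleep 30 &
ct_pid=$!
suspend_proc "$ct_pid" && echo "stand-in copytool is stopped"
# A real test would submit the request, remove the file, and wait for the
# request state here, while the copytool cannot touch the file.
resume_proc "$ct_pid"
kill "$ct_pid"
wait "$ct_pid" 2>/dev/null || true
```

Polling for the stopped state, rather than assuming SIGSTOP takes effect instantly, avoids introducing a second small race into the sketch itself.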

Comment by Bob Glossman (Inactive) [ 08/Dec/17 ]

Another occurrence on master:
https://testing.hpdd.intel.com/test_sets/ac23e53a-dbd2-11e7-9840-52540065bddc

Comment by Gerrit Updater [ 12/Dec/17 ]

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/30492
Subject: LU-10347 tests: suspend the copytool in sanity-hsm/test_252
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6ec2a51bffe5a9d5b7e0b5a8068a95fa2780a369

Comment by Gerrit Updater [ 09/Feb/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/30492/
Subject: LU-10347 tests: suspend the copytool in sanity-hsm/test_252
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 96fbf0935977a9669d2a3bb2612db8b7eba3e5a5

Comment by Joseph Gmitter (Inactive) [ 28/Mar/18 ]

The fix to suspend the copytool has landed for 2.11.0.

Generated at Sat Feb 10 02:34:14 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.