[LU-7881] sanity-hsm test_26b: @@@@@@ FAIL: Copytool should have stopped
Created: 15/Mar/16  Updated: 15/Mar/16  Resolved: 15/Mar/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major
Reporter: Maloo
Assignee: Bruno Faccini (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Related
is related to LU-4640 Last unlink should trigger HSM remove... Resolved
is related to LU-7136 sanity-hsm test_12q failed with 'Copy... Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for nasf <fan.yong@intel.com>

Please provide additional information about the failure here.

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/655b0eb6-ea4e-11e5-8606-5254006e85c2.

The log shows

/usr/lib64/lustre/tests/sanity-hsm.sh: line 2379: search_and_kill_copytool: command not found
CMD: onyx-49vm6 pgrep -x lhsmtool_posix
onyx-49vm6: 14891
sanity-hsm test_26b: @@@@@@ FAIL: Copytool should have stopped



 Comments   
Comment by Bruno Faccini (Inactive) [ 15/Mar/16 ]

Well, it looks like the search_and_kill_copytool() function, internal to sanity-hsm.sh, has disappeared (presumably with Gerrit change #17499 for LU-7136). In the meantime, the patch for LU-4640, which also introduces a new sanity-hsm.sh test_26b sub-test that uses search_and_kill_copytool(), has landed.
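The failure mode described here (a test calling a helper that another patch has removed) surfaces only as a "command not found" line in the log, after which the script carries on and fails at a later check. A minimal sketch of a guard that would report the missing helper directly is shown below. The names require_helper and call_helper are hypothetical and are not part of sanity-hsm.sh or the Lustre test framework.

```shell
#!/bin/bash
# Sketch only: require_helper and call_helper are hypothetical names,
# not functions from sanity-hsm.sh or the Lustre test framework.

# Return 0 if $1 resolves to something callable (function, builtin, or
# executable on PATH), so a later call cannot end in "command not found".
require_helper() {
	command -v "$1" > /dev/null 2>&1
}

# Invoke helper $1 with the remaining arguments. If the helper is missing,
# say so explicitly and return 1 instead of falling through with status 127.
call_helper() {
	local fn=$1
	shift
	if ! require_helper "$fn"; then
		echo "missing helper: $fn" >&2
		return 1
	fi
	"$fn" "$@"
}
```

Under these assumptions, a test could write `call_helper search_and_kill_copytool || echo "helper missing"`, so that the log names the real cause rather than the downstream "Copytool should have stopped" symptom.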

Comment by Gerrit Updater [ 15/Mar/16 ]

Faccini Bruno (bruno.faccini@intel.com) uploaded a new patch: http://review.whamcloud.com/18919
Subject: LU-7881 tests: use new functions to kill and verify CT death
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 36ee93fd00a81ae2d492fa89b569dc023fcea64d

Comment by Richard Henwood (Inactive) [ 15/Mar/16 ]

For the record, I've just seen a failure of this type. It is on code WITHOUT Bruno's patch, linked above.

https://testing.hpdd.intel.com/test_sets/aff358a0-ea23-11e5-8186-5254006e85c2

Comment by Peter Jones [ 15/Mar/16 ]

Bruno

Given that this is causing a lot of test failures, Oleg is going to revert the original fix. Could you please combine this test fix into the LU-4640 patch and ensure that test parameters are used to run the affected test 10 times, so that we can see for sure that the test fix is robust?

Thanks

Peter

Comment by Bruno Faccini (Inactive) [ 15/Mar/16 ]

Yes, will do, but this extra work could have been avoided if the patch for LU-4640 had landed more quickly, rather than leaving the patch for LU-7136 to land in between and change the sanity-hsm framework. Could we detect this kind of timing-window race by re-running tests after patches have been merged?
