[LU-3976] sanity-hsm test_9a failure: 'hsm flags on f.sanity-hsm.9a.1 are 0x00000009 != 0x00000001 Created: 19/Sep/13  Updated: 24/Sep/13  Resolved: 24/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.5.0

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: James Nunez (Inactive)
Resolution: Fixed Votes: 0
Labels: HSM, patch
Environment:

OpenSFS cluster with a combined MGS/MDS, a single OSS with two OSTs, and four clients: one agent (c07), one running robinhood and its database (c08), and two running only as Lustre clients (c09, c10)


Severity: 3
Rank (Obsolete): 10610

 Description   

Test results are at https://maloo.whamcloud.com/test_sets/28e49004-2171-11e3-b1f0-52540035b04c

This may just be an error in the test. From John Hammond:

I think 9a is a bug in the test.
0x00000001 should be 0x00000009.
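
For reference, the two values in the failure message can be read as bit flags. The following is a minimal sketch, assuming the HSM state bit values defined in Lustre's lustre_user.h (HS_EXISTS = 0x1, HS_DIRTY = 0x2, HS_RELEASED = 0x4, HS_ARCHIVED = 0x8). The function name decode_hsm_flags is hypothetical and is not part of sanity-hsm.

```shell
# Hypothetical helper, not part of sanity-hsm.sh: print the names of the
# HSM state bits that are set in a flags value such as 0x00000009.
# Bit values are assumed from the HS_* definitions in lustre_user.h.
decode_hsm_flags() {
	local flags=$(( $1 ))
	local names=""

	if [ $(( flags & 0x1 )) -ne 0 ]; then names="$names EXISTS"; fi
	if [ $(( flags & 0x2 )) -ne 0 ]; then names="$names DIRTY"; fi
	if [ $(( flags & 0x4 )) -ne 0 ]; then names="$names RELEASED"; fi
	if [ $(( flags & 0x8 )) -ne 0 ]; then names="$names ARCHIVED"; fi

	# Drop the leading space left by the first appended name.
	echo "${names# }"
}

decode_hsm_flags 0x00000009   # value reported by the test
decode_hsm_flags 0x00000001   # value the test expected
```

Under these assumptions the reported value has both the EXISTS and ARCHIVED bits set, while the expected value has only EXISTS, which is what one would see on a file that has an archive copy in progress but not yet completed.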

From the test log, we see

== sanity-hsm test 9a: Multiple remote agents == 10:42:31 (1379526151)
pdsh@c10: c07: ssh exited with exit code 1
pdsh@c10: c07: ssh exited with exit code 1
Purging archive on c07
Starting copytool agt1 on c07
2+0 records in
2+0 records out
2097152 bytes (2.1 MB) copied, 0.322563 s, 6.5 MB/s
Changed after 0s: from '' to 'STARTED'
Waiting 100 secs for update
 sanity-hsm test_9a: @@@@@@ FAIL: hsm flags on /lustre/scratch/d0.sanity-hsm/d9/f.sanity-hsm.9a.1 are 0x00000009 != 0x00000001 

dmesg on the MDS shows the following errors, which I think are harmless:

Lustre: DEBUG MARKER: == sanity-hsm test 9: Use of explict archive number, with dedicated copytool == 10:42:28 (1379526148)
LustreError: 30192:0:(mdt_coordinator.c:917:mdt_hsm_cdt_start()) scratch-MDT0000: Coordinator already started
LustreError: 30192:0:(obd_config.c:1346:class_process_proc_param()) writing proc entry hsm_control err -114
Lustre: DEBUG MARKER: == sanity-hsm test 9a: Multiple remote agents == 10:42:31 (1379526151)
Lustre: DEBUG MARKER: sanity-hsm test_9a: @@@@@@ FAIL: hsm flags on /lustre/scratch/d0.sanity-hsm/d9/f.sanity-hsm.9a.1 are 0x00000009 != 0x00000001


 Comments   
Comment by jacques-charles lafoucriere [ 20/Sep/13 ]

Agreed: test 9a is buggy. 0x00000001 must be replaced by 0x00000009 (ARCHIVED + EXIST). If you do the patch, please also make the following changes:

  • replace need2clients by needclients which will take a client count as arg
  • test_9a should be changed to use needclients 3
  • need2clients should be changed to use needclients 2

I can do the patch
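
The helper change proposed in the list above could look roughly like the sketch below. This is not the landed patch. It assumes that CLIENTCOUNT holds the number of available client nodes, and skip_msg is a hypothetical stand-in for whatever skip helper the test framework provides.

```shell
# Sketch only, not the landed patch.
# CLIENTCOUNT is assumed to hold the number of available client nodes.
CLIENTCOUNT=${CLIENTCOUNT:-1}

# Hypothetical stand-in for the test framework's skip helper.
skip_msg() {
	echo "SKIP: $*"
}

# Return 0 if at least $1 clients are available; otherwise report a skip
# and return 1.
needclients() {
	local wanted=$1

	if [ "$CLIENTCOUNT" -lt "$wanted" ]; then
		skip_msg "need $wanted or more clients, have $CLIENTCOUNT"
		return 1
	fi
	return 0
}

# need2clients becomes a thin wrapper around the generic helper.
need2clients() {
	needclients 2
}
```

With a helper of this shape, test_9a could begin with something like `needclients 3 || return 0`, so that it is skipped rather than failed when fewer than three clients are configured.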

Comment by James Nunez (Inactive) [ 20/Sep/13 ]

I'm tied up with HSM testing. So, I don't plan to create a patch for this test. Please create and submit the patch if you have time.

Thanks, James

Comment by Jodi Levi (Inactive) [ 20/Sep/13 ]

JC is working on the patch. Assigning to James to shepherd through.

Comment by jacques-charles lafoucriere [ 22/Sep/13 ]

Patch at http://review.whamcloud.com/7723

Comment by Peter Jones [ 24/Sep/13 ]

Landed for 2.5.0
