[LU-4063] sanity-hsm test_12a failure: 'Restored file differs' Created: 04/Oct/13  Updated: 22/May/14  Resolved: 18/Apr/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: Lustre 2.6.0, Lustre 2.5.2

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: HSM
Environment:

Lustre master build #1715
OpenSFS cluster with a combined MGS/MDS, a single OSS with two OSTs, and three clients: one agent + client, one running robinhood/db + client, and one running only as a Lustre client


Severity: 3
Rank (Obsolete): 10887

 Description   

The test results are at: https://maloo.whamcloud.com/test_sets/8e9cca2c-2c8b-11e3-85ee-52540035b04c

From the client test_log:

== sanity-hsm test 12a: Restore an imported file explicitly == 14:02:01 (1380834121)
pdsh@c15: c13: ssh exited with exit code 1
Purging archive on c13
Starting copytool agt1 on c13
c13: lhsmtool_posix[5634]: action=1 src=d0.sanity-hsm/d12/f.sanity-hsm.12a dst=/lustre/scratch/d0.sanity-hsm/d12/f.sanity-hsm.12a mount_point=/lustre/scratch
c13: lhsmtool_posix[5634]: importing '/lustre/scratch/d0.sanity-hsm/d12/f.sanity-hsm.12a' from '/lustre/archive/d0.sanity-hsm/d12/f.sanity-hsm.12a'
c13: lhsmtool_posix[5634]: imported '/lustre/scratch/d0.sanity-hsm/d12/f.sanity-hsm.12a' from '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0'=='/lustre/archive/d0.sanity-hsm/d12/f.sanity-hsm.12a'
c13: lhsmtool_posix[5634]: process finished, errs: 0 major, 0 minor, rc=0 (Success)
Verifying released state: 
Verifying file state: 
c13: diff: /lustre/scratch2/d0.sanity-hsm/d12/f.sanity-hsm.12a: No such file or directory
pdsh@c15: c13: ssh exited with exit code 2
 sanity-hsm test_12a: @@@@@@ FAIL: Restored file differs 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
  = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
  = /usr/lib64/lustre/tests/sanity-hsm.sh:853:test_12a()

From the copytool log, it looks like the copytool is having problems finding the file:

lhsmtool_posix[5564]: action=0 src=(null) dst=(null) mount_point=/lustre/scratch
lhsmtool_posix[5565]: waiting for message from kernel
lhsmtool_posix[5565]: copytool fs=scratch archive#=2 item_count=1
lhsmtool_posix[5565]: waiting for message from kernel
lhsmtool_posix[5635]: '[0x200000402:0x2:0x0]' action RESTORE reclen 72, cookie=0x524dd1eb
lhsmtool_posix[5635]: processing file 'd0.sanity-hsm/d12/f.sanity-hsm.12a'
lhsmtool_posix[5635]: reading stripe rules from '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0.lov' for '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0'
lhsmtool_posix[5635]: cannot open '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0.lov': No such file or directory (2)
lhsmtool_posix[5635]: cannot get stripe rules for '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0' (No data available), use default
lhsmtool_posix[5635]: restoring data from '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0' to '{VOLATILE}=[0x200000402:0x3:0x0]'
lhsmtool_posix[5635]: going to copy data from '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0' to '{VOLATILE}=[0x200000402:0x3:0x0]'
lhsmtool_posix[5635]: Going to copy 363 bytes /lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0 -> {VOLATILE}=[0x200000402:0x3:0x0]

lhsmtool_posix[5635]: data restore from '/lustre/archive/0002/0000/0402/0000/0002/0000/0x200000402:0x2:0x0' to '{VOLATILE}=[0x200000402:0x3:0x0]' done
lhsmtool_posix[5635]: Action completed, notifying coordinator cookie=0x524dd1eb, FID=[0x200000402:0x2:0x0], hp_flags=0 err=0
lhsmtool_posix[5635]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000402:0x2:0x0' ok (rc=0)
exiting: Interrupt

Looking at all the tests in sanity-hsm, very few tests use $DIR2 to access files on the file system:

	local f=$DIR/$tdir/$tfile
	import_file $tdir/$tfile $f
	local f=$DIR2/$tdir/$tfile

Commenting out the last line above allows the test to complete successfully, but this may defeat what is being tested.
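
The exit status in the client log can be reproduced outside Lustre. The following is a minimal sketch, assuming temporary directories stand in for the archive and for $DIR, and that the $DIR2 stand-in does not exist on the host. All paths and variable names are illustrative and do not come from sanity-hsm.sh. It shows that diff exits with status 2 when an operand is missing, which matches the "ssh exited with exit code 2" line above, whereas a real content mismatch would give status 1.

```shell
#!/bin/bash
# Illustrative stand-ins only; none of these paths come from sanity-hsm.sh.
work=$(mktemp -d)
archive=$work/archive
dir1=$work/scratch     # plays the role of $DIR (present on this host)
dir2=$work/scratch2    # plays the role of $DIR2 (absent on this host)
mkdir -p "$archive" "$dir1"

echo "payload" > "$archive/f.12a"
cp "$archive/f.12a" "$dir1/f.12a"   # stands in for the restored file

# Compare through the path that exists: identical content, status 0.
diff -q "$archive/f.12a" "$dir1/f.12a" >/dev/null 2>&1
rc_dir=$?

# Compare through the missing path: diff reports trouble, status 2.
diff -q "$archive/f.12a" "$dir2/f.12a" >/dev/null 2>&1
rc_dir2=$?

echo "DIR rc=$rc_dir DIR2 rc=$rc_dir2"
rm -rf "$work"
```

So the "Restored file differs" message here most likely reflects an inaccessible path on c13 rather than corrupted restored data.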



 Comments   
Comment by Jinshan Xiong (Inactive) [ 04/Oct/13 ]

It turns out that DIR2 is not mounted on the agent host (the diff on c13 fails with "No such file or directory" for /lustre/scratch2), so it makes no sense to access the file through it there. Let's just use DIR to compare the file.
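
A comparison through DIR could be sketched as below. This is only an illustration under stated assumptions: compare_restored is a hypothetical helper, not a function in test-framework.sh or sanity-hsm.sh, and the temporary directories stand in for the archive and for $DIR. It also reports an inaccessible path separately from a content mismatch, so that a missing mount is not logged as "Restored file differs".

```shell
#!/bin/bash
# Hypothetical helper (not part of the Lustre test framework): compare an
# archived copy with the restored file, distinguishing a missing path
# from a content mismatch.
compare_restored() {
	local archived=$1 restored=$2
	if [ ! -e "$restored" ]; then
		echo "restored file not accessible: $restored"
		return 2
	fi
	if ! cmp -s "$archived" "$restored"; then
		echo "restored file differs: $restored"
		return 1
	fi
	echo "restored file matches: $restored"
	return 0
}

# Demo with temporary directories standing in for the archive and $DIR.
tmp=$(mktemp -d)
mkdir -p "$tmp/archive" "$tmp/dir"
printf 'data\n' > "$tmp/archive/f"
printf 'data\n' > "$tmp/dir/f"

out_ok=$(compare_restored "$tmp/archive/f" "$tmp/dir/f"); rc_ok=$?
out_missing=$(compare_restored "$tmp/archive/f" "$tmp/dir2/f"); rc_missing=$?
echo "$out_ok"
echo "$out_missing"
rm -rf "$tmp"
```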

Comment by Jinshan Xiong (Inactive) [ 07/Oct/13 ]

Patch is located at: http://review.whamcloud.com/7869

Comment by James Nunez (Inactive) [ 25/Feb/14 ]

I tested this patch on the latest b2_5 and it allows test_12a to pass. Before applying the patch, test_12a would fail every time I ran it.

Comment by James Nunez (Inactive) [ 01/Apr/14 ]

Patch for b2_5 at: http://review.whamcloud.com/#/c/9860

Comment by James Nunez (Inactive) [ 18/Apr/14 ]

Landed to master
