[LU-4250] sanity-hsm test_15 failure: 'could not rebind file list' Created: 14/Nov/13  Updated: 14/Jan/16  Resolved: 11/Dec/13

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: James Nunez (Inactive) Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: HSM

Issue Links:
Duplicate
is duplicated by LU-7215 lustre-initialization-1 failed: ncli.... Resolved
Severity: 3
Rank (Obsolete): 11587

 Description   

Test results are at: https://maloo.whamcloud.com/test_sets/5338e98e-4c98-11e3-8ab0-52540035b04c

From the test log, a call to get_param fails with an I/O error, and the copytool then reports errors during the rebind:

CMD: wtm-17vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
CMD: wtm-17vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
CMD: wtm-17vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
CMD: wtm-17vm3 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
wtm-17vm3: error: get_param: read('/proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions') failed: Input/output error
rebind list of files
CMD: wtm-17vm5 lhsmtool_posix --archive 2 --hsm-root /home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1		 --rebind /home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/tmp.30189 /mnt/lustre
wtm-17vm5: lhsmtool_posix[10247]: action=2 src=/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/tmp.30189 dst=(null) mount_point=/mnt/lustre
wtm-17vm5: lhsmtool_posix[10247]: rebind [0x200000401:0x10:0x0] to [0x200000400:0x2f:0x0]
wtm-17vm5: lhsmtool_posix[10247]: rebind [0x200000401:0x11:0x0] to [0x200000400:0x31:0x0]
wtm-17vm5: lhsmtool_posix[10247]: rebind [0x200000401:0x12:0x0] to [0x200000400:0x33:0x0]
wtm-17vm5: lhsmtool_posix[10247]: rebind [0x200000401:0x13:0x0] to [0x200000400:0x35:0x0]
wtm-17vm5: lhsmtool_posix[10247]: rebind [0x200000401:0x14:0x0] to [0x200000400:0x37:0x0]
wtm-17vm5: lhsmtool_posix[10247]: cannot rename '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0014/0000/0401/0000/0002/0000/0x200000401:0x14:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0037/0000/0400/0000/0002/0000/0x200000400:0x37:0x0': No such file or directory (2)
wtm-17vm5: lhsmtool_posix[10247]: 5 lines read from '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/tmp.30189', 4 rebind successful
wtm-17vm5: lhsmtool_posix[10247]: process finished, errs: 1 major, 0 minor, rc=-1 (Operation not permitted)
 sanity-hsm test_15: @@@@@@ FAIL: could not rebind file list 
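
The rename that fails above is between two archive paths that appear to be derived from the old and new FIDs. As an aid to reading the log, here is a minimal sketch of that mapping, inferred only from the paths printed in this ticket and not taken from the lhsmtool_posix source: six 16-bit slices (two from the object ID, four from the sequence) become directory levels, followed by the FID without brackets. The function name archive_path is hypothetical.

```python
def archive_path(hsm_root: str, fid: str) -> str:
    """Build the archive path for a FID such as '[0x200000401:0x14:0x0]'.

    Assumption: layout inferred from the paths in the test log above.
    """
    seq, oid, ver = (int(part, 16) for part in fid.strip("[]").split(":"))
    # Six 16-bit slices: low/high of the object ID, then four of the sequence.
    slices = [
        oid & 0xFFFF,
        (oid >> 16) & 0xFFFF,
        seq & 0xFFFF,
        (seq >> 16) & 0xFFFF,
        (seq >> 32) & 0xFFFF,
        (seq >> 48) & 0xFFFF,
    ]
    dirs = "/".join(f"{s:04x}" for s in slices)
    return f"{hsm_root}/{dirs}/{seq:#x}:{oid:#x}:{ver:#x}"


if __name__ == "__main__":
    # Source and destination of the failing rename, relative to the HSM root.
    print(archive_path("arc1", "[0x200000401:0x14:0x0]"))
    print(archive_path("arc1", "[0x200000400:0x37:0x0]"))
```

With the HSM root shortened to arc1, the two printed paths match the tails of the source and destination in the "cannot rename" message, which suggests the source entry for [0x200000401:0x14:0x0] was never created under its final name.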

Errors in the copytool log are:

lhsmtool_posix[10198]: data archiving for '/mnt/lustre/.lustre/fid/0x200000401:0x12:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0012/0000/0401/0000/0002/0000/0x200000401:0x12:0x0_tmp' done
lhsmtool_posix[10198]: attr file for '/mnt/lustre/.lustre/fid/0x200000401:0x12:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0012/0000/0401/0000/0002/0000/0x200000401:0x12:0x0_tmp'
lhsmtool_posix[10198]: cannot copy xattr of '/mnt/lustre/.lustre/fid/0x200000401:0x12:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0012/0000/0401/0000/0002/0000/0x200000401:0x12:0x0_tmp': No such file or directory (2)
lhsmtool_posix[10198]: xattr file for '/mnt/lustre/.lustre/fid/0x200000401:0x12:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0012/0000/0401/0000/0002/0000/0x200000401:0x12:0x0_tmp'
lhsmtool_posix[10198]: cannot get FID of '[0x200000401:0x12:0x0]': No such file or directory (2)
lhsmtool_posix[10198]: Action completed, notifying coordinator cookie=0x5283b1ac, FID=[0x200000401:0x12:0x0], hp_flags=0 err=2
lhsmtool_posix[10198]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000401:0x12:0x0' ok (rc=0)
lhsmtool_posix[10199]: data archiving for '/mnt/lustre/.lustre/fid/0x200000401:0x13:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0013/0000/0401/0000/0002/0000/0x200000401:0x13:0x0_tmp' done
lhsmtool_posix[10199]: attr file for '/mnt/lustre/.lustre/fid/0x200000401:0x13:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0013/0000/0401/0000/0002/0000/0x200000401:0x13:0x0_tmp'
lhsmtool_posix[10199]: cannot copy xattr of '/mnt/lustre/.lustre/fid/0x200000401:0x13:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0013/0000/0401/0000/0002/0000/0x200000401:0x13:0x0_tmp': No such file or directory (2)
lhsmtool_posix[10199]: xattr file for '/mnt/lustre/.lustre/fid/0x200000401:0x13:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0013/0000/0401/0000/0002/0000/0x200000401:0x13:0x0_tmp'
lhsmtool_posix[10199]: cannot get FID of '[0x200000401:0x13:0x0]': No such file or directory (2)
lhsmtool_posix[10199]: Action completed, notifying coordinator cookie=0x5283b1ad, FID=[0x200000401:0x13:0x0], hp_flags=0 err=2
lhsmtool_posix[10199]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000401:0x13:0x0' ok (rc=0)
lhsmtool_posix[10200]: data archiving for '/mnt/lustre/.lustre/fid/0x200000401:0x14:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0014/0000/0401/0000/0002/0000/0x200000401:0x14:0x0_tmp' done
lhsmtool_posix[10200]: attr file for '/mnt/lustre/.lustre/fid/0x200000401:0x14:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0014/0000/0401/0000/0002/0000/0x200000401:0x14:0x0_tmp'
lhsmtool_posix[10200]: cannot copy xattr of '/mnt/lustre/.lustre/fid/0x200000401:0x14:0x0' to '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0014/0000/0401/0000/0002/0000/0x200000401:0x14:0x0_tmp': No such file or directory (2)
lhsmtool_posix[10200]: xattr file for '/mnt/lustre/.lustre/fid/0x200000401:0x14:0x0' saved to archive '/home/cgearing/.autotest/shared_dir/2013-11-12/223921-70307003924420/arc1/0014/0000/0401/0000/0002/0000/0x200000401:0x14:0x0_tmp'
lhsmtool_posix[10200]: cannot get FID of '[0x200000401:0x14:0x0]': No such file or directory (2)
lhsmtool_posix[10200]: Action completed, notifying coordinator cookie=0x5283b1ae, FID=[0x200000401:0x14:0x0], hp_flags=0 err=2
lhsmtool_posix[10200]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000401:0x14:0x0' ok (rc=0)
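
To pick out which archive actions ended with an error in a longer copytool log, the "Action completed" lines can be filtered on their err= field. The sketch below is a triage helper only; the regular expression is written against the line format shown in this ticket and is an assumption about other versions of the tool, and the name failed_actions is hypothetical.

```python
import re

# Matches the "Action completed" lines as they appear in the log above.
ACTION_RE = re.compile(
    r"lhsmtool_posix\[(?P<pid>\d+)\]: Action completed, notifying coordinator "
    r"cookie=(?P<cookie>0x[0-9a-fA-F]+), FID=(?P<fid>\[[^\]]+\]), "
    r"hp_flags=(?P<hp_flags>\d+) err=(?P<err>-?\d+)"
)


def failed_actions(log_text: str) -> list[tuple[str, str, int]]:
    """Return (fid, cookie, err) for every completed action with err != 0."""
    failures = []
    for match in ACTION_RE.finditer(log_text):
        err = int(match.group("err"))
        if err != 0:
            failures.append((match.group("fid"), match.group("cookie"), err))
    return failures


if __name__ == "__main__":
    sample = (
        "lhsmtool_posix[10200]: Action completed, notifying coordinator "
        "cookie=0x5283b1ae, FID=[0x200000401:0x14:0x0], hp_flags=0 err=2\n"
    )
    for fid, cookie, err in failed_actions(sample):
        print(fid, cookie, err)
```

Run over the excerpt above, this would list the three FIDs ending in 0x12, 0x13, and 0x14, each with err=2.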

On the MDS console and dmesg:

09:08:27:Lustre: DEBUG MARKER: /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | egrep 'WAITING|STARTED'
09:08:27:LustreError: 12057:0:(mdt_xattr.c:134:mdt_getxattr_one()) getxattr failed: -2
09:08:27:LustreError: 12057:0:(mdt_xattr.c:134:mdt_getxattr_one()) getxattr failed: -2
09:08:27:Lustre: DEBUG MARKER: /usr/sbin/lctl mark  sanity-hsm test_15: @@@@@@ FAIL: could not rebind file list 
09:08:27:Lustre: DEBUG MARKER: sanity-hsm test_15: @@@@@@ FAIL: could not rebind file list
09:08:27:Lustre: DEBUG MARKER: /usr/sbin/lctl dk > /logdir/test_logs/2013-11-12/lustre-reviews-el6-x86_64--review--1_2_1__19416__-70307003924420-223918/sanity-hsm.test_15.debug_log.$(hostname -s).1384362493.log;
09:08:27:         dmesg > /logdir/test_logs/2013-11-12/lustre-reviews-el6-x86_64--r
09:08:27:LustreError: 12057:0:(mdt_xattr.c:134:mdt_getxattr_one()) getxattr failed: -2
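
The numeric codes in the excerpts (getxattr failed: -2, err=2, rc=-1) can be read as standard errno values with the sign dropped. A small sketch for decoding them follows; errno_name is a hypothetical helper, and treating a bare rc=-1 as an errno is an assumption about how the tool formatted its final message.

```python
import errno


def errno_name(code: int) -> str:
    """Return the symbolic errno name for a positive or negative code."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


if __name__ == "__main__":
    for code in (-2, 2, -1, 5):
        print(code, errno_name(code))
```

Under that reading, -2 and err=2 are both ENOENT ("No such file or directory"), and the "Operation not permitted" text on the final lhsmtool_posix line is consistent with a generic -1 return being rendered as EPERM, not with an actual permission failure.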
