[LU-4275] sanity-hsm test_8 and many more failed: 36/91 passed Created: 19/Nov/13 Updated: 03/Jun/14 Resolved: 10/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Maloo | Assignee: | Bruno Faccini (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 11743 | ||||||||||||
| Description |
|
This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>
This issue relates to the following test suite run:
The sub-test test_8 failed with the following error:
In fact, Maloo is reporting only 36/91 tests passed. These failures are being attributed to
Info required for matching: sanity-hsm 8 |
| Comments |
| Comment by Andreas Dilger [ 19/Nov/13 ] |
|
This caused 16 of 50 recent sanity-hsm test failures, so it is pretty important to fix. Since it is causing 55 separate tests to fail, it cannot be fixed by simply skipping a single failing test. |
| Comment by Andreas Dilger [ 22/Nov/13 ] |
|
Still causing 16/40 of the sanity-hsm test failures. It looks like it ALWAYS and ONLY fails on superfat-intel-1vm{1,5} (most recent pass on 2013-11-15, https://maloo.whamcloud.com/test_sets/e1d47c4e-4e19-11e3-a167-52540035b04c). Without having looked into it, I'd guess there is something wrong with how the NFS "archive" is configured on these nodes. If the problem cannot be fixed quickly, can these nodes be removed from the test queue, or configured so they only run e.g. b2_1 tests that do not need sanity-hsm? |
| Comment by Andreas Dilger [ 23/Nov/13 ] |
|
Chris, Joshua, Mike, please remove these nodes from the normal test rotation, since they are only causing tests to fail that then need to be resubmitted. They could be left for developers to reserve, or (if possible) set to run b2_1-b2_4 tests only. |
| Comment by Bruno Faccini (Inactive) [ 25/Nov/13 ] |
|
For each failing sub-test, the copytool log looks like the following:

lhsmtool_posix[29866]: action=0 src=(null) dst=(null) mount_point=/mnt/lustre
lhsmtool_posix[29867]: waiting for message from kernel
lhsmtool_posix[29867]: copytool fs=lustre archive#=1 item_count=1
lhsmtool_posix[29867]: waiting for message from kernel
lhsmtool_posix[30544]: '[0x200000401:0x2:0x0]' action ARCHIVE reclen 72, cookie=0x528a6d6b
lhsmtool_posix[30544]: processing file 'd0.sanity-hsm/d8/f.sanity-hsm.8'
lhsmtool_posix[30544]: archiving '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' to '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp'
lhsmtool_posix[30544]: saving stripe info of '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' in /home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp.lov
lhsmtool_posix[30544]: going to copy data from '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' to '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp'
lhsmtool_posix[30544]: data archiving for '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' to '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' done
lhsmtool_posix[30544]: cannot set attributes of '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0': Operation not permitted (1)
lhsmtool_posix[30544]: cannot copy attr of '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' to '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp': Operation not permitted (1)
lhsmtool_posix[30544]: attr file for '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' saved to archive '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp'
lhsmtool_posix[30544]: fsetxattr of 'trusted.hsm' on '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' rc=-1 (Operation not supported)
lhsmtool_posix[30544]: fsetxattr of 'trusted.link' on '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' rc=-1 (Operation not supported)
lhsmtool_posix[30544]: fsetxattr of 'trusted.lov' on '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' rc=-1 (Operation not supported)
lhsmtool_posix[30544]: fsetxattr of 'trusted.lma' on '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' rc=-1 (Operation not supported)
lhsmtool_posix[30544]: fsetxattr of 'lustre.lov' on '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp' rc=-1 (Operation not supported)
lhsmtool_posix[30544]: xattr file for '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' saved to archive '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0_tmp'
lhsmtool_posix[30544]: symlink '/home/chris/.autotest/shared_dir/2013-11-18/031819-70364245912640/arc1/shadow/d0.sanity-hsm/d8/f.sanity-hsm.8' to '../../../0002/0000/0401/0000/0002/0000/0x200000401:0x2:0x0' done
lhsmtool_posix[30544]: Action completed, notifying coordinator cookie=0x528a6d6b, FID=[0x200000401:0x2:0x0], hp_flags=0 err=1
lhsmtool_posix[30544]: llapi_hsm_action_end() on '/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0' ok (rc=0)
exiting: Interrupt

This means that the FAILED status returned for the sub-tests' HSM actions comes from errors during file operations on the hsm-root (archive-side) filesystem. |
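
As a reading aid for copytool logs of this shape, here is a minimal Python sketch that counts the two error classes visible in the log ("Operation not permitted" and "Operation not supported") and extracts the `err=` value from the "Action completed" line. It is not part of the Lustre test framework: the function name `summarize_copytool_log` and the regular expressions are assumptions based only on the message text quoted in this ticket, and other copytool versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns are assumptions derived from the message text quoted in this ticket.
ERROR_PATTERNS = {
    "EPERM": re.compile(r"Operation not permitted"),
    "ENOTSUP": re.compile(r"Operation not supported"),
}
FINAL_STATUS = re.compile(r"Action completed.*\berr=(\d+)")


def summarize_copytool_log(lines):
    """Return (error_counts, final_err) for an iterable of log lines.

    error_counts maps "EPERM"/"ENOTSUP" to the number of matching lines;
    final_err is the integer after "err=" on the last "Action completed"
    line, or None if no such line is present.
    """
    counts = Counter()
    final_err = None
    for line in lines:
        for name, pattern in ERROR_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
        match = FINAL_STATUS.search(line)
        if match:
            final_err = int(match.group(1))
    return dict(counts), final_err


# Two lines copied verbatim from the log in this comment.
SAMPLE = [
    "lhsmtool_posix[30544]: cannot set attributes of "
    "'/mnt/lustre/.lustre/fid/0x200000401:0x2:0x0': Operation not permitted (1)",
    "lhsmtool_posix[30544]: Action completed, notifying coordinator "
    "cookie=0x528a6d6b, FID=[0x200000401:0x2:0x0], hp_flags=0 err=1",
]
```

Calling `summarize_copytool_log(SAMPLE)` separates the attribute failures from the xattr failures, which helps when comparing logs from passing and failing nodes.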
| Comment by Bruno Faccini (Inactive) [ 02/Dec/13 ] |
|
There have been more sanity-hsm failures linked to this ticket, again always and only during autotest runs on superfat-intel-1vm*, and still due to the same EPERM error during fchmod()/fchown()/futimes() operations issued by the copytool on the NFS-mounted hsm-root filesystem. |
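
To check whether a given archive directory accepts the attribute calls named above, a small probe like the following could be run on a suspect node. This is a hedged sketch, not the copytool's own code: `probe_attr_ops` and its parameters are hypothetical names, and the ticket does not confirm that this reproduces the failure on the affected nodes. By default it re-applies the scratch file's current owner; passing the uid/gid of a Lustre-side file is closer to what a copytool copying attributes would attempt.

```python
import errno
import os
import tempfile


def probe_attr_ops(directory, uid=None, gid=None):
    """Try fchmod/fchown/utime on a scratch file created in `directory`.

    Returns a dict mapping operation name to None on success, or to the
    errno name (e.g. "EPERM") on failure. `uid`/`gid` default to the
    scratch file's current owner (a hypothetical stand-in for the owner
    of the file being archived).
    """
    results = {}
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        st = os.fstat(fd)
        target_uid = st.st_uid if uid is None else uid
        target_gid = st.st_gid if gid is None else gid
        # Use the descriptor for utime where the platform supports it,
        # otherwise fall back to the path.
        utime_target = fd if os.utime in os.supports_fd else path
        ops = {
            "fchmod": lambda: os.fchmod(fd, 0o640),
            "fchown": lambda: os.fchown(fd, target_uid, target_gid),
            "futimes": lambda: os.utime(utime_target, (st.st_atime, st.st_mtime)),
        }
        for name, op in ops.items():
            try:
                op()
                results[name] = None
            except OSError as exc:
                results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)
        os.unlink(path)
    return results
```

Running it against both a local directory and the NFS-mounted hsm-root, and comparing the two result dicts, would show which of the three calls the NFS export rejects.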
| Comment by Bruno Faccini (Inactive) [ 10/Dec/13 ] |
|
Duplicated by TEI-1208. |