Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
None
-
None
-
3
-
9223372036854775807
Description
sanity-hsm test_31c fails to archive the newly created file:
Update not seen after 200s: wanted 'SUCCEED' got 'STARTED'
sanity-hsm test_31c: @@@@@@ FAIL: request on 0x200000402:0x38:0x0 is not SUCCEED on mds1
and then we cannot shut down the copytool:
copytools still running on trevis-51vm6
CMD: trevis-51vm6 pgrep -x lhsmtool_posix
trevis-51vm6: 6036
copytools still running on trevis-51vm6
CMD: trevis-51vm6 echo 1 >/proc/sys/kernel/sysrq ; echo t >/proc/sysrq-trigger
copytools failed to stop in 200s
sanity-hsm test_31c: @@@@@@ FAIL: copytools failed to stop
From the copytool logs, the archive was progressing and it doesn’t look like it exceeded 200 seconds unless the copytool was hung:
1505294662.538461 lhsmtool_posix[6038]: '[0x200000402:0x38:0x0]' action ARCHIVE reclen 72, cookie=0x59b8f942
1505294662.539849 lhsmtool_posix[6038]: processing file 'd31c.sanity-hsm/f31c.sanity-hsm'
1505294662.555950 lhsmtool_posix[6038]: archiving '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' to '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp'
1505294662.556901 lhsmtool_posix[6038]: saving stripe info of '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' in /tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp.lov
1505294662.558302 lhsmtool_posix[6038]: start copy of 34603008 bytes from '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' to '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp'
1505294692.836074 lhsmtool_posix[6038]: %90
1505294692.838435 lhsmtool_posix[6038]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s
1505294695.842294 lhsmtool_posix[6038]: copied 34603008 bytes in 33.285317 seconds
1505294695.896192 lhsmtool_posix[6038]: data archiving for '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' to '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' done
1505294695.896379 lhsmtool_posix[6038]: attr file for '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' saved to archive '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp'
1505294695.897327 lhsmtool_posix[6038]: fsetxattr of 'trusted.hsm' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=0 (Success)
1505294695.897360 lhsmtool_posix[6038]: fsetxattr of 'trusted.version' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=0 (Success)
1505294695.897402 lhsmtool_posix[6038]: fsetxattr of 'trusted.link' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=0 (Success)
1505294695.897432 lhsmtool_posix[6038]: fsetxattr of 'trusted.lov' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=0 (Success)
1505294695.897476 lhsmtool_posix[6038]: fsetxattr of 'trusted.lma' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=0 (Success)
1505294695.897515 lhsmtool_posix[6038]: fsetxattr of 'lustre.lov' on '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp' rc=-1 (Operation not supported)
1505294695.897531 lhsmtool_posix[6038]: xattr file for '/mnt/lustre2/.lustre/fid/0x200000402:0x38:0x0' saved to archive '/tmp/arc1/shsm/0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0_tmp'
1505294695.898946 lhsmtool_posix[6038]: symlink '/tmp/arc1/shsm/shadow/d31c.sanity-hsm/f31c.sanity-hsm' to '../../0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0' done
exiting: Terminated
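As a rough cross-check of the timing in the copytool log, the sketch below computes the elapsed time between the first and last timestamped entries and compares it with the copy time expected from the reported file size and bandwidth limit. It is not part of the test suite; the function names are illustrative, and it assumes each log entry starts with an epoch timestamp in seconds, as the entries in this log do.

```python
import re

# Four entries copied verbatim from the copytool log in this report.
LOG_LINES = [
    "1505294662.538461 lhsmtool_posix[6038]: '[0x200000402:0x38:0x0]' action ARCHIVE reclen 72, cookie=0x59b8f942",
    "1505294692.838435 lhsmtool_posix[6038]: bandwith control: 1048576B/s excess=1048576 sleep for 1.000000000s",
    "1505294695.842294 lhsmtool_posix[6038]: copied 34603008 bytes in 33.285317 seconds",
    "1505294695.898946 lhsmtool_posix[6038]: symlink '/tmp/arc1/shsm/shadow/d31c.sanity-hsm/f31c.sanity-hsm' to '../../0038/0000/0402/0000/0002/0000/0x200000402:0x38:0x0' done",
]

# Assumption: every entry begins with "<seconds>.<fraction> ".
TIMESTAMP_RE = re.compile(r"^(\d+\.\d+) ")


def parse_timestamp(line):
    """Return the leading epoch timestamp of a log entry as a float."""
    match = TIMESTAMP_RE.match(line)
    if match is None:
        raise ValueError("no leading timestamp in: %r" % line)
    return float(match.group(1))


def elapsed_seconds(lines):
    """Seconds between the earliest and latest timestamped entries."""
    stamps = [parse_timestamp(line) for line in lines]
    return max(stamps) - min(stamps)


def expected_copy_seconds(size_bytes, rate_bytes_per_s):
    """Ideal copy time if the bandwidth limit is the only constraint."""
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    print("elapsed in log: %.3f s" % elapsed_seconds(LOG_LINES))
    print("expected copy:  %.3f s" % expected_copy_seconds(34603008, 1048576))
```

For these entries the whole archive sequence spans roughly 33.4 seconds, against an expected 33 seconds for copying 34603008 bytes at 1048576 B/s, so the copytool itself finished well inside the 200-second window.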
So far, we’ve only seen this ‘copytool failed to stop’ error in one patch test session. Logs for this failure are at
https://testing.hpdd.intel.com/test_sets/3e2f0620-9872-11e7-b775-5254006e85c2
We’ve seen the failure to archive a file several times for this test, and it has at least once been attributed to LU-7988.
Issue Links
- duplicates
-
LU-9845 ost-pools test_22 hangs with ‘WARNING: Pool 'lustre-mdt1' has encountered an uncorrectable I/O failure and has been suspended.’
- Resolved