Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4126

sanity-hsm test_15 failure: 'requests did not complete'

Details

    • Bug
    • Resolution: Duplicate
    • Minor
    • None
    • Lustre 2.5.0
    • 3
    • 11135

    Description

      The test results are at: https://maloo.whamcloud.com/test_sessions/af9dfc14-3834-11e3-8bc4-52540035b04c

      From the test_log:

      == sanity-hsm test 15: Rebind a list of files == 07:44:55 (1382107495)
      pdsh@c15: c08: ssh exited with exit code 1
      Purging archive on c08
      Starting copytool agt1 on c08
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.315725 s, 6.6 MB/s
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.310107 s, 6.8 MB/s
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.305355 s, 6.9 MB/s
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.3105 s, 6.8 MB/s
      2+0 records in
      2+0 records out
      2097152 bytes (2.1 MB) copied, 0.311318 s, 6.7 MB/s
      Changed after 0s: from '' to 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/106] fid=[0x200000408:0x230:0x0] dfid=[0x200000408:0x230:0x0] compound/cookie=0x5260277d/0x52602777 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/109] fid=[0x200000408:0x231:0x0] dfid=[0x200000408:0x231:0x0] compound/cookie=0x5260277e/0x52602778 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/110] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
      lrh=[type=10680000 len=136 idx=3/111] fid=[0x200000408:0x233:0x0] dfid=[0x200000408:0x233:0x0] compound/cookie=0x52602780/0x5260277a action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]'
      Waiting 300 secs for update
      Changed after 4s: from 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/106] fid=[0x200000408:0x230:0x0] dfid=[0x200000408:0x230:0x0] compound/cookie=0x5260277d/0x52602777 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/109] fid=[0x200000408:0x231:0x0] dfid=[0x200000408:0x231:0x0] compound/cookie=0x5260277e/0x52602778 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/110] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]
      lrh=[type=10680000 len=136 idx=3/111] fid=[0x200000408:0x233:0x0] dfid=[0x200000408:0x233:0x0] compound/cookie=0x52602780/0x5260277a action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=WAITING data=[]' to 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/113] fid=[0x200000408:0x233:0x0] dfid=[0x200000408:0x233:0x0] compound/cookie=0x52602780/0x5260277a action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/115] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]'
      Changed after 7s: from 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/113] fid=[0x200000408:0x233:0x0] dfid=[0x200000408:0x233:0x0] compound/cookie=0x52602780/0x5260277a action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/115] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]' to 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/115] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]'
      Changed after 8s: from 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]
      lrh=[type=10680000 len=136 idx=3/115] fid=[0x200000408:0x232:0x0] dfid=[0x200000408:0x232:0x0] compound/cookie=0x5260277f/0x52602779 action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]' to 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]'
      Waiting 290 secs for update
      Waiting 280 secs for update
      Waiting 270 secs for update
      Waiting 260 secs for update
      Waiting 250 secs for update
      Waiting 240 secs for update
      Waiting 230 secs for update
      Waiting 220 secs for update
      Waiting 210 secs for update
      Waiting 200 secs for update
      Waiting 190 secs for update
      Waiting 180 secs for update
      Waiting 170 secs for update
      Waiting 160 secs for update
      Waiting 150 secs for update
      Waiting 140 secs for update
      Waiting 130 secs for update
      Waiting 120 secs for update
      Waiting 110 secs for update
      Waiting 100 secs for update
      Waiting 90 secs for update
      Waiting 80 secs for update
      Waiting 70 secs for update
      Waiting 60 secs for update
      Waiting 50 secs for update
      Waiting 40 secs for update
      Waiting 30 secs for update
      Waiting 20 secs for update
      Waiting 10 secs for update
      Update not seen after 300s: wanted '' got 'lrh=[type=10680000 len=136 idx=3/28] fid=[0x20000040b:0x74:0x0] dfid=[0x20000040b:0x74:0x0] compound/cookie=0x5260275d/0x5260275d action=ARCHIVE archive#=3 flags=0x0 extent=0x0-0xffffffffffffffff gid=0x0 datalen=0 status=STARTED data=[]'
       sanity-hsm test_15: @@@@@@ FAIL: requests did not complete 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:4264:error_noexit()
        = /usr/lib64/lustre/tests/test-framework.sh:4291:error()
        = /usr/lib64/lustre/tests/sanity-hsm.sh:500:wait_all_done()
        = /usr/lib64/lustre/tests/sanity-hsm.sh:1161:test_15()
        = /usr/lib64/lustre/tests/test-framework.sh:4530:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:4563:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:4433:run_test()
        = /usr/lib64/lustre/tests/sanity-hsm.sh:1195:main()
      Dumping lctl log to /tmp/test_logs/2013-10-18/074316/sanity-hsm.test_15.*.1382107848.log
      Copytool is stopped on c08
      

      The copytool_log for this test is below, but does not contain the last fid (0x20000040b:0x74:0x0) listed in the test_log.

      lhsmtool_posix[7609]: action=0 src=(null) dst=(null) mount_point=/lustre/scratch
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7610]: copytool fs=scratch archive#=3 item_count=1
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7611]: '[0x200000408:0x22f:0x0]' action ARCHIVE reclen 72, cookie=0x52602776
      lhsmtool_posix[7611]: processing file 'd0.sanity-hsm/d15/f.sanity-hsm.15.1'
      lhsmtool_posix[7611]: archiving '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' to '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp'
      lhsmtool_posix[7611]: saving stripe info of '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' in /lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp.lov
      lhsmtool_posix[7611]: going to copy data from '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' to '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp'
      lhsmtool_posix[7611]: Going to copy 2097152 bytes /lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0 -> /lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp
      
      lhsmtool_posix[7610]: copytool fs=scratch archive#=3 item_count=1
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7612]: '[0x200000408:0x230:0x0]' action ARCHIVE reclen 72, cookie=0x52602777
      lhsmtool_posix[7612]: processing file 'd0.sanity-hsm/d15/f.sanity-hsm.15.2'
      lhsmtool_posix[7612]: archiving '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' to '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp'
      lhsmtool_posix[7612]: saving stripe info of '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' in /lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp.lov
      lhsmtool_posix[7612]: going to copy data from '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' to '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp'
      lhsmtool_posix[7612]: Going to copy 2097152 bytes /lustre/scratch/.lustre/fid/0x200000408:0x230:0x0 -> /lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp
      
      lhsmtool_posix[7611]: data archiving for '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' to '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' done
      lhsmtool_posix[7611]: attr file for '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' saved to archive '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp'
      lhsmtool_posix[7611]: fsetxattr of 'trusted.hsm' on '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7611]: fsetxattr of 'trusted.link' on '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7611]: fsetxattr of 'trusted.lov' on '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7611]: fsetxattr of 'trusted.lma' on '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7611]: fsetxattr of 'lustre.lov' on '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7611]: xattr file for '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' saved to archive '/lustre/archive/022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0_tmp'
      lhsmtool_posix[7611]: symlink '/lustre/archive/shadow/d0.sanity-hsm/d15/f.sanity-hsm.15.1' to '../../../022f/0000/0408/0000/0002/0000/0x200000408:0x22f:0x0' done
      lhsmtool_posix[7611]: Action completed, notifying coordinator cookie=0x52602776, FID=[0x200000408:0x22f:0x0], hp_flags=0 err=0
      lhsmtool_posix[7611]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000408:0x22f:0x0' ok (rc=0)
      lhsmtool_posix[7610]: copytool fs=scratch archive#=3 item_count=1
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7613]: '[0x200000408:0x231:0x0]' action ARCHIVE reclen 72, cookie=0x52602778
      lhsmtool_posix[7613]: processing file 'd0.sanity-hsm/d15/f.sanity-hsm.15.3'
      lhsmtool_posix[7613]: archiving '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' to '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp'
      lhsmtool_posix[7613]: saving stripe info of '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' in /lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp.lov
      lhsmtool_posix[7613]: going to copy data from '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' to '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp'
      lhsmtool_posix[7613]: Going to copy 2097152 bytes /lustre/scratch/.lustre/fid/0x200000408:0x231:0x0 -> /lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp
      
      lhsmtool_posix[7613]: bandwith control: excess=3.145728E+06 sleep for 3000000us
      lhsmtool_posix[7612]: data archiving for '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' to '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' done
      lhsmtool_posix[7612]: attr file for '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' saved to archive '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp'
      lhsmtool_posix[7612]: fsetxattr of 'trusted.hsm' on '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7612]: fsetxattr of 'trusted.link' on '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7612]: fsetxattr of 'trusted.lov' on '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7612]: fsetxattr of 'trusted.lma' on '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7612]: fsetxattr of 'lustre.lov' on '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7612]: xattr file for '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' saved to archive '/lustre/archive/0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0_tmp'
      lhsmtool_posix[7612]: symlink '/lustre/archive/shadow/d0.sanity-hsm/d15/f.sanity-hsm.15.2' to '../../../0230/0000/0408/0000/0002/0000/0x200000408:0x230:0x0' done
      lhsmtool_posix[7612]: Action completed, notifying coordinator cookie=0x52602777, FID=[0x200000408:0x230:0x0], hp_flags=0 err=0
      lhsmtool_posix[7612]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000408:0x230:0x0' ok (rc=0)
      lhsmtool_posix[7610]: copytool fs=scratch archive#=3 item_count=1
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7614]: '[0x200000408:0x233:0x0]' action ARCHIVE reclen 72, cookie=0x5260277a
      lhsmtool_posix[7614]: processing file 'd0.sanity-hsm/d15/f.sanity-hsm.15.5'
      lhsmtool_posix[7614]: archiving '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' to '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp'
      lhsmtool_posix[7614]: saving stripe info of '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' in /lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp.lov
      lhsmtool_posix[7614]: going to copy data from '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' to '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp'
      lhsmtool_posix[7614]: Going to copy 2097152 bytes /lustre/scratch/.lustre/fid/0x200000408:0x233:0x0 -> /lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp
      
      lhsmtool_posix[7613]: data archiving for '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' to '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' done
      lhsmtool_posix[7613]: attr file for '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' saved to archive '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp'
      lhsmtool_posix[7613]: fsetxattr of 'trusted.hsm' on '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7613]: fsetxattr of 'trusted.link' on '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7613]: fsetxattr of 'trusted.lov' on '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7613]: fsetxattr of 'trusted.lma' on '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7613]: fsetxattr of 'lustre.lov' on '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7613]: xattr file for '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' saved to archive '/lustre/archive/0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0_tmp'
      lhsmtool_posix[7613]: symlink '/lustre/archive/shadow/d0.sanity-hsm/d15/f.sanity-hsm.15.3' to '../../../0231/0000/0408/0000/0002/0000/0x200000408:0x231:0x0' done
      lhsmtool_posix[7613]: Action completed, notifying coordinator cookie=0x52602778, FID=[0x200000408:0x231:0x0], hp_flags=0 err=0
      lhsmtool_posix[7613]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000408:0x231:0x0' ok (rc=0)
      lhsmtool_posix[7610]: copytool fs=scratch archive#=3 item_count=1
      lhsmtool_posix[7610]: waiting for message from kernel
      lhsmtool_posix[7615]: '[0x200000408:0x232:0x0]' action ARCHIVE reclen 72, cookie=0x52602779
      lhsmtool_posix[7615]: processing file 'd0.sanity-hsm/d15/f.sanity-hsm.15.4'
      lhsmtool_posix[7615]: archiving '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' to '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp'
      lhsmtool_posix[7615]: saving stripe info of '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' in /lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp.lov
      lhsmtool_posix[7615]: going to copy data from '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' to '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp'
      lhsmtool_posix[7615]: Going to copy 2097152 bytes /lustre/scratch/.lustre/fid/0x200000408:0x232:0x0 -> /lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp
      
      lhsmtool_posix[7614]: data archiving for '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' to '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' done
      lhsmtool_posix[7614]: attr file for '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' saved to archive '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp'
      lhsmtool_posix[7614]: fsetxattr of 'trusted.hsm' on '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7614]: fsetxattr of 'trusted.link' on '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7614]: fsetxattr of 'trusted.lov' on '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7614]: fsetxattr of 'trusted.lma' on '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7614]: fsetxattr of 'lustre.lov' on '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7614]: xattr file for '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' saved to archive '/lustre/archive/0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0_tmp'
      lhsmtool_posix[7614]: symlink '/lustre/archive/shadow/d0.sanity-hsm/d15/f.sanity-hsm.15.5' to '../../../0233/0000/0408/0000/0002/0000/0x200000408:0x233:0x0' done
      lhsmtool_posix[7614]: Action completed, notifying coordinator cookie=0x5260277a, FID=[0x200000408:0x233:0x0], hp_flags=0 err=0
      lhsmtool_posix[7614]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000408:0x233:0x0' ok (rc=0)
      lhsmtool_posix[7615]: data archiving for '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' to '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' done
      lhsmtool_posix[7615]: attr file for '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' saved to archive '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp'
      lhsmtool_posix[7615]: fsetxattr of 'trusted.hsm' on '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7615]: fsetxattr of 'trusted.link' on '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7615]: fsetxattr of 'trusted.lov' on '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7615]: fsetxattr of 'trusted.lma' on '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7615]: fsetxattr of 'lustre.lov' on '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp' rc=0 (Success)
      lhsmtool_posix[7615]: xattr file for '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' saved to archive '/lustre/archive/0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0_tmp'
      lhsmtool_posix[7615]: symlink '/lustre/archive/shadow/d0.sanity-hsm/d15/f.sanity-hsm.15.4' to '../../../0232/0000/0408/0000/0002/0000/0x200000408:0x232:0x0' done
      lhsmtool_posix[7615]: Action completed, notifying coordinator cookie=0x52602779, FID=[0x200000408:0x232:0x0], hp_flags=0 err=0
      lhsmtool_posix[7615]: llapi_hsm_action_end() on '/lustre/scratch/.lustre/fid/0x200000408:0x232:0x0' ok (rc=0)
      exiting: Interrupt
      

      Attachments

        Issue Links

          Activity

            [LU-4126] sanity-hsm test_15 failure: 'requests did not complete'

            No more related failures for master builds reported since Nov 18th. So as expected the 2 changes for LU-4093 made it.

            bfaccini Bruno Faccini (Inactive) added a comment - No more related failures for master builds reported since Nov 18th. So as expected the 2 changes for LU-4093 made it.

            Seems that my previous assumption was true finally and the number of sanity-hsm failures linked to this ticket has significantly reduced since 1st patch for LU-4093 has landed, let's wait a bit more now that 2nd/correction patch for LU-4093 also just landed.

            bfaccini Bruno Faccini (Inactive) added a comment - Seems that my previous assumption was true finally and the number of sanity-hsm failures linked to this ticket has significantly reduced since 1st patch for LU-4093 has landed, let's wait a bit more now that 2nd/correction patch for LU-4093 also just landed.

            Patch/change #8157 for LU-4093, if successful should avoid the error/problem for this ticket since no other/old zombie request than the one for the current sub-test should survive on copytool_cleanup between sub-tests.

            If not sufficient, we may need to switch to a completion check for each/specific requests from this test, instead of global/wait_all_done().

            bfaccini Bruno Faccini (Inactive) added a comment - Patch/change #8157 for LU-4093 , if successful should avoid the error/problem for this ticket since no other/old zombie request than the one for the current sub-test should survive on copytool_cleanup between sub-tests. If not sufficient, we may need to switch to a completion check for each/specific requests from this test, instead of global/wait_all_done().

            I agree with Jinshan, this FID causing the time-out looks like an old+orphan one, like I describe in LU-4093.

            So yes, a possible solution could be to only check the files/FIDs we create and use for this test, or like I suggest in LU-4093, to ensure no orphan or zombie request is left from previous tests.

            bfaccini Bruno Faccini (Inactive) added a comment - I agree with Jinshan, this FID causing the time-out looks like an old+orphan one, like I describe in LU-4093 . So yes, a possible solution could be to only check the files/FIDs we create and use for this test, or like I suggest in LU-4093 , to ensure no orphan or zombie request is left from previous tests.
            jay Jinshan Xiong (Inactive) added a comment - - edited

            the FID in agent_actions didn't match the FID in copytool log. In agent_action it was [0x20000040b:0x74:0x0], but actually it should be [0x200000408:0x22f:0x0].

            Probably the archive for the first file had already finished and [0x20000040b:0x74:0x0] was leftover information from previous tests.

            In that case, we should only check the status of files we created in the test cases.

            This is not a blocker.

            jay Jinshan Xiong (Inactive) added a comment - - edited the FID in agent_actions didn't match the FID in copytool log. In agent_action it was [0x20000040b:0x74:0x0] , but actually it should be [0x200000408:0x22f:0x0] . Probably the archive for the first file had already finished and [0x20000040b:0x74:0x0] was leftover information from previous tests. In that case, we should only check the status of files we created in the test cases. This is not a blocker.

            People

              bfaccini Bruno Faccini (Inactive)
              jamesanunez James Nunez (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: