[LU-11709] sanity-hsm test 260c fails with 'request on 0x2000013a0:0x2ba:0x0 is not SUCCEED on mds1' Created: 27/Nov/18 Updated: 29/Nov/18 Resolved: 29/Nov/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | DNE |
| Environment: | DNE |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity-hsm test_260c fails to successfully archive a file after starting the copytool. We see this test fail in the review-dne-part-2 and review-dne-zfs-part-2 test groups. Looking at the logs for this failure at https://testing.whamcloud.com/test_sets/89b3068c-f258-11e8-b67f-52540065bddc , the last thing we see in the client test_log is the test waiting for the ARCHIVE request to reach the SUCCEED state:
Starting copytool agt1 on onyx-38vm2
CMD: onyx-38vm2 lhsmtool_posix --daemon --hsm-root "/tmp/arc1/sanity-hsm.test_260c/" "/mnt/lustre2" < /dev/null > "/autotest/autotest/2018-11-27/lustre-reviews-el7_5-x86_64--review-dne-zfs-part-2--1_10_1__60277___2ab2a1f1-8a91-44cd-aca5-35ecd1cac72c/sanity-hsm.test_260c.copytool_log.onyx-38vm2.log" 2>&1
CMD: onyx-38vm2 mkdir -p /tmp/arc1/sanity-hsm.test_260c/
Starting copytool agt1 on onyx-38vm2
CMD: onyx-38vm2 lhsmtool_posix --daemon --hsm-root "/tmp/arc1/sanity-hsm.test_260c/" --archive 2 "/mnt/lustre2" < /dev/null > "/autotest/autotest/2018-11-27/lustre-reviews-el7_5-x86_64--review-dne-zfs-part-2--1_10_1__60277___2ab2a1f1-8a91-44cd-aca5-35ecd1cac72c/sanity-hsm.test_260c.copytool2_log.onyx-38vm2.log" 2>&1
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
Waiting 200 secs for update
…
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
CMD: onyx-38vm4 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.actions | awk '/'0x2000013a0:0x2ba:0x0'.*action='ARCHIVE'/ {print \$13}' | cut -f2 -d=
Update not seen after 200s: wanted 'SUCCEED' got 'WAITING'
sanity-hsm test_260c: @@@@@@ FAIL: request on 0x2000013a0:0x2ba:0x0 is not SUCCEED on mds1
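For context, the test's wait loop amounts to issuing an HSM archive request and then polling the MDT's hsm.actions list until the request status becomes SUCCEED. The following shell sketch is not the actual test code; the file path, mount point, MDT name, archive ID, and 200-second timeout are assumptions taken from or modeled on the log above.
# Minimal sketch of the archive-and-wait check; names and paths are illustrative.
file=/mnt/lustre/somefile                          # hypothetical test file
fid=$(lfs path2fid "$file" | tr -d '[]')           # e.g. 0x2000013a0:0x2ba:0x0
lfs hsm_archive --archive 2 "$file"                # archive ID 2, as used by the second copytool
elapsed=0
state=
while [ "$elapsed" -lt 200 ]; do                   # the test gives up after 200s
    state=$(lctl get_param -n mdt.lustre-MDT0000.hsm.actions |
            awk "/$fid.*action=ARCHIVE/ {print \$13}" | cut -f2 -d=)
    [ "$state" = "SUCCEED" ] && break
    sleep 1
    elapsed=$((elapsed + 1))
done
echo "final request state: ${state:-<none>}"       # anything other than SUCCEED is a failure
On a DNE filesystem the request is recorded in the hsm.actions list of the MDT that owns the file, which is presumably why the query above targets mdt.lustre-MDT0000 (per the log, the file's FID lives on MDT0000).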
There are no obvious issues in the console or dmesg logs. sanity-hsm test_260c was just updated by patch https://review.whamcloud.com/33478, which landed on November 26, 2018; this failure started on November 27, 2018. Logs for other failures are at: |
| Comments |
| Comment by James Nunez (Inactive) [ 27/Nov/18 ] |
|
Quentin - Would you please take a look at these failures? It looks like your patch either exposed an existing issue with HSM or does not work well with DNE. |
| Comment by James A Simmons [ 27/Nov/18 ] |
|
Try the patch https://review.whamcloud.com/#/c/33649. Yes, this was caused by an out-of-order landing of HSM patches. |
| Comment by Quentin Bouget [ 28/Nov/18 ] |
|
My bad. James S. is right; my patch https://review.whamcloud.com/#/c/33649 should solve this. |
| Comment by Andreas Dilger [ 29/Nov/18 ] |
|
The patch from |