[LU-12870] sanity-hsm test 9A fails with “uuid Dumping not found in agent list on mds1” Created: 16/Oct/19 Updated: 22/Oct/20 Resolved: 24/Sep/20 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.13.0, Lustre 2.12.3, Lustre 2.12.4 |
| Fix Version/s: | Lustre 2.14.0, Lustre 2.12.6 |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | Minh Diep |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | ubuntu, ubuntu16, ubuntu18 |
| Environment: | Ubuntu |
| Issue Links: |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
sanity-hsm test_9A fails with multiple errors for Ubuntu client testing. Looking at results starting 01 APRIL 2019, this test fails 100% of the time for Ubuntu 16.04 and ~73% of the time for Ubuntu 18.04. For master, this test has failed 100% of the time since April. For b2_12, it looks like something bad landed on or before 02 JULY 2019 with Lustre version 2.12.2.69, because we have 100% failure of this test from this date until today; failures start with test session https://testing.whamcloud.com/test_sets/86cbddd4-9e26-11e9-8fc1-52540065bddc.

When sanity-hsm test 9A fails, we see about 50 or more other sanity-hsm tests fail and, in all cases, a later test eventually times out; see https://testing.whamcloud.com/test_sets/c3676bb2-eb26-11e9-b62b-52540065bddc or https://testing.whamcloud.com/test_sets/a8bded48-5db6-11e9-92fe-52540065bddc.

Looking at the suite_log for https://testing.whamcloud.com/test_sets/c3676bb2-eb26-11e9-b62b-52540065bddc, we see

trevis-63vm10: trevis-63vm10.trevis.whamcloud.com: executing libtool execute ps -C lhsmtool_posix -o args=
trevis-63vm10: rpc.sh: line 21: libtool: command not found
sanity-hsm test_9A: @@@@@@ FAIL: Found no Agent or with no mount-point parameter
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5864:error()
= /usr/lib64/lustre/tests/sanity-hsm.sh:859:get_agent_uuid()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1177:test_9A()
= /usr/lib64/lustre/tests/test-framework.sh:6166:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6205:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6051:run_test()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1188:main()
CMD: trevis-63vm10,trevis-63vm11,trevis-63vm12,trevis-63vm9.trevis.whamcloud.com /usr/sbin/lctl dk > /autotest/autotest2/2019-10-08/lustre-b2_12-el7_6-x86_64-vs-lustre-b2_12-ubuntu1804-x86_64--full--1_9__52___8b37fabf-7e63-43bb-bb1c-9ed34b31d532/sanity-hsm.test_9A.debug_log.\$(hostname -s).1570578420.log;
dmesg > /autotest/autotest2/2019-10-08/lustre-b2_12-el7_6-x86_64-vs-lustre-b2_12-ubuntu1804-x86_64--full--1_9__52___8b37fabf-7e63-43bb-bb1c-9ed34b31d532/sanity-hsm.test_9A.dmesg.\$(hostname -s).1570578420.log
CMD: trevis-63vm10,trevis-63vm11,trevis-63vm12,trevis-63vm9.trevis.whamcloud.com lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
CMD: trevis-63vm12 /usr/sbin/lctl get_param -n mdt.lustre-MDT0000.hsm.agents | grep Dumping
sanity-hsm test_9A: @@@@@@ FAIL: uuid Dumping not found in agent list on mds1
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:5864:error()
= /usr/lib64/lustre/tests/sanity-hsm.sh:819:check_agent_registered_by_mdt()
= /usr/lib64/lustre/tests/sanity-hsm.sh:839:check_agent_registered()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1178:test_9A()
= /usr/lib64/lustre/tests/test-framework.sh:6166:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:6205:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:6051:run_test()
= /usr/lib64/lustre/tests/sanity-hsm.sh:1188:main()
Dumping lctl log to /autotest/autotest2/2019-10-08/lustre-b2_12-el7_6-x86_64-vs-lustre-b2_12-ubuntu1804-x86_64--full--1_9__52___8b37fabf-7e63-43bb-bb1c-9ed34b31d532/sanity-hsm.test_9A.*.1570578423.log
CMD: trevis-63vm10,trevis-63vm11,trevis-63vm12,trevis-63vm9.trevis.whamcloud.com /usr/sbin/lctl dk > /autotest/autotest2/2019-10-08/lustre-b2_12-el7_6-x86_64-vs-lustre-b2_12-ubuntu1804-x86_64--full--1_9__52___8b37fabf-7e63-43bb-bb1c-9ed34b31d532/sanity-hsm.test_9A.debug_log.\$(hostname -s).1570578423.log;
dmesg > /autotest/autotest2/2019-10-08/lustre-b2_12-el7_6-x86_64-vs-lustre-b2_12-ubuntu1804-x86_64--full--1_9__52___8b37fabf-7e63-43bb-bb1c-9ed34b31d532/sanity-hsm.test_9A.dmesg.\$(hostname -s).1570578423.log
Resetting fail_loc on all nodes...CMD: trevis-63vm10,trevis-63vm11,trevis-63vm12,trevis-63vm9.trevis.whamcloud.com lctl set_param -n fail_loc=0 fail_val=0 2>/dev/null
done.
CMD: trevis-63vm10 libtool execute pkill -x lhsmtool_posix
trevis-63vm10: sh: libtool: command not found
CMD: trevis-63vm10 rm -rf /tmp/arc1/sanity-hsm.test_9A/
FAIL 9A (7s)
In
As Hongchao points out, we do see the 'libtool command missing' message in the logs for the failed test sessions. |
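One way to confirm this signature across failed sessions is to count the "libtool: command not found" lines in each suite_log. The following is only a minimal sketch: the function name `count_libtool_missing` is hypothetical and is not part of the Lustre test framework, and it assumes the log has already been downloaded as plain text.

```shell
#!/bin/sh
# Hypothetical helper (not part of test-framework.sh): read a suite_log on
# stdin and print how many lines contain the missing-libtool message.
count_libtool_missing() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are no matches, so "|| true" keeps the helper from failing.
    grep -c 'libtool: command not found' || true
}
```

Usage, assuming the log was saved locally under the illustrative name `suite_log`: `count_libtool_missing < suite_log`. A non-zero count suggests the session hit the missing-libtool problem described above.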
| Comments |
| Comment by Andreas Dilger [ 16/Oct/19 ] |
|
Is this a matter of adding libtool as a Requires (or Debian equivalent) to the package, and installing this on the test nodes? To be honest, I'm not thrilled about the requirement for this; maybe it is only needed for the test packages? I couldn't see an obvious patch that landed on April 1st that might have caused this, but it should be relatively straightforward to bisect the landings on that day to find the cause of the 100% failure rate. |
| Comment by Minh Diep [ 16/Oct/19 ] |
|
we need libtool-bin |
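A quick preflight check on a test node could look like the sketch below. It assumes a Debian/Ubuntu node, where the `libtool` executable is shipped in the `libtool-bin` package; the function names `have_cmd` and `libtool_hint` are hypothetical and are not taken from the Lustre test scripts.

```shell
#!/bin/sh
# Hypothetical preflight check for a Debian/Ubuntu test node.

# Return success if the named command is found in PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Print whether libtool is available, with an install hint if it is not.
libtool_hint() {
    if have_cmd libtool; then
        echo "libtool found"
    else
        # Assumption: apt-based node; other distributions package libtool differently.
        echo "libtool missing: try 'apt-get install libtool-bin'"
    fi
}
```

Running `libtool_hint` on each client before starting sanity-hsm would flag nodes where the test is likely to fail this way.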
| Comment by Gerrit Updater [ 17/Oct/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36471 |
| Comment by Gerrit Updater [ 12/Nov/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36471/ |
| Comment by Peter Jones [ 12/Nov/19 ] |
|
Landed for 2.14 |
| Comment by James Nunez (Inactive) [ 16/Jan/20 ] |
|
We're seeing sanity-hsm test 9A fail with the same errors for 2.12.4; https://testing.whamcloud.com/test_sets/4dc3aaa6-3414-11ea-b1e8-52540065bddc |
| Comment by Gerrit Updater [ 03/Jun/20 ] |
|
James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38822 |
| Comment by Peter Jones [ 24/Sep/20 ] |
|
Seems like this fix was landed |
| Comment by John Hammond [ 14/Oct/20 ] |
|
> Is this a matter of adding libtool as a Requires (or Debian equivalent) to the package, and installing this on the test nodes? To be honest, I'm not thrilled about the requirement for this, maybe it is only for the test packages?

adilger, this is not really needed. We just need to remove the libtool uses altogether. See |
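To illustrate what dropping the wrapper means for the commands in the log, the sketch below runs a command line directly when it is prefixed with `libtool execute`. This is an illustration only, not the content of the landed patch; the function name `run_without_libtool` is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration (not the landed patch): run a command, skipping
# a leading "libtool execute" prefix if one is present.
#   before: libtool execute ps -C lhsmtool_posix -o args=
#   after:  ps -C lhsmtool_posix -o args=
run_without_libtool() {
    if [ "${1:-}" = "libtool" ] && [ "${2:-}" = "execute" ]; then
        shift 2   # drop the two wrapper words, keep the real command
    fi
    "$@"
}
```

For example, `run_without_libtool libtool execute ps -C lhsmtool_posix -o args=` invokes `ps` directly, so it no longer depends on libtool being installed on the node.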
| Comment by Gerrit Updater [ 22/Oct/20 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38822/ |