[LU-8544] recovery-double-scale test_pairwise_fail: start client on trevis-54vm5 failed
| Created: | 25/Aug/16 | Updated: | 05/Aug/20 | Resolved: | 29/Sep/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/72a0fd32-6033-11e6-aa74-5254006e85c2.

The sub-test test_pairwise_fail failed with the following error:

start client on trevis-54vm5 failed

test logs:

CMD: trevis-54vm5 test -f /tmp/client-load.pid &&
{ kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
+ pm -h powerman --off trevis-54vm5
Command completed successfully
+ pm -h powerman --on trevis-54vm5
Command completed successfully
14:43:16 (1470926596) waiting for trevis-54vm5 network 900 secs ...
waiting ping -c 1 -w 3 trevis-54vm5, 895 secs left ...
waiting ping -c 1 -w 3 trevis-54vm5, 890 secs left ...
waiting ping -c 1 -w 3 trevis-54vm5, 885 secs left ...
waiting ping -c 1 -w 3 trevis-54vm5, 880 secs left ...
14:43:48 (1470926628) network interface is UP
CMD: trevis-54vm5 hostname
pdsh@trevis-54vm1: trevis-54vm5: mcmd: connect failed: Connection refused
CMD: trevis-54vm5 hostname
Reintegrating trevis-54vm5
Starting client: trevis-54vm5: -o user_xattr,flock trevis-54vm7:trevis-54vm3:/lustre /mnt/lustre
CMD: trevis-54vm5 mkdir -p /mnt/lustre
CMD: trevis-54vm5 mount -t lustre -o user_xattr,flock trevis-54vm7:trevis-54vm3:/lustre /mnt/lustre
CMD: trevis-54vm5 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4
trevis-54vm5: stat: cannot read file system information for ‘/mnt/lustre’: Input/output error
recovery-double-scale test_pairwise_fail: @@@@@@ FAIL: start client on trevis-54vm5 failed
Trace dump:
= /usr/lib64/lustre/tests/test-framework.sh:4804:error()
= /usr/lib64/lustre/tests/recovery-double-scale.sh:72:reboot_recover_node()
= /usr/lib64/lustre/tests/recovery-double-scale.sh:160:failover_pair()
= /usr/lib64/lustre/tests/recovery-double-scale.sh:251:test_pairwise_fail()
= /usr/lib64/lustre/tests/test-framework.sh:5068:run_one()
= /usr/lib64/lustre/tests/test-framework.sh:5107:run_one_logged()
= /usr/lib64/lustre/tests/test-framework.sh:4954:run_test()
= /usr/lib64/lustre/tests/recovery-double-scale.sh:303:main()
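For reference, the "Input/output error" printed by stat above is errno 5 (EIO), the same rc = -5 that shows up in the client debug log quoted in the comments below. A minimal user-space probe with plain statfs(2) surfaces the error the same way (illustrative only; /mnt/lustre is the mount point from this run):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/vfs.h>

    /* Illustrative probe: coreutils' stat --file-system ends up in the
     * same statfs(2) syscall, so while the client cannot reach any OST
     * it fails with errno 5 (EIO), printed as "Input/output error". */
    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "/mnt/lustre";
        struct statfs st;

        if (statfs(path, &st) < 0) {
            fprintf(stderr, "statfs %s: %s (errno %d)\n",
                    path, strerror(errno), errno);
            return 1;
        }
        printf("%s: %llu/%llu blocks free\n", path,
               (unsigned long long)st.f_bfree,
               (unsigned long long)st.f_blocks);
        return 0;
    }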
|
| Comments |
| Comment by Saurabh Tandan (Inactive) [ 06/Sep/16 ] |
|
This issue has been seen around 40 times in the past 30 days. |
| Comment by Saurabh Tandan (Inactive) [ 07/Sep/16 ] |
|
This issue was first seen on master on 2016-07-08, build #3405, tag 2.8.55, Lustre version 2.8.55.27.geb2657a. |
| Comment by Peter Jones [ 08/Sep/16 ] |
|
Hongchao, could you please advise on this one? Thanks, Peter |
| Comment by James Nunez (Inactive) [ 08/Sep/16 ] |
|
Looking at test failures in Maloo, I see that this test started failing with this error message on 2016-07-01. I see failures on both onyx and trevis. |
| Comment by Hongchao Zhang [ 09/Sep/16 ] |
|
The problem could be related to:

00000080:00000004:0.0:1472256895.043480:0:3882:0:(obd_class.h:1166:obd_statfs_async()) lustre-clilov-ffff880037e6c000: osfs ffff88007aa25210 age 4294647497, max_age 4294916319
00020000:00080000:0.0:1472256895.043483:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 0 inactive
00020000:00080000:0.0:1472256895.043484:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 1 inactive
00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 2 inactive
00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 3 inactive
00020000:00080000:0.0:1472256895.043485:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 4 inactive
00020000:00080000:0.0:1472256895.043486:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 5 inactive
00020000:00080000:0.0:1472256895.043486:0:3882:0:(lov_request.c:648:lov_prep_statfs_set()) lov idx 6 inactive
00000080:00020000:0.0:1472256895.043488:0:3882:0:(llite_lib.c:1890:ll_statfs_internal()) obd_statfs fails: rc = -5 |
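The debug log above reads as: every OSC target in the LOV was still inactive when the freshly remounted client issued its first statfs, so no OST_STATFS request could be queued and the empty set was turned into -EIO, which ll_statfs_internal() then returned to userspace. A simplified sketch of that skip-inactive logic (hypothetical names, assuming only the behaviour shown in the log, not the actual lov_request.c source):

    #include <errno.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct tgt { bool active; };

    /* Sketch (hypothetical, not the real lov_prep_statfs_set()):
     * inactive targets are skipped with "lov idx %d inactive", and
     * when every target is skipped there is nothing left to query,
     * so the caller gets -EIO (the rc = -5 seen above). */
    static int prep_statfs_sketch(const struct tgt *tgts, int ntgts)
    {
        int queued = 0;

        for (int i = 0; i < ntgts; i++) {
            if (!tgts[i].active) {
                printf("lov idx %d inactive\n", i);
                continue;
            }
            queued++;  /* the real code queues an OST_STATFS RPC here */
        }

        return queued > 0 ? 0 : -EIO;
    }

    int main(void)
    {
        struct tgt tgts[7] = { { false } };  /* all 7 OSCs still inactive */

        printf("rc = %d\n", prep_statfs_sketch(tgts, 7));  /* rc = -5 */
        return 0;
    }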
| Comment by Gerrit Updater [ 13/Sep/16 ] |
|
Hongchao Zhang (hongchao.zhang@intel.com) uploaded a new patch: http://review.whamcloud.com/22459 |
| Comment by Gerrit Updater [ 29/Sep/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22459/ |
| Comment by Peter Jones [ 29/Sep/16 ] |
|
Landed for 2.9 |
| Comment by Hongchao Zhang [ 14/Oct/16 ] |
|
Hi Bruno, |