[LU-10522] recovery-random-scale test_fail_client_mds: test_fail_client_mds returned 4 Created: 16/Jan/18 Updated: 03/Dec/21 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.10.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Failover |
||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
recovery-random-scale test_fail_client_mds - test_fail_client_mds returned 4
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>
This issue relates to the following test suite run:
test_fail_client_mds failed with the following error:
test_fail_client_mds returned 4
Test logs:
==== Checking the clients loads BEFORE failover -- failure NOT OK ELAPSED=3962 DURATION=86400 PERIOD=1200
10:34:00 (1515580440) waiting for onyx-41vm3 network 5 secs ...
10:34:00 (1515580440) network interface is UP
CMD: onyx-41vm3 rc=0;
val=\$(/usr/sbin/lctl get_param -n catastrophe 2>&1);
if [[ \$? -eq 0 && \$val -ne 0 ]]; then
echo \$(hostname -s): \$val;
rc=\$val;
fi;
exit \$rc
CMD: onyx-41vm3 ps auxwww | grep -v grep | grep -q run_dd.sh
Client load failed on node onyx-41vm3, rc=1
2018-01-10 10:34:31 Terminating clients loads ...
Duration: 86400
Server failover period: 1200 seconds
Exited after: 3962 seconds
Number of failovers before exit:
mds1 failed over 4 times
Status: FAIL: rc=4
CMD: onyx-41vm3,onyx-41vm4 test -f /tmp/client-load.pid &&
{ kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
onyx-41vm3: sh: line 1: kill: (8054) - No such process
run_tar_debug.onyx-41vm4.log
tar: etc/ssl: Cannot stat: No such file or directory
tar: etc/systemd/system/getty.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/sockets.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/multi-user.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/sysinit.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/dev-virtio\\x2dports-org.qemu.guest_agent.0.device.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/remote-fs.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/basic.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system/default.target.wants: Cannot stat: No such file or directory
tar: etc/systemd/system: Cannot stat: No such file or directory
tar: etc/systemd: Cannot stat: No such file or directory
tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
tar: etc/rc.d: Cannot stat: No such file or directory
tar: etc/alternatives: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors |
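The "Client load failed on node onyx-41vm3, rc=1" line comes from two per-client checks visible in the log: a read of the `catastrophe` parameter via `lctl get_param`, and a `ps | grep` for the `run_dd.sh` load script. A minimal bash sketch of those two checks, assuming it runs locally on the client, is below. The function name `sketch_check_client_load` is hypothetical and this is not the actual test-framework.sh code; it returns the catastrophe value when that is non-zero, 1 when the load script is no longer running (the case hit on onyx-41vm3), and 0 otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the test-framework.sh implementation: mirrors the
# two per-client checks seen in the log (catastrophe flag, load script alive).
sketch_check_client_load() {
    local load_script=$1
    local node=${HOSTNAME%%.*}   # short host name, similar to `hostname -s`
    local val

    # Read the catastrophe flag the same way the logged CMD does. If lctl is
    # missing or the read fails, this check is skipped.
    val=$(/usr/sbin/lctl get_param -n catastrophe 2>&1)
    if [[ $? -eq 0 && $val -ne 0 ]]; then
        echo "$node: $val"
        return "$val"
    fi

    # Same liveness test as the logged CMD: is the load script still running?
    # `grep -v grep` drops the grep processes themselves from the listing.
    if ! ps auxwww | grep -v grep | grep -q -- "$load_script"; then
        echo "Client load failed on node $node, rc=1"
        return 1
    fi
    return 0
}
```

Usage on a client: `sketch_check_client_load run_dd.sh; echo "rc=$?"`. Note that the `ps | grep` test matches any process whose command line contains the script name, so it only tells you the load is gone, not why it exited; the reason has to come from the per-load debug logs such as run_tar_debug above.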
| Comments |
| Comment by Alena Nikitenko [ 03/Dec/21 ] |
|
Found a similar issue with the recovery-random-scale test set on 2.12.8: https://testing.whamcloud.com/test_sets/e735d4c7-0211-48dc-82e6-a6ba45ceb281 However, the return code is different, since it is a different test:
...
Starting client: onyx-112vm10: -o user_xattr,flock onyx-70vm3:onyx-70vm4:/lustre /mnt/lustre
CMD: onyx-112vm10 mkdir -p /mnt/lustre
CMD: onyx-112vm10 mount -t lustre -o user_xattr,flock onyx-70vm3:onyx-70vm4:/lustre /mnt/lustre
onyx-112vm10: mount.lustre: according to /etc/mtab onyx-70vm3:onyx-70vm4:/lustre is already mounted on /mnt/lustre
2021-11-20 21:05:50 Terminating clients loads ...
Duration: 86400
Server failover period: 1200 seconds
Exited after: 65095 seconds
Number of failovers before exit:
mds1 failed over 55 times
Status: FAIL: rc=1
CMD: onyx-112vm10,onyx-112vm9 test -f /tmp/client-load.pid &&
{ kill -s TERM \$(cat /tmp/client-load.pid); rm -f /tmp/client-load.pid; }
...
tar: etc/pki/tls/certs: Cannot stat: No such file or directory
tar: etc/pki/tls: Cannot stat: No such file or directory
tar: etc/pki/java: Cannot stat: No such file or directory
tar: etc/pki/ca-trust/source: Cannot stat: No such file or directory
tar: etc/pki/ca-trust: Cannot stat: No such file or directory
tar: etc/pki: Cannot stat: No such file or directory
tar: etc/ssl: Cannot stat: No such file or directory
tar: etc/pam.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc0.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc6.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc1.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc4.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc5.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc3.d: Cannot stat: No such file or directory
tar: etc/rc.d/rc2.d: Cannot stat: No such file or directory
tar: etc/rc.d: Cannot stat: No such file or directory
tar: etc/sysconfig/network-scripts: Cannot stat: No such file or directory
tar: etc/sysconfig: Cannot stat: No such file or directory
tar: etc/profile.d: Cannot stat: No such file or directory
tar: etc/sysctl.d: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors |
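The two runs are easier to compare when each end-of-run summary is reduced to one line. A minimal awk-in-bash sketch, assuming the summary lines are formatted exactly as in the two excerpts (`Duration:`, `Server failover period:`, `Exited after:`, `<facet> failed over N times`, `Status: ... rc=N`); the function name `summarize_failover_run` is hypothetical, and other Lustre versions may print the summary differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a recovery-*-scale summary block on stdin and
# prints one line with failover counts, elapsed time, duration, period and rc.
summarize_failover_run() {
    awk '
        /^Duration:/               { duration = $2 }
        /^Server failover period:/ { period = $4 }
        /^Exited after:/           { exited = $3 }
        # e.g. "mds1 failed over 4 times" -> "mds1:4"; several facets are
        # joined with commas.
        /failed over [0-9]+ times/ { fo = fo sep $1 ":" $4; sep = "," }
        # e.g. "Status: FAIL: rc=4" -> "4"
        /^Status: .*rc=/           { rc = $0; sub(/.*rc=/, "", rc) }
        END {
            printf "failovers=%s exited=%ss duration=%ss period=%ss rc=%s\n",
                   fo, exited, duration, period, rc
        }'
}
```

Fed the 2.10.3 summary, this prints `failovers=mds1:4 exited=3962s duration=86400s period=1200s rc=4`; fed the 2.12.8 summary, it prints `failovers=mds1:55 exited=65095s duration=86400s period=1200s rc=1`.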