Details
Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.11.0
Labels: None
Environment: onyx, failover
servers: el7.4, ldiskfs, branch master, v2.10.55, b3667
clients: sles12sp3, branch master, v2.10.55, b3667
Severity: 3
Rank: 9223372036854775807
Description
session: https://testing.hpdd.intel.com/test_sessions/2c5b36b7-c2a1-4e4a-89e7-993f6d6350b5
test set: https://testing.hpdd.intel.com/test_sets/db9bc182-c4e6-11e7-9c63-52540065bddc
This may be related to LU-9601, in which client 1 is lost due to OOM.
From test_log:
==== Checking the clients loads AFTER failover -- failure NOT OK
11:36:51 (1510169811) waiting for onyx-40vm1 network 5 secs ...
11:36:51 (1510169811) network interface is UP
CMD: onyx-40vm1 rc=0; val=\$(/usr/sbin/lctl get_param -n catastrophe 2>&1); if [[ \$? -eq 0 && \$val -ne 0 ]]; then echo \$(hostname -s): \$val; rc=\$val; fi; exit \$rc
pdsh@onyx-40vm5: onyx-40vm1: mcmd: connect failed: Connection refused
recovery-mds-scale test_failover_mds: @@@@@@ FAIL: onyx-40vm1:LBUG/LASSERT detected
Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:5289:error()
  = /usr/lib64/lustre/tests/test-framework.sh:6285:check_node_health()
  = /usr/lib64/lustre/tests/test-framework.sh:2277:check_client_load()
  = /usr/lib64/lustre/tests/test-framework.sh:2322:check_client_loads()
  = /usr/lib64/lustre/tests/recovery-mds-scale.sh:170:failover_target()
  = /usr/lib64/lustre/tests/recovery-mds-scale.sh:236:test_failover_mds()
  = /usr/lib64/lustre/tests/test-framework.sh:5565:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5604:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5451:run_test()
  = /usr/lib64/lustre/tests/recovery-mds-scale.sh:238:main()
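
Note on reading the log: the health check reads the catastrophe parameter on onyx-40vm1 over pdsh, but pdsh reports "connect failed: Connection refused" before the failure is raised. So the "LBUG/LASSERT detected" message may reflect an unreachable client rather than a confirmed assertion. The sketch below shows one way to tell the two cases apart when triaging. It assumes the remote command's exit status and output have been captured; classify_health is a hypothetical helper and is not part of test-framework.sh.

```shell
# Hypothetical triage helper (an assumption, not code from test-framework.sh).
#   $1 = exit status of the remote catastrophe check
#   $2 = captured output (stdout and stderr) of that check
classify_health() {
    rc=$1
    out=$2

    # The remote command exits 0 when catastrophe is 0 or cannot be read.
    if [ "$rc" -eq 0 ]; then
        echo "healthy"
        return
    fi

    case $out in
        # pdsh never reached the node, so catastrophe was never read.
        *"connect failed"*) echo "unreachable" ;;
        # Otherwise assume the non-zero status carries the catastrophe value.
        *) echo "catastrophe" ;;
    esac
}
```

For example, `classify_health 1 "pdsh@onyx-40vm5: onyx-40vm1: mcmd: connect failed: Connection refused"` prints `unreachable`, which would point toward a lost client (as in LU-9601) rather than an LBUG on that node.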