Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.9.0
- Server 2.5.x, Client 2.5.x, 4-node cluster (1 MDS, 1 OSS, 2 clients)
- 3
Description
stdout.log:
ost-pools test_1n: @@@@@@ FAIL: LBUG/LASSERT detected
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4672:error()
  = /usr/lib64/lustre/tests/test-framework.sh:4936:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:4968:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:4774:run_test()
  = /usr/lib64/lustre/tests/ost-pools.sh:336:main()
Dumping lctl log to /tmp/test_logs/1458556257/ost-pools.test_1n.*.1458556373.log
fre1204: Warning: Permanently added 'fre1203,192.168.112.3' (RSA) to the list of known hosts.
fre1201: Warning: Permanently added 'fre1203,192.168.112.3' (RSA) to the list of known hosts.
fre1202: Warning: Permanently added 'fre1203,192.168.112.3' (RSA) to the list of known hosts.
Resetting fail_loc and fail_val on all nodes...done.
FAIL 1n (117s)
The defect is in check_catastrophe():
check_catastrophe() {
	local nodes=${1:-$(comma_list $(nodes_list))}

	do_nodes $nodes "rc=0;
val=\\\$($LCTL get_param -n catastrophe 2>&1);
if [[ \\\$? -eq 0 && \\\$val -ne 0 ]]; then
	echo \\\$(hostname -s): \\\$val;
	rc=\\\$val;
fi;
exit \\\$rc"
}
If a node is not accessible, check_catastrophe() returns 255:
fre1202: ssh: connect to host fre1202 port 22: Connection timed out
pdsh@fre1203: fre1202: ssh exited with exit code 255
and run_one() then exits with an error although no LBUG/LASSERT has occurred, because it treats any non-zero status from check_catastrophe() as a crash:
run_one(): check_catastrophe || error "LBUG/LASSERT detected"
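One possible way to separate the two cases is sketched below. It is not the change that was landed for this ticket. The helper names classify_catastrophe_rc and check_catastrophe_strict are hypothetical, and the sketch assumes that a status of 255 always means an ssh/pdsh transport failure (as in the log above) and never a real catastrophe value.

```shell
#!/bin/bash

# Hypothetical helper: map the exit status of a catastrophe check to a label.
# Assumption: 255 is the ssh/pdsh transport-failure status seen in the log,
# and any other non-zero status is the catastrophe value reported by a node.
classify_catastrophe_rc() {
	local rc=$1

	case $rc in
	0)   echo "ok" ;;
	255) echo "unreachable" ;;
	*)   echo "catastrophe" ;;
	esac
}

# Hypothetical wrapper: run the check command passed as arguments and fail
# only when a node actually reported a non-zero catastrophe value.
check_catastrophe_strict() {
	local rc=0

	"$@" || rc=$?

	case $(classify_catastrophe_rc $rc) in
	ok)
		return 0 ;;
	unreachable)
		# Node state is unknown; warn instead of claiming an LBUG/LASSERT.
		echo "WARNING: node(s) unreachable, catastrophe state unknown" >&2
		return 0 ;;
	*)
		return 1 ;;
	esac
}
```

With such a wrapper, run_one() could call check_catastrophe_strict check_catastrophe || error "LBUG/LASSERT detected". Whether an unreachable node should pass silently, warn, or fail with a different message is a policy decision; the sketch only warns.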
Issue Links
- is related to: LU-8805 Failover: recovery-mds-scale test_failover_mds: test_failover_mds returned 4 (Resolved)