[LU-3913] Failure on test suite recovery-mds-scale test_failover_mds Created: 09/Sep/13  Updated: 14/Oct/15  Resolved: 09/Sep/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.5.0
Fix Version/s: None

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: WC Triage
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

server and client: lustre-master build # 1652


Issue Links:
Duplicate
duplicates LU-3906 Failure on test suite parallel-scale ... Resolved
Severity: 3
Rank (Obsolete): 10313

 Description   

This issue was created by maloo for sarah <sarah@whamcloud.com>

This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/41232226-178e-11e3-a71f-52540035b04c.

The sub-test test_failover_mds failed with the following error:

test_failover_mds returned 1

After mds1 had failed over 5 times, the check of the client loads failed. Console log from client-2:

07:46:54:Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
07:46:54:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] &&
07:46:54:		echo $(< /proc/sys/lnet/catastrophe) || echo 0);
07:46:54:		if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
07:46:54:		exit $rc
07:46:54:Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
07:46:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 has failed over 5 times, and counting...
07:46:56:Lustre: DEBUG MARKER: mds1 has failed over 5 times, and counting...
07:47:18:Lustre: Evicted from MGS (at 10.10.4.154@tcp) after server handle changed from 0x77e75e068ccb4c86 to 0x9bc9da83a851b6f9
07:47:18:Lustre: MGC10.10.4.150@tcp: Connection restored to MGS (at 10.10.4.154@tcp)
07:47:18:Lustre: lustre-MDT0000-mdc-ffff88007cfb2800: Connection restored to lustre-MDT0000 (at 10.10.4.154@tcp)
07:58:34:LustreError: 7287:0:(vvp_io.c:1078:vvp_io_commit_write()) Write page 74807 of inode ffff8800525f8638 failed -28
08:00:38:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration:               86400
08:00:38:Server failover period: 900 seconds
08:00:39:Exited after:           3682 seconds
08:00:39:Number of failovers before exit:
08:00:39:mds1: 5 times
08:00:39:ost1: 0 times
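
The "failed -28" at the end of the vvp_io_commit_write() line follows the usual Linux kernel convention of reporting errors as negative errno values, and errno 28 is ENOSPC on Linux. The sketch below decodes such a trailing return code from a log line. The helper name decode_failed_rc is hypothetical and is not part of Lustre or its test framework, and the pattern assumes the message ends in "failed -N" as it does in this log.

```python
import errno
import os
import re

# Log line copied from the client-2 console output above.
LOG_LINE = (
    "07:58:34:LustreError: 7287:0:(vvp_io.c:1078:vvp_io_commit_write()) "
    "Write page 74807 of inode ffff8800525f8638 failed -28"
)


def decode_failed_rc(line):
    """Hypothetical helper: decode a trailing 'failed -N' return code.

    Returns (rc, symbolic errno name, message), or None if the line
    does not end in 'failed -N'.
    """
    match = re.search(r"failed (-\d+)\s*$", line)
    if match is None:
        return None
    rc = int(match.group(1))
    # The kernel reports errors as negative errno values, so negate
    # before looking up the symbolic name and the message text.
    name = errno.errorcode.get(-rc, "UNKNOWN")
    return rc, name, os.strerror(-rc)


if __name__ == "__main__":
    print(decode_failed_rc(LOG_LINE))
```

This only names the error code of the page write on client-2; it does not by itself establish why the client load check failed.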
