Details
-
Bug
-
Resolution: Duplicate
-
Minor
-
None
-
Lustre 2.5.0
-
None
-
server and client: lustre-master build # 1652
-
3
-
10313
Description
This issue was created by maloo for sarah <sarah@whamcloud.com>
This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/41232226-178e-11e3-a71f-52540035b04c.
The sub-test test_failover_mds failed with the following error:
test_failover_mds returned 1
After the MDS had failed over 5 times, the check of the client loads failed as follows.
On client-2:
07:46:54:Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
07:46:54:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] &&
07:46:54: echo $(< /proc/sys/lnet/catastrophe) || echo 0);
07:46:54: if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
07:46:54: exit $rc
07:46:54:Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
07:46:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 has failed over 5 times, and counting...
07:46:56:Lustre: DEBUG MARKER: mds1 has failed over 5 times, and counting...
07:47:18:Lustre: Evicted from MGS (at 10.10.4.154@tcp) after server handle changed from 0x77e75e068ccb4c86 to 0x9bc9da83a851b6f9
07:47:18:Lustre: MGC10.10.4.150@tcp: Connection restored to MGS (at 10.10.4.154@tcp)
07:47:18:Lustre: lustre-MDT0000-mdc-ffff88007cfb2800: Connection restored to lustre-MDT0000 (at 10.10.4.154@tcp)
07:58:34:LustreError: 7287:0:(vvp_io.c:1078:vvp_io_commit_write()) Write page 74807 of inode ffff8800525f8638 failed -28
08:00:38:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 86400
08:00:38:Server failover period: 900 seconds
08:00:39:Exited after: 3682 seconds
08:00:39:Number of failovers before exit:
08:00:39:mds1: 5 times
08:00:39:ost1: 0 times
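For reference, the "failed -28" in the vvp_io_commit_write() message can be decoded with a short script. This is a minimal sketch, assuming the value follows the usual Linux kernel convention of returning a negated errno; the helper name describe_kernel_rc is hypothetical and not part of Lustre or its test framework. On Linux, errno 28 is ENOSPC (no space left on device).

```python
import errno
import os


def describe_kernel_rc(rc: int) -> str:
    """Map a negated errno return code (e.g. -28) to its symbolic name and message.

    Assumption: rc follows the Linux kernel convention of returning -errno.
    """
    code = abs(rc)
    # errno.errorcode maps numeric errno values to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Return code taken from the LustreError line in the client-2 log above.
    print(describe_kernel_rc(-28))
```

If the code does decode to ENOSPC here, the write failure on client-2 would point at a target running out of space during the run rather than at the failover itself, though the log alone does not confirm which target was full.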