[LU-3913] Failure on test suite recovery-mds-scale test_failover_mds Created: 09/Sep/13 Updated: 14/Oct/15 Resolved: 09/Sep/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Maloo | Assignee: | WC Triage |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Environment: |
server and client: lustre-master build # 1652 |
||
| Issue Links: |
|
||||||||
| Severity: | 3 | ||||||||
| Rank (Obsolete): | 10313 | ||||||||
| Description |
|
This issue was created by maloo for sarah <sarah@whamcloud.com>. It relates to the following test suite run: http://maloo.whamcloud.com/test_sets/41232226-178e-11e3-a71f-52540035b04c. The sub-test test_failover_mds failed with the following error:
After the MDS had failed over 5 times, the check of the client loads failed as follows:
07:46:54:Lustre: DEBUG MARKER: ==== Checking the clients loads AFTER failover -- failure NOT OK
07:46:54:Lustre: DEBUG MARKER: rc=$([ -f /proc/sys/lnet/catastrophe ] &&
07:46:54: echo $(< /proc/sys/lnet/catastrophe) || echo 0);
07:46:54: if [ $rc -ne 0 ]; then echo $(hostname): $rc; fi
07:46:54: exit $rc
07:46:54:Lustre: DEBUG MARKER: ps auxwww | grep -v grep | grep -q run_dd.sh
07:46:55:Lustre: DEBUG MARKER: /usr/sbin/lctl mark mds1 has failed over 5 times, and counting...
07:46:56:Lustre: DEBUG MARKER: mds1 has failed over 5 times, and counting...
07:47:18:Lustre: Evicted from MGS (at 10.10.4.154@tcp) after server handle changed from 0x77e75e068ccb4c86 to 0x9bc9da83a851b6f9
07:47:18:Lustre: MGC10.10.4.150@tcp: Connection restored to MGS (at 10.10.4.154@tcp)
07:47:18:Lustre: lustre-MDT0000-mdc-ffff88007cfb2800: Connection restored to lustre-MDT0000 (at 10.10.4.154@tcp)
07:58:34:LustreError: 7287:0:(vvp_io.c:1078:vvp_io_commit_write()) Write page 74807 of inode ffff8800525f8638 failed -28
08:00:38:Lustre: DEBUG MARKER: /usr/sbin/lctl mark Duration: 86400
08:00:38:Server failover period: 900 seconds
08:00:39:Exited after: 3682 seconds
08:00:39:Number of failovers before exit:
08:00:39:mds1: 5 times
08:00:39:ost1: 0 times |
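
The LustreError line in the log reports that the page write "failed -28". Kernel messages print errno values negated, and errno 28 on Linux is ENOSPC ("No space left on device"). The ticket itself does not state the root cause, so the following is only a minimal sketch for decoding such codes when reading logs like this one. The helper name `decode_failed_errno` and its regular expression are hypothetical and assume nothing beyond the "failed -N" suffix seen above.

```python
import errno
import os
import re

# Hypothetical pattern: matches the "failed -N" suffix seen in the log line above.
FAILED_RE = re.compile(r"failed (-\d+)\s*$")


def decode_failed_errno(line):
    """Return (errno name, message) for a line ending in 'failed -N', else None."""
    match = FAILED_RE.search(line)
    if match is None:
        return None
    code = -int(match.group(1))  # kernel logs report errno values negated
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


# Log line copied from this ticket (timestamp prefix dropped).
line = ("LustreError: 7287:0:(vvp_io.c:1078:vvp_io_commit_write()) "
        "Write page 74807 of inode ffff8800525f8638 failed -28")
print(decode_failed_errno(line))
```

The message text returned by `os.strerror` comes from the local C library, so the exact wording may differ between platforms; the symbolic name is the more stable part to search for.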