[LU-11789] recovery-small test 134 fails with 'mv failed' Created: 17/Dec/18 Updated: 17/Dec/18 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | failover | ||
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
recovery-small test_134 fails with 'mv failed' in failover test sessions, with three slightly different error messages.

In the test results at https://testing.whamcloud.com/test_sets/8e8c4c2c-ff0f-11e8-b970-52540065bddc, the client test_log shows an error when we try to move a file:

    Started lustre-MDT0000
    trevis-18vm7: mv: cannot move ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small’ to ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2’: Input/output error

The Client 2 (vm7) dmesg log shows the mv command followed by the client eviction messages:

    [ 6809.839939] Lustre: DEBUG MARKER: trevis-18vm7.trevis.whamcloud.com: executing set_default_debug -1 all 4
    [ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2
    [ 6819.972537] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1544690327/real 1544690327] req@ffff88facf421800 x1619718087405600/t0(0) o400->MGC10.9.4.221@tcp@10.9.4.222@tcp:26/25 lens 224/224 e 0 to 1 dl 1544690334 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
    [ 6819.975188] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
    [ 6819.976154] LustreError: 166-1: MGC10.9.4.221@tcp: Connection to MGS (at 10.9.4.222@tcp) was lost; in progress operations using this service will fail
    [ 6819.977455] LustreError: Skipped 1 previous similar message
    [ 6825.071794] Lustre: lustre-MDT0000-mdc-ffff88faf903b800: Connection to lustre-MDT0000 (at 10.9.4.222@tcp) was lost; in progress operations using this service will wait for recovery to complete
    [ 6825.074196] Lustre: Skipped 2 previous similar messages
    [ 6871.792922] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
    [ 6871.985510] Lustre: DEBUG MARKER: trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
    [ 6895.189707] Lustre: Evicted from MGS (at 10.9.4.221@tcp) after server handle changed from 0x5bd95132c9c3f175 to 0x20082f9d6cc0548d
    [ 6895.190991] Lustre: Skipped 1 previous similar message
    [ 6895.192169] Lustre: MGC10.9.4.221@tcp: Connection restored to 10.9.4.221@tcp (at 10.9.4.221@tcp)
    [ 6895.193082] Lustre: Skipped 10 previous similar messages
    [ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.

LU-11560 is similar to this failure, except that in LU-11560 the file removal also fails before the file move fails.

Additional logs for these failures are at

Other failures of this test show a slightly different error message:

    trevis-11vm7: error: invalid path '/mnt/lustre': Input/output error
    trevis-11vm7: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error

Logs for these failures are at

A third set of failures shows another slightly different error message and looks more like LU-11560:

    Started lustre-MDT0000
    trevis-7vm3: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error
    trevis-7vm3: error: invalid path '/mnt/lustre': Input/output error

Logs for these failures are at |
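To make the ordering of events in the Client 2 (vm7) dmesg excerpt easier to follow, here is a minimal Python sketch that lists the relevant messages as offsets from the mv DEBUG MARKER. The `timeline()` helper and its regular expressions are hypothetical, written only for this illustration. They match the message text quoted in this ticket and are not part of the Lustre test framework. The embedded log lines are copied verbatim from the vm7 excerpt and trimmed to the messages of interest.

```python
import re

# Lines copied from the Client 2 (trevis-18vm7) dmesg excerpt in this ticket,
# trimmed to the mv marker, the connection-loss messages and the evictions.
DMESG = """\
[ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2
[ 6819.976154] LustreError: 166-1: MGC10.9.4.221@tcp: Connection to MGS (at 10.9.4.222@tcp) was lost; in progress operations using this service will fail
[ 6825.071794] Lustre: lustre-MDT0000-mdc-ffff88faf903b800: Connection to lustre-MDT0000 (at 10.9.4.222@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 6895.189707] Lustre: Evicted from MGS (at 10.9.4.221@tcp) after server handle changed from 0x5bd95132c9c3f175 to 0x20082f9d6cc0548d
[ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
"""

# Hypothetical patterns for this sketch only: they match the message wording
# above, not any official Lustre log-parsing interface.
EVENTS = [
    ("mv issued", re.compile(r"DEBUG MARKER: mv ")),
    ("connection lost", re.compile(r"Connection to (\S+) \(at \S+\) was lost")),
    ("evicted", re.compile(r"evicted by (\S+?);|Evicted from (\S+)")),
]
# "[ 6810.338678] message" -> timestamp, message
LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def timeline(dmesg):
    """Return (seconds since the mv marker, event) for each matching line."""
    events = []
    start = None
    for raw in dmesg.splitlines():
        m = LINE.match(raw)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        for label, pat in EVENTS:
            hit = pat.search(msg)
            if hit:
                if label == "mv issued":
                    start = ts
                # First non-empty capture group names the service (MGS or MDT).
                target = next((g for g in hit.groups() if g), "")
                events.append((ts, f"{label} {target}".strip()))
                break
    if start is None:
        return []  # no mv marker found, so no reference point
    return [(round(ts - start, 1), label) for ts, label in events]


if __name__ == "__main__":
    for offset, event in timeline(DMESG):
        print(f"+{offset:5.1f}s  {event}")
```

On this excerpt, `timeline(DMESG)` returns `[(0.0, 'mv issued'), (9.6, 'connection lost MGS'), (14.7, 'connection lost lustre-MDT0000'), (84.9, 'evicted MGS'), (89.9, 'evicted lustre-MDT0000')]`. The client loses both connections within about 15 seconds of the mv and is evicted by lustre-MDT0000 about 90 seconds after it. This is consistent with the mv returning an Input/output error once in-progress operations on that service fail.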