Details
Bug
Resolution: Unresolved
Minor
None
Lustre 2.12.0
3
9223372036854775807
Description
recovery-small test_134 fails with 'mv failed' for failover test sessions with three slightly different error messages.
Looking at the test results at https://testing.whamcloud.com/test_sets/8e8c4c2c-ff0f-11e8-b970-52540065bddc, in the client test_log, we see an error when we try to move a file:
Started lustre-MDT0000
trevis-18vm7: mv: cannot move ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small’ to ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2’: Input/output error
Looking at the Client 2 (vm7) dmesg log, we see the move command followed by the client eviction message:
[ 6809.839939] Lustre: DEBUG MARKER: trevis-18vm7.trevis.whamcloud.com: executing set_default_debug -1 all 4
[ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2
[ 6819.972537] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1544690327/real 1544690327] req@ffff88facf421800 x1619718087405600/t0(0) o400->MGC10.9.4.221@tcp@10.9.4.222@tcp:26/25 lens 224/224 e 0 to 1 dl 1544690334 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
[ 6819.975188] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
[ 6819.976154] LustreError: 166-1: MGC10.9.4.221@tcp: Connection to MGS (at 10.9.4.222@tcp) was lost; in progress operations using this service will fail
[ 6819.977455] LustreError: Skipped 1 previous similar message
[ 6825.071794] Lustre: lustre-MDT0000-mdc-ffff88faf903b800: Connection to lustre-MDT0000 (at 10.9.4.222@tcp) was lost; in progress operations using this service will wait for recovery to complete
[ 6825.074196] Lustre: Skipped 2 previous similar messages
[ 6871.792922] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
[ 6871.985510] Lustre: DEBUG MARKER: trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
[ 6895.189707] Lustre: Evicted from MGS (at 10.9.4.221@tcp) after server handle changed from 0x5bd95132c9c3f175 to 0x20082f9d6cc0548d
[ 6895.190991] Lustre: Skipped 1 previous similar message
[ 6895.192169] Lustre: MGC10.9.4.221@tcp: Connection restored to 10.9.4.221@tcp (at 10.9.4.221@tcp)
[ 6895.193082] Lustre: Skipped 10 previous similar messages
[ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
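For triage, it can help to measure how long after the mv marker the client reports the eviction. The following is a minimal sketch, not part of the test suite: the function names (`split_dmesg`, `first_match`, `mv_to_eviction_seconds`) are hypothetical, and it assumes the dmesg text uses the `[ seconds.micros]` timestamp prefix seen above. The sample input is copied from the two relevant entries in the Client 2 log.

```python
import re

# Matches the kernel timestamp prefix of a dmesg entry, e.g. "[ 6810.338678]".
# It does not match bracketed text such as "[sent 1544690327/real ...]".
ENTRY_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s*")


def split_dmesg(text):
    """Return (timestamp, message) pairs, even if several entries share one line."""
    parts = ENTRY_RE.split(text)
    # re.split with one capture group yields: [prefix, ts1, msg1, ts2, msg2, ...]
    return [(float(ts), msg.strip()) for ts, msg in zip(parts[1::2], parts[2::2])]


def first_match(entries, needle):
    """Timestamp of the first entry whose message contains needle, or None."""
    for ts, msg in entries:
        if needle in msg:
            return ts
    return None


def mv_to_eviction_seconds(text):
    """Seconds between the mv DEBUG MARKER and the MDT eviction message, or None."""
    entries = split_dmesg(text)
    mv_ts = first_match(entries, "DEBUG MARKER: mv ")
    evict_ts = first_match(entries, "This client was evicted by")
    if mv_ts is None or evict_ts is None:
        return None
    return evict_ts - mv_ts


sample = (
    "[ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small "
    "/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2\n"
    "[ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: "
    "This client was evicted by lustre-MDT0000; in progress operations using this service will fail.\n"
)
print(mv_to_eviction_seconds(sample))
```

For the log above this comes to roughly 90 seconds between the mv marker and the eviction by lustre-MDT0000.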
LU-11560 is similar to this failure, but in that ticket the remove fails before the file move fails.
Additional logs for these failures are at:
https://testing.whamcloud.com/test_sets/9aa46956-fe8b-11e8-a97c-52540065bddc
https://testing.whamcloud.com/test_sets/4e5dea2a-fd1b-11e8-8512-52540065bddc
https://testing.whamcloud.com/test_sets/3a27684a-fbb0-11e8-8a18-52540065bddc
https://testing.whamcloud.com/test_sets/60009774-f617-11e8-b67f-52540065bddc
There are more failures of this test with a slightly different error message:
trevis-11vm7: error: invalid path '/mnt/lustre': Input/output error
trevis-11vm7: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error
Logs for these failures are at:
https://testing.whamcloud.com/test_sets/b2069cbc-eeab-11e8-86c0-52540065bddc
https://testing.whamcloud.com/test_sets/523a7a66-ef54-11e8-b67f-52540065bddc
There are further failures of this test with another slightly different error message; these look a little more like LU-11560:
Started lustre-MDT0000
trevis-7vm3: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error
trevis-7vm3: error: invalid path '/mnt/lustre': Input/output error
Logs for these failures are at:
https://testing.whamcloud.com/test_sets/666b91b0-e959-11e8-815b-52540065bddc
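When sorting further test sessions into the three variants described in this ticket, a small helper along these lines may be useful. This is only a sketch under stated assumptions: the function name `classify_mv_failure` and the variant labels are hypothetical, and it tells the variants apart purely by which of the quoted messages appear in the client test_log and in what order.

```python
def classify_mv_failure(log_text):
    """Label a recovery-small test_134 client log by the error variant it shows.

    The labels are hypothetical names for the three variants in this ticket.
    """
    cannot_move = log_text.find("mv: cannot move")
    cannot_stat = log_text.find("mv: cannot stat")
    invalid_path = log_text.find("error: invalid path")

    if cannot_move != -1:
        # First variant: the move itself fails with an Input/output error.
        return "cannot-move"
    if cannot_stat != -1 and invalid_path != -1:
        # Second and third variants differ only in the order of the two messages.
        if invalid_path < cannot_stat:
            return "invalid-path-then-stat"
        return "stat-then-invalid-path"
    return "unknown"


# Sample lines copied from the error messages quoted in this ticket.
variant_2 = (
    "trevis-11vm7: error: invalid path '/mnt/lustre': Input/output error\n"
    "trevis-11vm7: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error\n"
)
variant_3 = (
    "Started lustre-MDT0000\n"
    "trevis-7vm3: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error\n"
    "trevis-7vm3: error: invalid path '/mnt/lustre': Input/output error\n"
)
print(classify_mv_failure(variant_2))
print(classify_mv_failure(variant_3))
```

Matching on message order is a deliberately crude heuristic; it does not inspect the dmesg logs and would need refining if a session prints both messages more than once.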