Lustre / LU-11789

recovery-small test 134 fails with 'mv failed'

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.12.0
    • 3

    Description

      recovery-small test_134 fails with 'mv failed' in failover test sessions. The failure appears with three slightly different error messages.

      Looking at the test results at https://testing.whamcloud.com/test_sets/8e8c4c2c-ff0f-11e8-b970-52540065bddc, the client test_log shows an error when we try to move a file:

      Started lustre-MDT0000
      trevis-18vm7: mv: cannot move ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small’ to ‘/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2’: Input/output error
      

      Looking at the Client 2 (vm7) dmesg log, we see the move command followed by the client eviction message:

       [ 6809.839939] Lustre: DEBUG MARKER: trevis-18vm7.trevis.whamcloud.com: executing set_default_debug -1 all 4
      [ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2
      [ 6819.972537] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1544690327/real 1544690327]  req@ffff88facf421800 x1619718087405600/t0(0) o400->MGC10.9.4.221@tcp@10.9.4.222@tcp:26/25 lens 224/224 e 0 to 1 dl 1544690334 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
      [ 6819.975188] Lustre: 4372:0:(client.c:2132:ptlrpc_expire_one_request()) Skipped 5 previous similar messages
      [ 6819.976154] LustreError: 166-1: MGC10.9.4.221@tcp: Connection to MGS (at 10.9.4.222@tcp) was lost; in progress operations using this service will fail
      [ 6819.977455] LustreError: Skipped 1 previous similar message
      [ 6825.071794] Lustre: lustre-MDT0000-mdc-ffff88faf903b800: Connection to lustre-MDT0000 (at 10.9.4.222@tcp) was lost; in progress operations using this service will wait for recovery to complete
      [ 6825.074196] Lustre: Skipped 2 previous similar messages
      [ 6871.792922] Lustre: DEBUG MARKER: /usr/sbin/lctl mark trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
      [ 6871.985510] Lustre: DEBUG MARKER: trevis-18vm11.trevis.whamcloud.com: executing set_default_debug -1 all 4
      [ 6895.189707] Lustre: Evicted from MGS (at 10.9.4.221@tcp) after server handle changed from 0x5bd95132c9c3f175 to 0x20082f9d6cc0548d
      [ 6895.190991] Lustre: Skipped 1 previous similar message
      [ 6895.192169] Lustre: MGC10.9.4.221@tcp: Connection restored to 10.9.4.221@tcp (at 10.9.4.221@tcp)
      [ 6895.193082] Lustre: Skipped 10 previous similar messages
      [ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
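
      To make the sequence in a dmesg excerpt like the one above easier to read, the following is a minimal sketch that pulls out the move command, the connection losses, and the evictions, and reports each as an offset from the first recognized event. The function name, the marker list, and the labels are illustrative assumptions, not part of the Lustre test framework; the marker substrings are copied from the log lines in this ticket and may not match other Lustre versions.

```python
import re

# Matches a dmesg line such as "[ 6900.197799] LustreError: 167-0: ..."
DMESG_LINE = re.compile(r"^\s*\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Substrings copied from the excerpt in this ticket; the labels are
# hypothetical names chosen for this sketch only.
EVENT_MARKERS = [
    ("DEBUG MARKER: mv ", "mv issued"),
    ("Connection to MGS", "MGS connection lost"),
    ("Connection to lustre-MDT0000", "MDT connection lost"),
    ("Evicted from MGS", "evicted from MGS"),
    ("This client was evicted by", "evicted by MDT"),
]


def summarize_dmesg(text):
    """Return (seconds_since_first_event, label) pairs for recognized lines."""
    events = []
    for line in text.splitlines():
        match = DMESG_LINE.match(line)
        if not match:
            continue  # not a timestamped dmesg line
        for marker, label in EVENT_MARKERS:
            if marker in match.group("msg"):
                events.append((float(match.group("ts")), label))
                break
    if not events:
        return []
    start = events[0][0]
    return [(round(ts - start, 3), label) for ts, label in events]


if __name__ == "__main__":
    # Two lines taken verbatim from the Client 2 (vm7) dmesg excerpt.
    sample = (
        "[ 6810.338678] Lustre: DEBUG MARKER: mv /mnt/lustre/d134.recovery-small/2/f134.recovery-small /mnt/lustre/d134.recovery-small/2/f134.recovery-small_2\n"
        "[ 6900.197799] LustreError: 167-0: lustre-MDT0000-mdc-ffff88faf903b800: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.\n"
    )
    for offset, label in summarize_dmesg(sample):
        print(f"+{offset:8.3f}s  {label}")
```

      For the excerpt in this ticket, the timestamps show roughly 90 seconds between the mv command (6810.338678) and the eviction by lustre-MDT0000 (6900.197799).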
      

      LU-11560 is similar to this failure, but in that ticket the remove fails before the file move fails.

      Additional logs for these failures are at:
      https://testing.whamcloud.com/test_sets/9aa46956-fe8b-11e8-a97c-52540065bddc
      https://testing.whamcloud.com/test_sets/4e5dea2a-fd1b-11e8-8512-52540065bddc
      https://testing.whamcloud.com/test_sets/3a27684a-fbb0-11e8-8a18-52540065bddc
      https://testing.whamcloud.com/test_sets/60009774-f617-11e8-b67f-52540065bddc

      There are more failures of this test that have a slightly different error message:

      trevis-11vm7: error: invalid path '/mnt/lustre': Input/output error
      trevis-11vm7: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error
      

      Logs for these failures are at
      https://testing.whamcloud.com/test_sets/b2069cbc-eeab-11e8-86c0-52540065bddc
      https://testing.whamcloud.com/test_sets/523a7a66-ef54-11e8-b67f-52540065bddc

      There are more failures of this test that have a slightly different error message and look a little more like LU-11560:

      Started lustre-MDT0000
      trevis-7vm3: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error
      trevis-7vm3: error: invalid path '/mnt/lustre': Input/output error
      

      Logs for these failures are at
      https://testing.whamcloud.com/test_sets/666b91b0-e959-11e8-815b-52540065bddc
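
      When triaging new test sessions, it may help to sort a client test_log into one of the three message variants described above. The following is a rough sketch: the function name and the variant labels are hypothetical, and it assumes that the relative order of the 'cannot stat' and 'invalid path' lines in the log is meaningful, which may not hold if output from several nodes is interleaved.

```python
def classify_mv_failure(log_text):
    """Label a test_log excerpt with one of the three variants in this ticket.

    The returned labels are illustrative names, not Lustre terminology.
    """
    lines = log_text.splitlines()

    def first_index(needle):
        # Index of the first line containing needle, or None if absent.
        for i, line in enumerate(lines):
            if needle in line:
                return i
        return None

    cannot_move = first_index("mv: cannot move")
    cannot_stat = first_index("mv: cannot stat")
    invalid_path = first_index("error: invalid path")

    if cannot_move is not None:
        return "cannot-move"
    if cannot_stat is not None and invalid_path is not None:
        if invalid_path < cannot_stat:
            return "invalid-path-then-cannot-stat"
        return "cannot-stat-then-invalid-path"
    return "unknown"


if __name__ == "__main__":
    # Lines taken from the second variant quoted in this ticket.
    excerpt = (
        "trevis-11vm7: error: invalid path '/mnt/lustre': Input/output error\n"
        "trevis-11vm7: mv: cannot stat '/mnt/lustre/d134.recovery-small/2/f134.recovery-small_2': Input/output error\n"
    )
    print(classify_mv_failure(excerpt))
```

      The needles deliberately omit the quote characters, since the logs in this ticket use both typographic and straight quotes around the paths.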

            People

              wc-triage WC Triage
              jamesanunez James Nunez (Inactive)