[LU-5900] replay-dual test_11: rm: cannot remove `/mnt/lustre/f11.replay-dual-[1-5]': No such file or directory

Details

    • 3
    • 16486

    Description

      replay-dual test_11 failed as follows:

      rm: cannot remove `/mnt/lustre/f11.replay-dual-[1-5]': No such file or directory
       replay-dual test_11: @@@@@@ FAIL: test_11 failed with 1
      

      Dmesg on client node:

      Lustre: 15249:0:(client.c:2752:ptlrpc_replay_interpret()) @@@ Version mismatch during replay
        req@ffff88006a414400 x1484130260889212/t515396075526(515396075526) o36->lustre-MDT0000-mdc-ffff88006aa09400@10.1.4.66@tcp:12/10 lens 520/416 e 1 to 0 dl 1415377944 ref 2 fl Interpret:R/4/0 rc -75/-75
      LustreError: 15249:0:(client.c:2740:ptlrpc_replay_interpret()) request replay timed out, restarting recovery
      LustreError: 167-0: lustre-MDT0000-mdc-ffff880037fb4400: This client was evicted by lustre-MDT0000; in progress operations using this service will fail.
      LustreError: 1678:0:(mdc_locks.c:918:mdc_enqueue()) ldlm_cli_enqueue: -5
      Lustre: lustre-MDT0000-mdc-ffff880037fb4400: Connection restored to lustre-MDT0000 (at 10.1.4.66@tcp)
      LustreError: 1678:0:(dir.c:378:ll_get_dir_page()) lock enqueue: [0x200000007:0x1:0x0] at 0: rc -5
      LustreError: 1678:0:(dir.c:584:ll_dir_read()) error reading dir [0x200000007:0x1:0x0] at 0: rc -5
      

      Dmesg on MDS node:

      Lustre: lustre-MDT0000: recovery is timed out, evict stale exports
      Lustre: lustre-MDT0000: disconnecting 1 stale clients
      Lustre: 18536:0:(ldlm_lib.c:2092:target_recovery_thread()) too long recovery - read logs
      LustreError: dumping log to /tmp/lustre-log.1415377850.18536
      

      Maloo report: https://testing.hpdd.intel.com/test_sets/a6c1b3de-68c5-11e4-a63a-5254006e85c2

          Activity

            pjones Peter Jones added a comment -

            As per Yu Jian, this can be closed as a duplicate of LU-5079.

            yujian Jian Yu added a comment -

            It was the patches http://review.whamcloud.com/11213 (master) and http://review.whamcloud.com/12365 (b2_5) for LU-5079 that caused the regressions.

            yujian Jian Yu added a comment -

            The same regression failure also occurred on the master branch:
            https://testing.hpdd.intel.com/test_sets/174878bc-5aad-11e4-8200-5254006e85c2
            https://testing.hpdd.intel.com/test_sets/b126fa0a-6a32-11e4-b203-5254006e85c2

            yujian Jian Yu added a comment -

            Another instance on Lustre b2_5 build #100: https://testing.hpdd.intel.com/test_sets/23fe8258-69d1-11e4-8f09-5254006e85c2

            yujian Jian Yu added a comment -

            Here is a for-test-only patch trying to reproduce the failure on Lustre b2_5 build #100: http://review.whamcloud.com/11611

            yujian Jian Yu added a comment -

            The same failure also occurred on SLES11SP3/x86_64 client + RHEL6.5/x86_64 server test session:
            https://testing.hpdd.intel.com/test_sets/bdf623d0-6872-11e4-acbe-5254006e85c2

            It's a regression failure introduced by Lustre b2_5 build #100.

            Unfortunately, I found that replay-dual was not in the autotest review test groups, so the failure was not detected during patch review testing.

            People

              wc-triage WC Triage
              yujian Jian Yu
              Votes: 0
              Watchers: 5