Test failure on test suite replay-dual, subtest test_0a

Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.3.0, Lustre 2.1.3, Lustre 1.8.8
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 4591

    Description

      This issue was created by maloo for yujian <yujian@whamcloud.com>

      This issue relates to the following test suite run: https://maloo.whamcloud.com/test_sets/38eaf1c4-a6f2-11e0-bd2a-52540025f9af.

      The sub-test test_0a failed with the following error:

      Restart of mds1 failed!


      Info required for matching: replay-dual 0a

          Activity

            [LU-482] Test failure on test suite replay-dual, subtest test_0a

            adilger Andreas Dilger added a comment - Pushed a patch under LU-7372 to disable test_26 so that this can be moved back into review testing.

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (andreas.dilger@intel.com) uploaded a new patch: http://review.whamcloud.com/14414
            Subject: LU-482 tests: enable replay-dual in default test runs
            Project: private/autotest
            Branch: master
            Current Patch Set: 1
            Commit: 55dc7a2e14f9722fee46e4a635f892b498ea51f2

            adilger Andreas Dilger added a comment - This does not appear to be failing anymore.

            adilger Andreas Dilger added a comment - This appears to be the reason that replay-dual was first disabled. The recent test runs don't show any failures in this test.

            brian Brian Murrell (Inactive) added a comment - To add to kelsey's question: the environment we are seeing this in is also a VM cluster. However, the block devices in the VMs that are used for Lustre targets are not on LVs; they sit directly on the virtual disks provided by KVM.

            kelsey Kelsey Prantis (Inactive) added a comment - Was the source of the corruption of the group descriptors ever nailed down? I am seeing something similar in some failover testing on 2.3.

            keith Keith Mannthey (Inactive) added a comment -

            I took a good look at maloo today and a few things stand out.

            Master review has not seen this issue since last summer, but review also seems not to have run replay-dual since about the same time.

            The error:

            test_0a failed with 1

            looks like it might be the same error, as it reports:

            stat: cannot read file system information for `/mnt/lustre': Resource temporarily unavailable
             replay-dual test_0a: @@@@@@ FAIL: test_0a failed with 1

            And we know what part of the problem was there was some

            I can't seem to find a spot where this has failed on master in a while. It certainly plagues 2.1, 2.2 and 2.3, but I don't see many signs of issues on present-day master.

            The more recent

            gethostbyname("hostname") failed

            errors were fixed by LU-2008.

            There is no sign of the LU-482 skip that the test makes possible.

            The test reports 7 out of 100. Those 7 are "full" tests running interop with 2.1.
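            The comparison of failure messages above can be sketched as a small log classifier. This is only an illustration, not part of maloo or the Lustre test framework: the regular expressions are built from the error strings quoted in this ticket, and the labels and the `classify` helper are hypothetical names.

```python
import re

# Failure signatures taken from the messages quoted in this ticket.
# The labels on the left are illustrative names, not maloo terminology.
SIGNATURES = {
    "mds_restart": re.compile(r"Restart of mds1 failed!"),
    "statfs_eagain": re.compile(
        r"cannot read file system information for .*: "
        r"Resource temporarily unavailable"
    ),
    "gethostbyname": re.compile(r'gethostbyname\("[^"]*"\) failed'),
    "generic_fail": re.compile(r"FAIL: test_0a failed with \d+"),
}


def classify(log_text):
    """Return the sorted labels of every signature found in log_text."""
    return sorted(
        label
        for label, pattern in SIGNATURES.items()
        if pattern.search(log_text)
    )
```

            Running `classify` over the text of each failed test_0a log would separate the runs that hit the stat/EAGAIN message from those that only show the gethostbyname error, which LU-2008 addressed.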
            yujian Jian Yu added a comment -

            Lustre Client: v2_1_4_RC1
            Lustre Server: 2.1.3
            Distro/Arch: RHEL6.3/x86_64
            Network: IB (in-kernel OFED)
            https://maloo.whamcloud.com/test_sets/635a52e8-487b-11e2-8cdc-52540035b04c

            yujian Jian Yu added a comment -

            Lustre Server Build: http://build.whamcloud.com/job/lustre-b2_3/32
            Lustre Client Build: http://build.whamcloud.com/job/lustre-b2_1/121
            Distribution: CentOS release 6.3
            https://maloo.whamcloud.com/test_sets/d27db876-1285-11e2-a663-52540035b04c

            yujian Jian Yu added a comment - More instances on Lustre 2.1.3 RC2: https://maloo.whamcloud.com/test_sets/1275a262-eb3b-11e1-ba73-52540035b04c

            People

              Assignee: wc-triage WC Triage
              Reporter: maloo Maloo