
Failure on test suite replay-vbr test_7g: Test 7g.3 failed

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.1
    • Affects Version/s: Lustre 2.1.6, Lustre 2.6.0, Lustre 2.5.1, Lustre 2.4.3
    • Environment: client and server: lustre-master build 1823 RHEL6 ldiskfs
    • Severity: 3
    • Rank (Obsolete): 12186

    Description

      This issue was created by maloo for sarah <sarah@whamcloud.com>

      This issue relates to the following test suite run: http://maloo.whamcloud.com/test_sets/930d9194-74df-11e3-96b0-52540035b04c.

      The sub-test test_7g failed with the following error:

      Test 7g.3 failed

      test log shows:

      CMD: client-31vm7 /usr/sbin/lctl list_param osp.*osc*.old_sync_processed 2> /dev/null
      osp.lustre-OST0000-osc-MDT0000.old_sync_processed
      osp.lustre-OST0001-osc-MDT0000.old_sync_processed
      osp.lustre-OST0002-osc-MDT0000.old_sync_processed
      osp.lustre-OST0003-osc-MDT0000.old_sync_processed
      osp.lustre-OST0004-osc-MDT0000.old_sync_processed
      osp.lustre-OST0005-osc-MDT0000.old_sync_processed
      osp.lustre-OST0006-osc-MDT0000.old_sync_processed
      wait mds1 secs maximumly for client-31vm7 mds-ost sync done.
      /usr/lib64/lustre/tests/test-framework.sh: line 2135: [: mds1: integer expression expected
      CMD: client-31vm7 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
      1
      1
      1
      1
      1
      1
      1
       recovery node iozone not done in mds1 sec. 
       replay-vbr test_7g: @@@@@@ FAIL: Test 7g.3 failed 
      

          Activity

            [LU-4442] Failure on test suite replay-vbr test_7g: Test 7g.3 failed
            yujian Jian Yu added a comment - Patches for Lustre b2_5 branch: http://review.whamcloud.com/9289 http://review.whamcloud.com/9290
            emoly.liu Emoly Liu added a comment -

            Both patches have been landed to 2.6.

            emoly.liu Emoly Liu added a comment -

            Thanks, Tappro, I saw your choice of http://review.whamcloud.com/8973.

            emoly.liu Emoly Liu added a comment -

            The root cause of this failure is that, since mdt_object_exists() was added to mdt_reint_link() in http://review.whamcloud.com/#/c/8371, there is no chance to do the object version check if the child object doesn't exist, so client1 is not evicted.

            I created the following two patches to fix this problem, and I am not sure which is better:

                • http://review.whamcloud.com/8972 does mdt_version_get_check() before calling mdt_object_exists(), so that the client can be evicted due to (EOVERFLOW) "Version mismatch" as before;
                • http://review.whamcloud.com/8973 changes the test_7g script to check a different return value for different MDS versions.

            Tappro, could you please give any advice? Thanks.
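
The second option above, gating the test's expectation on the MDS version, could look roughly like the sketch below. Everything in it is illustrative: `version_code` and `expected_link_outcome` are hypothetical helpers written for this example, and the 2.5.54 cutoff and the two outcome labels are placeholders, not values taken from either patch.

```shell
# Hypothetical helper: pack "major.minor.patch" into one comparable integer.
version_code() {
    local old_ifs="$IFS"
    IFS=.
    # Split the version string on dots into $1 $2 $3.
    set -- $1
    IFS="$old_ifs"
    echo $(( ($1 << 16) | ($2 << 8) | ${3:-0} ))
}

# Pick the outcome the test should expect for a given MDS version.
# The 2.5.54 cutoff and both labels are placeholders for illustration only.
expected_link_outcome() {
    local mds_version="$1"
    if [ "$(version_code "$mds_version")" -ge "$(version_code 2.5.54)" ]; then
        # Newer MDS: the existence check runs first, so no version mismatch.
        echo "not-evicted"
    else
        # Older MDS: the version check still fires and the client is evicted.
        echo "evicted"
    fi
}
```

A test could then compare the observed result against `expected_link_outcome "$mds_version"` instead of hard-coding one expectation for every server.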
            emoly.liu Emoly Liu added a comment -

            By searching maloo, I noticed that this error has occurred since Dec. 21, and I found that it is related to LU-3528 http://review.whamcloud.com/8371.

            I am working on the patch.

            emoly.liu Emoly Liu added a comment -

            The maloo test report https://maloo.whamcloud.com/test_logs/2a98cbc0-7bfa-11e3-a7b6-52540035b04c/show_text shows that test_7g has another problem besides the test script issue that is to be fixed by http://review.whamcloud.com/8796.

            I will investigate and provide another patch for it.

            yujian Jian Yu added a comment -

            Lustre build: http://build.whamcloud.com/job/lustre-reviews/20841/
            Distro/Arch: SLES11SP3/x86_64 (both server and client, kernel version: 3.0.101-0.8)

            The same failure occurred:
            https://maloo.whamcloud.com/test_sets/43ce86d2-798b-11e3-a27b-52540035b04c

            emoly.liu Emoly Liu added a comment - - edited

            Patch at http://review.whamcloud.com/8796, which fixes the test script issue.

            emoly.liu Emoly Liu added a comment -

            I can reproduce it easily. I will investigate and fix it.

            pjones Peter Jones added a comment -

            Emoly

            Could you please look into this one?

            Thanks

            Peter

            green Oleg Drokin added a comment -

            There's certainly a parsing error somewhere that makes us pick up the MDS name instead of a timeout, and so things go downhill from there:

            /usr/lib64/lustre/tests/test-framework.sh: line 2135: [: mds1: integer expression expected

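
The "integer expression expected" message comes from the shell's `[` builtin when a numeric operator such as `-gt` or `-lt` receives a non-numeric operand, which is what happens if a facet name like `mds1` ends up where a timeout in seconds is expected. A minimal sketch of a defensive check follows. The function `wait_sync_done` and its `<facet> <max_seconds>` argument order are assumptions made for illustration; this is not the actual helper in test-framework.sh.

```shell
# Hypothetical stand-in for a wait helper; signature assumed: <facet> <max_seconds>.
wait_sync_done() {
    local facet="$1"
    local max="$2"

    # Reject an empty or non-numeric timeout up front, rather than letting
    # a later [ ... -lt ... ] test fail with "integer expression expected".
    case "$max" in
        ''|*[!0-9]*)
            echo "wait_sync_done: timeout must be an integer, got '$max'" >&2
            return 2
            ;;
    esac

    local waited=0
    while [ "$waited" -lt "$max" ]; do
        # A real helper would poll old_sync_processed and sleep here.
        waited=$((waited + 1))
    done
    echo "waited up to $max secs for $facet"
}
```

With this guard, calling the helper with swapped arguments (for example `wait_sync_done 3 mds1`) fails immediately with a clear message and a non-zero status instead of continuing with a meaningless comparison.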

            People

              Assignee: Emoly Liu (emoly.liu)
              Reporter: Maloo (maloo)