Project: Lustre

[LU-1867] replay-single test_89: @@@@@@ FAIL: 4 blocks leaked

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 4419

    Description

      Hit this problem in a Maloo test run on the latest master branch:
      https://maloo.whamcloud.com/test_sets/07148716-fae1-11e1-a03c-52540035b04c

      It is similar to ORI-412, reported on Orion.

      Test logs of test_89 attached.


          Activity



            Andreas Dilger added a comment - This is still failing on ZFS, so the above patch re-enables test_89 only for ldiskfs: https://testing.hpdd.intel.com/test_sets/f4f00d1a-4ffe-11e4-8734-5254006e85c2

            Yang Sheng added a comment - Patch to re-enable the test: http://review.whamcloud.com/12227


            Andreas Dilger added a comment - replay-single test_89 is still being skipped on ZFS because of this bug. The landed patch may have resolved the test failure, so a patch to re-enable the test should be submitted.

            Yang Sheng added a comment - Patch landed. Closing the bug.
            Yang Sheng added a comment - Patch committed to: http://review.whamcloud.com/#change,4130


            Andreas Dilger added a comment - In this case, the test's pass condition should be changed to allow a difference of up to 4 blocks (16kB) between BLOCKS2 and BLOCKS1 and still pass, with a comment explaining why. I suspect this doesn't explain the "1536 blocks leaked" problem seen in other test failures.
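The tolerance proposed in the comment above can be sketched as a simple check. This is a minimal illustration, not the actual replay-single.sh change: it assumes BLOCKS1 and BLOCKS2 are used-block counts sampled before the file is written and after it is removed and the OST remounted, so a positive difference means space was not returned. The names `check_leak`, `LeakError`, and `LEAK_TOLERANCE_BLOCKS` are hypothetical.

```python
class LeakError(Exception):
    """Raised when the block-count difference exceeds the allowed tolerance."""


# Assumed tolerance of 4 blocks, as proposed in the comment; a "block" is
# whatever unit the before/after counts are measured in.
LEAK_TOLERANCE_BLOCKS = 4


def check_leak(blocks1: int, blocks2: int,
               tolerance: int = LEAK_TOLERANCE_BLOCKS) -> int:
    """Return BLOCKS2 - BLOCKS1, raising LeakError if it exceeds tolerance.

    Assumes blocks1/blocks2 are used-block counts, so a positive
    difference means space was not returned after the file was removed.
    """
    leaked = blocks2 - blocks1
    # Accept a difference of up to `tolerance` blocks instead of
    # requiring the two counts to match exactly.
    if leaked > tolerance:
        raise LeakError(f"{leaked} blocks leaked")
    return leaked


if __name__ == "__main__":
    print(check_leak(1000, 1004))  # -> 4 (within tolerance)
    try:
        check_leak(1000, 1005)
    except LeakError as err:
        print(f"FAIL: {err}")  # -> FAIL: 5 blocks leaked
```

Keeping the tolerance as a named default makes the "comment explaining this" requested above easy to attach in one place.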

            Yang Sheng added a comment - I have done some investigation, summarized below. This issue is caused by the config llog data not being in sync between the MGS and the OST. We first count the free blocks as BLOCKS1, then write data to the OST, and so on. Some of the OST's config data has been written on the MGS but not yet on the OST. Then the OST is unmounted, and the config data is synced when the OST is remounted. We then count the blocks again as BLOCKS2, so BLOCKS2 - BLOCKS1 reflects the config data changes. These may or may not cause a new (4k) block to be allocated, which is why we hit this issue so randomly and why the leaked space is always 4k.
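The "may or may not" in this explanation is ordinary block-rounding arithmetic: a small config-llog update only costs a new block when it pushes the file across a block boundary. A minimal sketch, assuming a 4096-byte block size (the 4k reported above) and ignoring metadata, preallocation, and any ldiskfs- or ZFS-specific allocation behaviour; the file sizes in the usage lines are made up for illustration.

```python
BLOCK_SIZE = 4096  # assumed: the comment reports the leaked space as always 4k


def blocks_for(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of data blocks needed to hold size_bytes (ceiling division)."""
    return -(-size_bytes // block_size)


def extra_blocks(old_size: int, appended: int,
                 block_size: int = BLOCK_SIZE) -> int:
    """Blocks newly allocated when `appended` bytes are added to a file.

    Models data blocks only; real filesystems also allocate metadata and
    may preallocate, so this illustrates the rounding, nothing more.
    """
    return (blocks_for(old_size + appended, block_size)
            - blocks_for(old_size, block_size))


if __name__ == "__main__":
    # Hypothetical llog sizes: the same 200-byte update either fits in the
    # slack of the last block or spills into a new one.
    print(extra_blocks(1000, 200))  # -> 0
    print(extra_blocks(4000, 200))  # -> 1
```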


            Alex Zhuravlev added a comment - Let's disable this test until the root cause is understood. The issue looks fairly local and does not affect other tests or functionality.
            Ian Colle (Inactive) added a comment - https://maloo.whamcloud.com/test_sets/a22a10ee-07df-11e2-9e76-52540035b04c
            Li Wei (Inactive) added a comment - https://maloo.whamcloud.com/test_sets/abc64ce8-060f-11e2-9b17-52540035b04c This was master with OFD and ldiskfs OSTs.

            Li Wei (Inactive) added a comment (edited) - Liu Xuezhao, yes, I think the extra wait_delete_completed() should help reduce the failure rate. That change is already included in http://review.whamcloud.com/2982, which will hopefully land soon.
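Reading the block count while object deletions are still in flight is one way a before/after comparison can see a spurious difference, which is presumably why an extra wait helps. As an illustration of the general idea only, here is a generic polling loop that re-reads a count until two consecutive readings agree before recording it. It is not the test-framework wait_delete_completed() implementation; `wait_for_stable_count`, its parameters, and the sample readings are hypothetical.

```python
import time
from typing import Callable


def wait_for_stable_count(sample: Callable[[], int], interval: float = 1.0,
                          timeout: float = 60.0) -> int:
    """Re-read sample() until two consecutive readings match; return that value.

    Hypothetical helper for illustration: a generic "let it settle" loop,
    not the logic of test-framework's wait_delete_completed().
    """
    deadline = time.monotonic() + timeout
    previous = sample()
    while True:
        time.sleep(interval)
        current = sample()
        if current == previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"count still changing after {timeout} s")
        previous = current


if __name__ == "__main__":
    # Made-up used-block readings while deletions drain.
    readings = iter([1010, 1006, 1004, 1004])
    print(wait_for_stable_count(lambda: next(readings), interval=0))  # -> 1004
```

A fixed settle condition like this trades test time for fewer false "leaks"; it narrows the race rather than removing it.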

            People

              Assignee: Yang Sheng
              Reporter: Xuezhao Liu
              Votes: 0
              Watchers: 8
