[LU-1867] replay-single test_89: @@@@@@ FAIL: 4 blocks leaked

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Labels: None
    • Severity: 3
    • Rank: 4419

    Description

      Hit this problem in Maloo testing on the latest master branch:
      https://maloo.whamcloud.com/test_sets/07148716-fae1-11e1-a03c-52540035b04c

      It is similar to ORI-412 reported on Orion.

      Test logs of test_89 are attached.

    Activity

      adilger Andreas Dilger added a comment - Jinshan, I strongly suspect that if this ancient issue is being seen again, it is a new issue that needs a new Jira ticket, even if the error message is the same. It is also most likely that the problem is FLR-related, since this hasn't been reported in over 3 years.
      jay Jinshan Xiong (Inactive) added a comment - This problem is being seen again at: https://testing.hpdd.intel.com/test_sets/a6ab66ac-d1ad-11e7-9c63-52540065bddc

      adilger Andreas Dilger added a comment - Note that the ORI project is closed and those tickets cannot be used to land patches on master. I opened LU-5761 for tracking the ZFS issue.
      ys Yang Sheng added a comment - I think this ticket is for the ldiskfs issue. ZFS has a similar issue, but it leaks more blocks, and ORI-412 is the more appropriate ticket for handling it. So I will close this one first.

      adilger Andreas Dilger added a comment - This is still failing for ZFS, so the above patch only re-enables test_89 for ldiskfs: https://testing.hpdd.intel.com/test_sets/f4f00d1a-4ffe-11e4-8734-5254006e85c2
      ys Yang Sheng added a comment - Patch to re-enable the test: http://review.whamcloud.com/12227

      adilger Andreas Dilger added a comment - replay-single test_89 is still being skipped on ZFS due to this bug. It looks like the landed patch may resolve the test failure, so a patch to re-enable it should be submitted.
      ys Yang Sheng added a comment - Patch landed. Closing bug.
      ys Yang Sheng added a comment - Patch committed at: http://review.whamcloud.com/#change,4130

      adilger Andreas Dilger added a comment - In this case, the test pass condition should be changed to allow a 4-block (16kB) difference between BLOCKS2 and BLOCKS1 and still pass, along with a comment explaining this. I guess this doesn't explain the "1536 blocks leaked" problem seen in other test failures.
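
      A minimal sketch of the relaxed check suggested above, assuming BLOCKS1 and BLOCKS2 hold the free-block counts that test_89 samples before and after the failover; the error() helper follows the Lustre test framework, and the 4-block tolerance is the suggestion in this comment, not necessarily what any landed patch does:

          # Allow up to 4 blocks (16kB) of slack for config-llog records that
          # may be flushed to the OST between the two samples.
          BLOCKS_TOLERANCE=4
          leaked=$((BLOCKS1 - BLOCKS2))   # free blocks before minus free blocks after
          if [ "$leaked" -gt "$BLOCKS_TOLERANCE" ]; then
              error "$leaked blocks leaked"
          fi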
      ys Yang Sheng added a comment - I have done some investigation, as follows. This issue is caused by config-llog data not being in sync between the MGS and the OST. We count the free blocks first as BLOCKS1, then write data to the OST, etc. At this point, some config data has been written on the MGS but not yet on the OST. The OST is then unmounted, and the config data is synced when the OST remounts. We then count the free blocks as BLOCKS2, so BLOCKS2 - BLOCKS1 includes the config-data changes. These may or may not cause a new block (4kB) to be allocated, so we hit this issue only occasionally, and the leaked amount is always 4kB.
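
      To make the sequence above concrete, a hypothetical sketch of the race (OST_MOUNT and the df-based sampling are illustrative stand-ins, not what replay-single actually runs):

          # Sample free 4kB blocks on the OST backing filesystem before the test.
          BLOCKS1=$(df --output=avail -B4K "$OST_MOUNT" | tail -n1)

          # ... write data to the OST, then unmount/fail it over ...
          # Config-llog records queued on the MGS are only flushed to the OST
          # when it remounts, which may allocate one extra 4kB block here.

          # Sample again after the remount: the difference now includes the
          # config-llog change, not just the test workload.
          BLOCKS2=$(df --output=avail -B4K "$OST_MOUNT" | tail -n1)
          echo "delta: $((BLOCKS1 - BLOCKS2)) blocks"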

    People

      Assignee: ys Yang Sheng
      Reporter: xuezhao Xuezhao Liu
      Votes: 0
      Watchers: 8
