[LU-1867] replay-single test_89: @@@@@@ FAIL: 4 blocks leaked

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: Lustre 2.4.0
    • Fix Version/s: Lustre 2.4.0
    • Labels: None
    • Severity: 3
    • Rank: 4419

    Description

      Hit this problem in Maloo testing on the latest master branch:
      https://maloo.whamcloud.com/test_sets/07148716-fae1-11e1-a03c-52540035b04c

      It is similar to ORI-412 reported on Orion.

      Test logs of test_89 are attached.

    Activity

      adilger Andreas Dilger added a comment - Jinshan, I strongly suspect that if this ancient issue is being seen again, it is a new issue that needs a new Jira ticket, even if the error message is the same. It is also most likely that the problem is FLR-related, since this hasn't been reported in over 3 years.
      jay Jinshan Xiong (Inactive) added a comment - This problem is being seen again at: https://testing.hpdd.intel.com/test_sets/a6ab66ac-d1ad-11e7-9c63-52540065bddc

      adilger Andreas Dilger added a comment - Note that the ORI project is closed and those tickets cannot be used to land patches on master. I opened LU-5761 for tracking the ZFS issue.
      ys Yang Sheng added a comment - I think this ticket is for the ldiskfs issue. ZFS has a similar issue, but it leaks more blocks, and ORI-412 is the more appropriate ticket for handling it. So I will close this one first.

      adilger Andreas Dilger added a comment - This is still failing for ZFS, so the above patch only re-enables test_89 for ldiskfs: https://testing.hpdd.intel.com/test_sets/f4f00d1a-4ffe-11e4-8734-5254006e85c2
      ys Yang Sheng added a comment - Patch to re-enable the test: http://review.whamcloud.com/12227

      adilger Andreas Dilger added a comment - replay-single test_89 is still being skipped on ZFS due to this bug. It looks like the landed patch may resolve the test failure, so a patch to re-enable it should be submitted.
      ys Yang Sheng added a comment - Patch landed. Closing bug.
      ys Yang Sheng added a comment - Patch committed at: http://review.whamcloud.com/#change,4130

      adilger Andreas Dilger added a comment - In this case, the test pass condition should be changed to allow a 4-block (16kB) difference between BLOCKS2 and BLOCKS1 and still pass, along with a comment explaining this. I guess this doesn't explain the "1536 blocks leaked" problem seen in other test failures.
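
      A minimal sketch of the relaxed check suggested above, assuming BLOCKS1 and BLOCKS2 hold the free-block counts that test_89 samples before and after the failover; the error() helper follows the Lustre test framework, and the 4-block tolerance is the suggestion in this comment, not necessarily what any landed patch does:

          # Allow up to 4 blocks (16kB) of slack for config-llog records that
          # may be flushed to the OST between the two samples.
          BLOCKS_TOLERANCE=4
          leaked=$((BLOCKS1 - BLOCKS2))   # free blocks before minus free blocks after
          if [ "$leaked" -gt "$BLOCKS_TOLERANCE" ]; then
              error "$leaked blocks leaked"
          fi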
      ys Yang Sheng added a comment - I have done some investigation, as follows. This issue is caused by config-llog data not being in sync between the MGS and the OST. We count the free blocks first as BLOCKS1, then write data to the OST, etc. At this point, some config data has been written on the MGS but not yet on the OST. The OST is then unmounted, and the config data is synced when the OST remounts. We then count the free blocks as BLOCKS2, so BLOCKS2 - BLOCKS1 includes the config-data changes. These may or may not cause a new block (4kB) to be allocated, so we hit this issue only occasionally, and the leaked amount is always 4kB.
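
      To make the sequence above concrete, a hypothetical sketch of the race (OST_MOUNT and the df-based sampling are illustrative stand-ins, not what replay-single actually runs):

          # Sample free 4kB blocks on the OST backing filesystem before the test.
          BLOCKS1=$(df --output=avail -B4K "$OST_MOUNT" | tail -n1)

          # ... write data to the OST, then unmount/fail it over ...
          # Config-llog records queued on the MGS are only flushed to the OST
          # when it remounts, which may allocate one extra 4kB block here.

          # Sample again after the remount: the difference now includes the
          # config-llog change, not just the test workload.
          BLOCKS2=$(df --output=avail -B4K "$OST_MOUNT" | tail -n1)
          echo "delta: $((BLOCKS1 - BLOCKS2)) blocks"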

    People

      Assignee: ys Yang Sheng
      Reporter: xuezhao Xuezhao Liu
      Votes: 0
      Watchers: 8
