[LU-9891] replay-ost-single test_7: 15995648 > 15995136 + logsize 400 Created: 18/Aug/17  Updated: 12/Apr/18  Resolved: 12/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.11.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Bob Glossman (Inactive) Assignee: James Nunez (Inactive)
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-5761 replay-single test_89: @@@@@@ FAIL: 2... Resolved
is duplicated by LU-3144 failover: replay-ost-single test_6: F... Resolved
Related
is related to LU-10052 replay-single test_20b fails with 'af... Resolved
is related to LU-8672 missing error handling in replay-sing... Resolved
is related to LU-10352 replay-ost-single test_7: 15995648 > ... Open
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9e03462a-83bd-11e7-9952-5254006e85c2.

The sub-test test_7 failed with the following error:

15995648 > 15995136 + logsize 400

Info required for matching: replay-ost-single 7



 Comments   
Comment by James Nunez (Inactive) [ 21/Aug/17 ]

On master, replay-ost-single test_7 failed with this error once in June and has failed six times since August 4. It is failing only for ZFS.

Comment by Andreas Dilger [ 21/Aug/17 ]

The amount of space released by ZFS is not 100% deterministic due to COW and snapshots, as seen by patches in LU-2903. In this case, there is 512KB of space not released by ZFS, but I don't think this is a significant problem. This needs a patch to change fs_log_size() to return 512 for ZFS.
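As a rough illustration of the arithmetic behind the error message, the sketch below reads "15995648 > 15995136 + logsize 400" as a comparison of two space readings in KB plus an allowance. The function names and the before/after interpretation are assumptions made for illustration; this is not the code in the Lustre test framework.

```python
# Hypothetical helpers; names are illustrative, not from test-framework.sh.

def unreleased_kb(before_kb, after_kb):
    """Space (KB) still missing after cleanup, per the two readings."""
    return before_kb - after_kb


def within_allowance(before_kb, after_kb, allowance_kb):
    """Mirror the shape of the failing check: before <= after + allowance."""
    return before_kb <= after_kb + allowance_kb


# Values taken from the error message in this ticket.
before_kb, after_kb = 15995648, 15995136

gap_kb = unreleased_kb(before_kb, after_kb)
print(gap_kb)                                         # 512
print(within_allowance(before_kb, after_kb, 400))     # False: fails with logsize 400
print(within_allowance(before_kb, after_kb, 512))     # True: passes with an allowance of 512
```

Under that reading, the 512 KB gap exceeds the allowance of 400 by 112 KB, which is why raising the ZFS value to 512 would let this particular run pass.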

Comment by Gerrit Updater [ 24/Aug/17 ]

James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/28682
Subject: LU-9891 tests: Increase space not released for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: bc82c481d34ead960a3161ed517c6407701990b4

Comment by James Nunez (Inactive) [ 31/Aug/17 ]

Note: replay-ost-single test_6 is also failing with this error message. This test started failing with this error message on August 28, 2017.

Logs for some of these failures are at
https://testing.hpdd.intel.com/test_sets/8cf5f5c6-8c32-11e7-b50a-5254006e85c2
https://testing.hpdd.intel.com/test_sets/72ae91b2-8dbf-11e7-b5c1-5254006e85c2

Comment by Bob Glossman (Inactive) [ 07/Sep/17 ]

another on b2_10:
https://testing.hpdd.intel.com/test_sets/884f3d08-9361-11e7-b722-5254006e85c2

Comment by James Nunez (Inactive) [ 07/Sep/17 ]

Andreas - Looking at recent failures for replay-ost-single tests 6 and 7, we see that the space not released by ZFS is between 512 and 1408 KB. For example, see the following logs for ZFS not releasing 1408 KB: https://testing.hpdd.intel.com/test_sets/27a01d7e-91e5-11e7-b67f-5254006e85c2 and https://testing.hpdd.intel.com/test_sets/dc64c2f0-8d03-11e7-b50a-5254006e85c2.

Is ZFS not releasing over 1MB expected and not a problem? If so, then maybe we should increase the return value of fs_log_size() for ZFS to be 1500?

Comment by Andreas Dilger [ 08/Sep/17 ]

The problem with ZFS is that it internally saves 3-4 snapshots of the filesystem for recovery purposes, in case of disk corruption after a crash (e.g. disks that lie about sync actually getting everything safely on disk). This means that it may take 3-4 ZFS transaction commits before deleted files actually release their space.
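One way a test could tolerate this delayed release is to re-read the free space a few times before declaring failure. The sketch below is only an illustration under stated assumptions: `wait_for_release`, `read_free_kb`, and the simulated readings are hypothetical, and nothing in this ticket says the landed patch works this way.

```python
import time


def wait_for_release(read_free_kb, before_kb, allowance_kb,
                     max_polls=5, delay_s=0.0):
    """Poll free space until it is back within allowance_kb of before_kb.

    read_free_kb is a caller-supplied function returning free space in KB
    (hypothetical; a real test would query the filesystem). Returns the
    1-based poll number on success, or None if the space never came back.
    """
    for attempt in range(1, max_polls + 1):
        after_kb = read_free_kb()
        if before_kb <= after_kb + allowance_kb:
            return attempt
        time.sleep(delay_s)
    return None


# Simulated readings, invented for illustration: space returns gradually,
# as if each poll happened after another transaction commit.
readings = iter([15994240, 15994752, 15995136, 15995648])
result = wait_for_release(lambda: next(readings),
                          before_kb=15995648, allowance_kb=400)
print(result)  # 4: the check only passes on the fourth reading
```

The trade-off is a slower test in exchange for not having to guess a single fixed allowance large enough for every run.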

Comment by Bob Glossman (Inactive) [ 10/Sep/17 ]

another on master:
https://testing.hpdd.intel.com/test_sets/edce0d34-95bf-11e7-b75f-5254006e85c2

Comment by Gerrit Updater [ 13/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28682/
Subject: LU-9891 tests: Increase space not released for ZFS
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6faae60be03020f1e8b032c1eb93829f70d1e08d

Comment by Peter Jones [ 13/Sep/17 ]

Landed for 2.11

Comment by Sebastien Buisson (Inactive) [ 14/Sep/17 ]

Hi,

I think I hit this issue on master after this patch landed:
https://testing.hpdd.intel.com/test_sets/82efaa38-98a1-11e7-ba20-5254006e85c2

Correct me if I am wrong.

Comment by James Nunez (Inactive) [ 14/Sep/17 ]

This issue is still seen during testing. Unfortunately, the ZFS log size can be greater than the size of the buffer; see Andreas' comments above. Maybe for ZFS this error should be ignored, or the buffer size should be increased (a lot)?

Comment by Bob Glossman (Inactive) [ 21/Sep/17 ]

another on b2_10:
https://testing.hpdd.intel.com/test_sets/3b6205a0-9e99-11e7-b778-5254006e85c2

Comment by Sebastien Buisson (Inactive) [ 19/Oct/17 ]

Hit on master:
https://testing.hpdd.intel.com/test_sets/0d996310-b46b-11e7-a282-5254006e85c2

Comment by nasf (Inactive) [ 20/Nov/17 ]

+1 on master:
https://testing.hpdd.intel.com/test_sets/9f0eb4ca-cbeb-11e7-a066-52540065bddc

Comment by Jinshan Xiong (Inactive) [ 25/Nov/17 ]

https://testing.hpdd.intel.com/test_sets/21716306-d1b2-11e7-a066-52540065bddc

Comment by Bob Glossman (Inactive) [ 12/Jan/18 ]

more on master:
https://testing.hpdd.intel.com/test_sets/abab0d28-f42c-11e7-a169-52540065bddc
https://testing.hpdd.intel.com/test_sets/84bf9c66-f75f-11e7-a6ad-52540065bddc

Comment by James Nunez (Inactive) [ 21/Jan/18 ]

Patch https://review.whamcloud.com/#/c/30916/ should fix this failure.

Comment by Andreas Dilger [ 21/Jan/18 ]

James, any chance you could update that patch per review comments?

Comment by Jinshan Xiong (Inactive) [ 23/Jan/18 ]

https://testing.hpdd.intel.com/test_sets/0a1d113e-ffd2-11e7-a10a-52540065bddc

Comment by Minh Diep [ 26/Feb/18 ]

+1 on b2_10

https://testing.hpdd.intel.com/test_sets/b586c42e-18f1-11e8-a7cd-52540065bddc

Comment by Bob Glossman (Inactive) [ 28/Feb/18 ]

another on b2_10:
https://testing.hpdd.intel.com/test_sets/265c0004-1cd7-11e8-a6ad-52540065bddc

Comment by James Nunez (Inactive) [ 07/Mar/18 ]

The last time replay-ost-single test 7 failed in testing was 2018-01-24. It looks like this issue is fixed on the master (pre-2.11) branch.

Comment by Peter Jones [ 12/Apr/18 ]

Believed to be a duplicate of LU-5761

Generated at Sat Feb 10 02:30:13 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.