[LU-9891] replay-ost-single test_7: 15995648 > 15995136 + logsize 400 Created: 18/Aug/17 Updated: 12/Apr/18 Resolved: 12/Apr/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.11.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Bob Glossman (Inactive) | Assignee: | James Nunez (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>. This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/9e03462a-83bd-11e7-9952-5254006e85c2. The sub-test test_7 failed with the following error: 15995648 > 15995136 + logsize 400. Info required for matching: replay-ost-single 7 |
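The numbers in the error message can be checked by hand. The sketch below assumes, as the message format suggests, that both values are free-space readings in KB and that the check fails when the first exceeds the second by more than the log-size allowance. The function name `space_check` is hypothetical and is not part of the Lustre test framework.

```python
def space_check(first_kb: int, second_kb: int, log_size_kb: int) -> tuple:
    """Evaluate the comparison reported in the error message.

    Assumption: the test fails when first_kb > second_kb + log_size_kb.
    Returns (passed, difference_kb).
    """
    difference_kb = first_kb - second_kb
    return difference_kb <= log_size_kb, difference_kb


passed, difference_kb = space_check(15995648, 15995136, 400)
print(passed, difference_kb)  # False 512
```

Under these assumptions the difference of 512 exceeds the allowance of 400 by 112, which matches the reported failure.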
| Comments |
| Comment by James Nunez (Inactive) [ 21/Aug/17 ] |
|
On master, replay-ost-single test_7 failed with this error once in June and has so far failed six times since August 4. It is only failing for ZFS. |
| Comment by Andreas Dilger [ 21/Aug/17 ] |
|
The amount of space released by ZFS is not 100% deterministic due to COW and snapshots, as seen by patches in |
| Comment by Gerrit Updater [ 24/Aug/17 ] |
|
James Nunez (james.a.nunez@intel.com) uploaded a new patch: https://review.whamcloud.com/28682 |
| Comment by James Nunez (Inactive) [ 31/Aug/17 ] |
|
Note: replay-ost-single test_6 is also failing with this error message. This test started failing with this error message on August 28, 2017. Logs for some of these failures are at |
| Comment by Bob Glossman (Inactive) [ 07/Sep/17 ] |
|
another on b2_10: |
| Comment by James Nunez (Inactive) [ 07/Sep/17 ] |
|
Andreas - Looking at recent failures for replay-ost-single tests 6 and 7, we see that the space not released by ZFS is between 512 and 1408 KB. For example, see the following logs for ZFS not releasing 1408 KB: https://testing.hpdd.intel.com/test_sets/27a01d7e-91e5-11e7-b67f-5254006e85c2 and https://testing.hpdd.intel.com/test_sets/dc64c2f0-8d03-11e7-b50a-5254006e85c2. Is it expected, and not a problem, for ZFS to not release over 1 MB? If so, then maybe we should increase the return value of fs_log_size() for ZFS to 1500? |
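As a rough illustration of the proposal, the sketch below compares the observed range of unreleased space against the current allowance of 400 and the suggested value of 1500. It assumes all figures are in KB; the helper `covers` is hypothetical and only restates the arithmetic.

```python
OBSERVED_UNRELEASED_KB = (512, 1408)  # range reported in the comment above


def covers(allowance_kb, observed_kb):
    """True if every observed shortfall fits within the allowance."""
    return all(value <= allowance_kb for value in observed_kb)


for allowance_kb in (400, 1500):
    print(allowance_kb, covers(allowance_kb, OBSERVED_UNRELEASED_KB))
# 400 False
# 1500 True
```

A value of 1500 would cover the failures seen so far, but it leaves only 92 KB of headroom over the largest observed shortfall.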
| Comment by Andreas Dilger [ 08/Sep/17 ] |
|
The problem with ZFS is that it internally saves 3-4 snapshots of the filesystem for recovery purposes, in case of disk corruption after a crash (e.g. disks that lie about sync actually getting everything safely on disk). This means that deleted files may take 3-4 ZFS transaction commits before they actually release their space. |
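One way a test could tolerate this delay is to poll free space several times before declaring failure, rather than checking once. The following is a minimal sketch of that idea only, not the approach taken by any patch on this ticket. The function `wait_for_space_release`, the callable `read_free_kb`, and the attempt and delay values are all hypothetical.

```python
import time


def wait_for_space_release(read_free_kb, expected_free_kb, log_size_kb,
                           max_attempts=5, delay_s=1.0, sleep=time.sleep):
    """Poll free space until it is within log_size_kb of expected_free_kb.

    read_free_kb is a hypothetical callable returning current free space in KB.
    Returns the number of polls used, or None if the space never came back.
    """
    for attempt in range(1, max_attempts + 1):
        if expected_free_kb <= read_free_kb() + log_size_kb:
            return attempt
        sleep(delay_s)  # allow time for further ZFS transaction commits
    return None


# Simulated readings: space comes back gradually over four polls.
readings = iter([15995136 - 1408, 15995136 - 900, 15995136 - 512, 15995136])
print(wait_for_space_release(lambda: next(readings), 15995136, 400,
                             sleep=lambda _: None))  # 4
```

The `sleep` parameter is injectable so the loop can be exercised without real delays. A real test would read free space from the filesystem instead of a simulated list.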
| Comment by Bob Glossman (Inactive) [ 10/Sep/17 ] |
|
another on master: |
| Comment by Gerrit Updater [ 13/Sep/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/28682/ |
| Comment by Peter Jones [ 13/Sep/17 ] |
|
Landed for 2.11 |
| Comment by Sebastien Buisson (Inactive) [ 14/Sep/17 ] |
|
Hi, I think I hit this issue on master after this patch landed: Correct me if I am wrong. |
| Comment by James Nunez (Inactive) [ 14/Sep/17 ] |
|
This issue is still seen during testing. Unfortunately, the ZFS log size can be greater than the size of the buffer; see Andreas' comments above. Maybe for ZFS this error should be ignored or the buffer size be increased (a lot)? |
| Comment by Bob Glossman (Inactive) [ 21/Sep/17 ] |
|
another on b2_10: |
| Comment by Sebastien Buisson (Inactive) [ 19/Oct/17 ] |
|
Hit on master: |
| Comment by nasf (Inactive) [ 20/Nov/17 ] |
|
+1 on master: |
| Comment by Jinshan Xiong (Inactive) [ 25/Nov/17 ] |
|
https://testing.hpdd.intel.com/test_sets/21716306-d1b2-11e7-a066-52540065bddc |
| Comment by Bob Glossman (Inactive) [ 12/Jan/18 ] |
|
more on master: |
| Comment by James Nunez (Inactive) [ 21/Jan/18 ] |
|
Patch https://review.whamcloud.com/#/c/30916/ should fix this failure. |
| Comment by Andreas Dilger [ 21/Jan/18 ] |
|
James, any chance you could update that patch per review comments? |
| Comment by Jinshan Xiong (Inactive) [ 23/Jan/18 ] |
|
https://testing.hpdd.intel.com/test_sets/0a1d113e-ffd2-11e7-a10a-52540065bddc |
| Comment by Minh Diep [ 26/Feb/18 ] |
|
+1 on b2_10 https://testing.hpdd.intel.com/test_sets/b586c42e-18f1-11e8-a7cd-52540065bddc |
| Comment by Bob Glossman (Inactive) [ 28/Feb/18 ] |
|
another on b2_10: |
| Comment by James Nunez (Inactive) [ 07/Mar/18 ] |
|
The last time replay-ost-single test 7 failed in testing was on 2018-01-24. It looks like this issue is fixed on the master (pre-2.11) branch. |
| Comment by Peter Jones [ 12/Apr/18 ] |
|
Believed to be a duplicate of |