[LU-14201] replay-single test 89 fails with '3072 blocks leaked' Created: 08/Dec/20 Updated: 22/Nov/22 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.12.6 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | James Nunez (Inactive) | Assignee: | WC Triage |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None |
| Environment: | ZFS |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
replay-single test_89 fails with '3072 blocks leaked'. We’ve seen this test fail with this error message before. For the test failure at https://testing.whamcloud.com/test_sets/a6260ca9-b7a0-4818-9d48-ab79249ba526, the last lines in the suite_log are:

    Waiting for orphan cleanup...
    CMD: trevis-20vm4 /usr/sbin/lctl list_param osp.*osc*.old_sync_processed 2> /dev/null
    osp.lustre-OST0000-osc-MDT0000.old_sync_processed
    osp.lustre-OST0001-osc-MDT0000.old_sync_processed
    osp.lustre-OST0002-osc-MDT0000.old_sync_processed
    osp.lustre-OST0003-osc-MDT0000.old_sync_processed
    osp.lustre-OST0004-osc-MDT0000.old_sync_processed
    osp.lustre-OST0005-osc-MDT0000.old_sync_processed
    osp.lustre-OST0006-osc-MDT0000.old_sync_processed
    wait 40 secs maximumly for trevis-20vm4 mds-ost sync done.
    CMD: trevis-20vm4 /usr/sbin/lctl get_param -n osp.*osc*.old_sync_processed
    sleep 5 for ZFS zfs
    Waiting for local destroys to complete
     replay-single test_89: @@@@@@ FAIL: 3072 blocks leaked
      Trace dump:
      = /usr/lib64/lustre/tests/test-framework.sh:5907:error()
      = /usr/lib64/lustre/tests/replay-single.sh:3329:test_89()

There is nothing obviously wrong in the console logs. |
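For anyone trying to reproduce this, a minimal sketch of running just this subtest against a ZFS-backed test filesystem is below. ONLY and FSTYPE are the usual test-framework environment variables; the exact invocation (node names, device setup, failover mode) depends on the local test configuration, so treat this as a starting point rather than the exact autotest command line.

    # run only test_89 of replay-single against a ZFS-backed test filesystem
    cd /usr/lib64/lustre/tests
    FSTYPE=zfs ONLY=89 ./replay-single.sh

On a multi-node setup the same subtest can also be driven through auster (e.g. ./auster -v replay-single --only 89).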
| Comments |
| Comment by Andreas Dilger [ 09/Dec/20 ] |
|
I had a quick look at this, and so far it is a one-off test failure. There was one other test_89 failure in the past month, but it looked quite different. This test verifies that if a file is deleted across both an OSS and MDS restart, the space on the OSTs is released. In terms of severity this is fairly low, since concurrent MDS and OSS failure while files are also being deleted is fairly rare, and at worst some space on the OST would be leaked. It may also be that this is a test script issue (e.g. the delete hadn't happened yet because "wait_delete_completed_mds()" didn't wait long enough). So I don't think it is a blocker for the 2.12.6 release, but we can keep an eye on whether it is being hit regularly. |
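If the wait_delete_completed_mds() theory is right, the failure should be a pure timing issue: the "leaked" blocks would show up as freed if the test simply waited longer. A rough manual check along these lines could confirm that. This is a sketch only, not the test framework's own helper; it polls osp.*.old_sync_processed (the parameter shown in the suite_log above) and assumes a node that has both the MDS osp parameters and a client mount at $MNT — on a multi-node setup the two checks would need to be split across nodes.

    #!/bin/bash
    # Poll the OSP old_sync_processed flag together with the client-visible
    # block count, to see whether the blocks are eventually released.
    MNT=${MNT:-/mnt/lustre}
    for i in $(seq 1 60); do
        # number of OSP devices that have not yet processed their old sync requests
        pending=$(lctl get_param -n osp.*osc*.old_sync_processed 2>/dev/null |
                  grep -c '^0')
        used=$(df -P "$MNT" | tail -n 1 | awk '{ print $3 }')
        echo "iter $i: OSPs still syncing=$pending, 1K-blocks used=$used"
        [ "$pending" -eq 0 ] && break
        sleep 1
    done

If the used-block count drops back to its pre-test value shortly after the 40-second window the script allows, that would point at the test script's timeout rather than an actual on-disk leak.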
| Comment by Andreas Dilger [ 22/Nov/22 ] |
|
Duplicate with |