[LU-4265] replay-ost-single test_6: space grew after dd (or didn't change) Created: 18/Nov/13  Updated: 29/Apr/19  Resolved: 29/Apr/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0, Lustre 2.5.3, Lustre 2.9.0, Lustre 2.12.0, Lustre 2.13.0, Lustre 2.12.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Maloo Assignee: WC Triage
Resolution: Cannot Reproduce Votes: 0
Labels: zfs

Issue Links:
Related
is related to LU-12232 replay-ost-single test 6 fails with '... Resolved
Severity: 3
Rank (Obsolete): 11718

 Description   

This issue was created by maloo for Nathaniel Clark <nathaniel.l.clark@intel.com>

This issue relates to the following test suite run:
http://maloo.whamcloud.com/test_sets/627ba06c-48b0-11e3-bdb5-52540035b04c
http://maloo.whamcloud.com/test_sets/600e6472-4f68-11e3-84d3-52540035b04c

The sub-test test_6 failed with the following error:

space grew after dd: before:77312128 after_dd:77312128

Info required for matching: replay-ost-single 6



 Comments   
Comment by Jian Yu [ 05/Sep/14 ]

While verifying patches http://review.whamcloud.com/11541, http://review.whamcloud.com/11411, http://review.whamcloud.com/9318 on Lustre b2_5 branch with FSTYPE=zfs, the same failure occurred:
https://testing.hpdd.intel.com/test_sets/160d2586-31e7-11e4-92e0-5254006e85c2
https://testing.hpdd.intel.com/test_sets/228a2e1e-27b8-11e4-893b-5254006e85c2
https://testing.hpdd.intel.com/test_sets/cb2bd86a-9a50-11e3-965c-52540035b04c

Comment by Jian Yu [ 25/Sep/14 ]

One more instance on Lustre b2_5 branch: https://testing.hpdd.intel.com/test_sets/cd0bb8ee-44dc-11e4-bb5a-5254006e85c2

Comment by nasf (Inactive) [ 27/Sep/14 ]

Another failure instance:
https://testing.hpdd.intel.com/test_sets/3e9f4124-45c3-11e4-9397-5254006e85c2

Comment by Isaac Huang (Inactive) [ 29/Oct/14 ]

Another one:
https://testing.hpdd.intel.com/test_sets/197aed5c-592e-11e4-8f95-5254006e85c2

It appears to be related to LU-3455. test_6 reads before=$(kbytesfree) after removing the file and waiting for the space to be returned:

    rm -f $f
    sync && sleep 5 && sync  # wait for delete thread

    # wait till space is returned, following
    # (( $before > $after_dd)) test counting on that
    wait_mds_ost_sync || return 4
    wait_destroy_complete || return 5

    local before=$(kbytesfree)

I doubt the wait can work reliably with ZFS: frees can sometimes be deferred for a few transaction groups. Freeing a file, waiting, and then reading free space seems inherently unreliable on ZFS.
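A minimal sketch of a more tolerant wait, assuming free space eventually settles once ZFS has synced the deferred frees. `wait_free_space_stable` is a hypothetical helper, not part of the Lustre test framework, and the command passed to it (for example the test's `kbytesfree`) is assumed to print a single KB value. Two equal consecutive samples are still only a heuristic: if the poll interval is shorter than a transaction group, a deferred free can land after the helper returns, so the interval should be longer than the txg sync period.

```shell
#!/bin/bash
# Hypothetical helper (not in the Lustre test framework): poll a
# free-space command until two consecutive samples agree, or give up.
#   $1 = command that prints free space in KB (e.g. kbytesfree)
#   $2 = maximum number of polls (default 30)
#   $3 = seconds to sleep between polls (default 1)
# Prints the last sample; returns 0 if it settled, 1 on timeout.
wait_free_space_stable() {
	local cmd=$1 tries=${2:-30} interval=${3:-1}
	local prev cur i

	prev=$($cmd)
	for ((i = 0; i < tries; i++)); do
		sleep "$interval"
		cur=$($cmd)
		if [[ "$cur" == "$prev" ]]; then
			echo "$cur"
			return 0
		fi
		prev=$cur
	done
	echo "$prev"
	return 1
}
```

In test_6 it could stand in for the single read, e.g. `before=$(wait_free_space_stable kbytesfree 30 5)`.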

Comment by Jian Yu [ 08/Dec/14 ]

One more failure instance on Lustre b2_5 branch:
https://testing.hpdd.intel.com/test_sets/cdfb95ec-7dd7-11e4-aa98-5254006e85c2

Comment by Johann Lombardi (Inactive) [ 13/Jan/15 ]

Another instance on master:
https://testing.hpdd.intel.com/test_sets/b07629a6-86a6-11e4-b678-5254006e85c2

Comment by Bob Glossman (Inactive) [ 19/Jun/15 ]

Another one on master:
https://testing.hpdd.intel.com/test_sets/92cbbd1a-16d1-11e5-8436-5254006e85c2

Comment by James Nunez (Inactive) [ 15/Jul/15 ]

Another two failures on master in review-zfs-part-2:
2015-07-03 07:45:48 - https://testing.hpdd.intel.com/test_sets/ceba821c-2161-11e5-a979-5254006e85c2
2015-07-14 21:23:00 - https://testing.hpdd.intel.com/test_sets/234cf58c-2a7a-11e5-96c0-5254006e85c2

Comment by Jian Yu [ 30/Jul/15 ]

One more failure instance on master branch:
https://testing.hpdd.intel.com/test_sets/94e59696-3640-11e5-84a9-5254006e85c2

Comment by James Nunez (Inactive) [ 27/Dec/15 ]

Another failure on master:
2015-12-26 18:38:36 - https://testing.hpdd.intel.com/test_sets/cc75f8cc-ac0a-11e5-8114-5254006e85c2

Comment by James Nunez (Inactive) [ 24/Apr/19 ]

It looks like this test is failing again (or still). I went back to January 2018 and looked at all the times replay-ost-single test 6 failed with the "space grew after dd:" error; we've seen six failures, all in ZFS testing:
2018-08-17 - 2.11.53.69 - https://testing.whamcloud.com/test_sets/24be3e5c-a273-11e8-8853-52540065bddc
2019-02-26 - 2.12.0.1 - https://testing.whamcloud.com/test_sets/60e5eab4-3964-11e9-8f69-52540065bddc
2019-04-17 - 2.12.52.85 - https://testing.whamcloud.com/test_sets/4fd6a78e-6170-11e9-9720-52540065bddc
2019-04-19 - 2.12.52.89 - https://testing.whamcloud.com/test_sets/043e5b44-62fd-11e9-aeec-52540065bddc
2019-04-23 - 2.12.52.95 - https://testing.whamcloud.com/test_sets/bf51ebce-65f7-11e9-a6f9-52540065bddc
2019-04-23 - 2.12.1 RC1 - https://testing.whamcloud.com/test_sets/d53ee48a-665d-11e9-8bb1-52540065bddc

Some of the failures look questionable, or possibly the error message should be modified. For example, in some failures "before" and "after_dd" are the same value:
space grew after dd: before:13442048 after_dd:13442048
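Following the suggestion to change the error message, a minimal sketch of a check that reports the two failure modes separately. `check_space_consumed` is a hypothetical name; it assumes, as the `(( $before > $after_dd ))` comment in the test excerpt implies, that the test should pass only when free space strictly drops after the dd.

```shell
#!/bin/bash
# Hypothetical check (not the actual replay-ost-single code): succeed
# only if free space dropped after the write, otherwise say whether it
# grew or stayed the same instead of always reporting "space grew".
#   $1 = free KB before the dd, $2 = free KB after the dd
check_space_consumed() {
	local before=$1 after_dd=$2

	if (( before > after_dd )); then
		return 0
	elif (( before == after_dd )); then
		echo "free space unchanged after dd: before:$before after_dd:$after_dd"
	else
		echo "space grew after dd: before:$before after_dd:$after_dd"
	fi
	return 1
}
```

Distinguishing the "unchanged" case would make it easier to tell a write whose blocks were not yet accounted for apart from frees that landed late.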

Comment by Andreas Dilger [ 29/Apr/19 ]

This stopped being hit in 2015.

Generated at Sat Feb 10 01:41:09 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.