[LU-6374] replay-single test_20b: after 44416 > before 6528 Created: 17/Mar/15  Updated: 21/Dec/16  Resolved: 21/Dec/16

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.7.0, Lustre 2.8.0
Fix Version/s: Lustre 2.10.0

Type: Bug Priority: Minor
Reporter: Maloo Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/77b52408-ccc1-11e4-a8ca-5254006e85c2.

This looks similar to LU-3455, but that ticket is marked Resolved.
It may be ZFS-only: this instance was seen during review-zfs on master.

The sub-test test_20b failed with the following error:

after 44416 > before 6528

Info required for matching: replay-single 20b



 Comments   
Comment by John Hammond [ 08/Jul/15 ]

Another on 2.7.55+ https://testing.hpdd.intel.com/test_sets/4323393a-250b-11e5-8427-5254006e85c2.

Comment by Andreas Dilger [ 13/Aug/15 ]

Also https://testing.hpdd.intel.com/test_sets/08cba93c-3db3-11e5-9e7f-5254006e85c2 on master.

Comment by James Nunez (Inactive) [ 10/Dec/15 ]

More instances on master, all ZFS:
2015-11-04 21:19:27 - https://testing.hpdd.intel.com/test_sets/87ec5f46-8376-11e5-8df7-5254006e85c2
2015-11-21 15:51:23 - https://testing.hpdd.intel.com/test_sets/bf7d6960-90a4-11e5-b9af-5254006e85c2
2015-12-09 00:09:30 - https://testing.hpdd.intel.com/test_sets/50aeefd8-9e44-11e5-86f6-5254006e85c2
2016-02-21 14:54:24 - https://testing.hpdd.intel.com/test_sets/c7a4e68a-d8f3-11e5-83e2-5254006e85c2

Comment by Jian Yu [ 25/Dec/15 ]

Another instance on the master branch:
https://testing.hpdd.intel.com/test_sets/2a69c5be-aaba-11e5-9fbe-5254006e85c2

Comment by Niu Yawei (Inactive) [ 10/Nov/16 ]

Hit on master: https://testing.hpdd.intel.com/test_sets/ed824568-a662-11e6-a6e7-5254006e85c2

Comment by Niu Yawei (Inactive) [ 10/Nov/16 ]

For ZFS, we need to wait for the commit to release space, but it looks like wait_delete_completed_mds() doesn't wait at all when no MDS has pending deletions:

wait_delete_completed_mds() {
        local MAX_WAIT=${1:-20}
        # for ZFS, waiting more time for DMUs to be committed
        local ZFS_WAIT=${2:-5}
        local mds2sync=""
        local stime=$(date +%s)
        local etime
        local node
        local changes

        # find MDS with pending deletions
        for node in $(mdts_nodes); do
                changes=$(do_node $node "$LCTL get_param -n osc.*MDT*.sync_*" \
                        2>/dev/null | calc_sum)
                if [[ $changes -eq 0 ]]; then
                        continue
                fi
                mds2sync="$mds2sync $node"
        done
        if [ -z "$mds2sync" ]; then
                return  # <-- before this return, we need to wait for the ZFS commit
        fi
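
The early-return problem described above can be sketched as follows. This is a minimal, standalone illustration, not the patch that landed: `zfs_commit_wait`, `pending_changes`, `FSTYPE`, and `PENDING` are hypothetical stand-ins so the snippet runs outside the Lustre test framework, and `facet_fstype` is stubbed here rather than taken from test-framework.sh.

```shell
#!/bin/bash
# Sketch only: every helper below is a stand-in, NOT real test-framework.sh code.

FSTYPE=${FSTYPE:-zfs}   # hypothetical: backend type of the MDT
PENDING=${PENDING:-0}   # hypothetical: sum of the pending sync_* counters

# Stub: report the backend filesystem type of a facet.
facet_fstype() { echo "$FSTYPE"; }

# Stub: number of pending deletions summed over all MDS nodes.
pending_changes() { echo "$PENDING"; }

# Hypothetical helper: sleep only on ZFS, where freed space becomes
# visible only after the commit.
zfs_commit_wait() {
	local facet=$1
	local zfs_wait=${2:-5}

	if [[ $(facet_fstype "$facet") == zfs ]]; then
		echo "sleep $zfs_wait for ZFS commit"
		sleep "$zfs_wait"
	fi
}

wait_delete_completed_sketch() {
	local zfs_wait=${1:-5}

	if [[ $(pending_changes) -eq 0 ]]; then
		# Nothing left to sync, but ZFS may not have committed yet,
		# so wait before taking the early return.
		zfs_commit_wait mds1 "$zfs_wait"
		return 0
	fi
	echo "pending deletions: would poll until they drain"
}
```

Calling `wait_delete_completed_sketch 5` with `FSTYPE=zfs` and `PENDING=0` sleeps for five seconds before returning; with any other `FSTYPE` value it returns immediately, which matches the behavior the comment asks for.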

Comment by Gerrit Updater [ 10/Nov/16 ]

Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23688
Subject: LU-6374 tests: wait for zfs commit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 19ad6284c99469aa193760228b6830e5d221a553

Comment by Niu Yawei (Inactive) [ 17/Nov/16 ]

The failure is easy to reproduce locally with a ZFS backend; with the above patch applied, I can no longer reproduce it.

Comment by Gerrit Updater [ 19/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23688/
Subject: LU-6374 tests: wait for zfs commit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 6f680d9eef683b83b478ff2aaf281d15f7c78fa2

Comment by Minh Diep [ 21/Dec/16 ]

Landed in Lustre 2.10.0

Generated at Sat Feb 10 01:59:39 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.