[LU-6374] replay-single test_20b: after 44416 > before 6528

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.10.0
    • Lustre 2.7.0, Lustre 2.8.0
    • None
    • 3

    Description

      This issue was created by maloo for Bob Glossman <bob.glossman@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/77b52408-ccc1-11e4-a8ca-5254006e85c2.

      This looks similar to LU-3455, but that issue is marked Resolved.
      It may be ZFS-only: this instance was seen during review-zfs on master.

      The sub-test test_20b failed with the following error:

      after 44416 > before 6528
      

      Info required for matching: replay-single 20b
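
The reported error has the shape of a used-space comparison taken before and after the test. The following is a minimal sketch of such a check, assuming the two numbers are used-space readings; `check_space_released` is a hypothetical helper for illustration, not the actual test_20b code.

```shell
#!/bin/bash
# Sketch only: check_space_released is a hypothetical helper, not the
# actual test_20b code.

# Succeed only if the space used after the test has not grown past
# the baseline taken before it.
check_space_released() {
	local before=$1
	local after=$2

	if (( after > before )); then
		echo "after $after > before $before"
		return 1
	fi
	return 0
}

# The values from this report trip the check.
check_space_released 6528 44416 || echo "test would fail"
```

If freed space is only released after a delay, a check like this can fail even though nothing leaked, which fits the ZFS-only pattern noted above.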

        Activity

          mdiep Minh Diep added a comment -

          Landed in Lustre 2.10.0

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23688/
          Subject: LU-6374 tests: wait for zfs commit
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 6f680d9eef683b83b478ff2aaf281d15f7c78fa2

          niu Niu Yawei (Inactive) added a comment -

          The failure can be easily reproduced locally with a ZFS backend; with the above patch applied, I can no longer reproduce it.

          gerrit Gerrit Updater added a comment -

          Niu Yawei (yawei.niu@intel.com) uploaded a new patch: http://review.whamcloud.com/23688
          Subject: LU-6374 tests: wait for zfs commit
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 19ad6284c99469aa193760228b6830e5d221a553

          niu Niu Yawei (Inactive) added a comment -

          For ZFS, we need to wait for the commit to release space, but it looks like wait_delete_completed_mds() doesn't wait at all:

          wait_delete_completed_mds() {
                  local MAX_WAIT=${1:-20}
                  # for ZFS, waiting more time for DMUs to be committed
                  local ZFS_WAIT=${2:-5}
                  local mds2sync=""
                  local stime=$(date +%s)
                  local etime
                  local node
                  local changes

                  # find MDS with pending deletions
                  for node in $(mdts_nodes); do
                          changes=$(do_node $node "$LCTL get_param -n osc.*MDT*.sync_*" \
                                  2>/dev/null | calc_sum)
                          if [[ $changes -eq 0 ]]; then
                                  continue
                          fi
                          mds2sync="$mds2sync $node"
                  done
                  if [ -z "$mds2sync" ]; then
                          return   # <-- before this return, we need to wait for the ZFS commit
                  fi
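
The change proposed in the comment above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged patch: `backend_fstype`, `pending_changes`, `wait_for_zfs_commit_sketch` and `wait_delete_completed_sketch` are hypothetical names, and the first two are stubs standing in for whatever the test framework provides. The fixed sleep mirrors the ZFS_WAIT default of 5 seconds shown in the excerpt.

```shell
#!/bin/bash
# Sketch only. backend_fstype and pending_changes are hypothetical stubs,
# not functions from the Lustre test framework.

# Hypothetical: report the backend filesystem type (default: zfs).
backend_fstype() {
	echo "${BACKEND_FSTYPE:-zfs}"
}

# Hypothetical: report the number of pending deletions (default: 0).
pending_changes() {
	echo "${PENDING_CHANGES:-0}"
}

# On ZFS, sleep for a fixed interval so a commit can release freed space.
wait_for_zfs_commit_sketch() {
	local zfs_wait=${1:-5}

	if [[ $(backend_fstype) == zfs ]]; then
		echo "sleep $zfs_wait for zfs commit"
		sleep "$zfs_wait"
	fi
}

wait_delete_completed_sketch() {
	local zfs_wait=${1:-5}

	if [[ $(pending_changes) -eq 0 ]]; then
		# Nothing is pending, but still give ZFS time to commit
		# before taking the early return.
		wait_for_zfs_commit_sketch "$zfs_wait"
		return 0
	fi

	# Polling until pending deletions drain is omitted from this sketch.
	echo "pending deletions: $(pending_changes)"
	return 1
}
```

The point of the sketch is only the ordering: the ZFS wait happens on the early-return path as well, so a caller that measures used space immediately afterwards does not race the commit.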

          niu Niu Yawei (Inactive) added a comment -

          Hit on master: https://testing.hpdd.intel.com/test_sets/ed824568-a662-11e6-a6e7-5254006e85c2

          yujian Jian Yu added a comment -

          Another instance on the master branch: https://testing.hpdd.intel.com/test_sets/2a69c5be-aaba-11e5-9fbe-5254006e85c2

          jamesanunez James Nunez (Inactive) added a comment -

          More instances on master, all ZFS:
          2015-11-04 21:19:27 - https://testing.hpdd.intel.com/test_sets/87ec5f46-8376-11e5-8df7-5254006e85c2
          2015-11-21 15:51:23 - https://testing.hpdd.intel.com/test_sets/bf7d6960-90a4-11e5-b9af-5254006e85c2
          2015-12-09 00:09:30 - https://testing.hpdd.intel.com/test_sets/50aeefd8-9e44-11e5-86f6-5254006e85c2
          2016-02-21 14:54:24 - https://testing.hpdd.intel.com/test_sets/c7a4e68a-d8f3-11e5-83e2-5254006e85c2

          adilger Andreas Dilger added a comment -

          Also on master: https://testing.hpdd.intel.com/test_sets/08cba93c-3db3-11e5-9e7f-5254006e85c2

          jhammond John Hammond added a comment -

          Another on 2.7.55+: https://testing.hpdd.intel.com/test_sets/4323393a-250b-11e5-8427-5254006e85c2

          People

            Assignee: niu Niu Yawei (Inactive)
            Reporter: maloo Maloo