
replay-single test_70d: timeout and mkdir/rmdir stopped

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.9.0
    • Affects Version/s: Lustre 2.8.0, Lustre 2.9.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      This issue was created by maloo for Andreas Dilger <andreas.dilger@intel.com>

      This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/56e1b56e-53ff-11e5-8f2c-5254006e85c2.

      The sub-test test_70d failed with the following error:

      error on LL_IOC_LMV_SETSTRIPE '/mnt/lustre/d70d.replay-single/test1' (3): stripe already set
      error: mkdir: create stripe dir '/mnt/lustre/d70d.replay-single/test1' failed
      mkdir fails
      /usr/lib64/lustre/tests/replay-single.sh: line 2236: kill: (25189) - No such process
      mkdir/rmdir 25189 stopped
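
      The "No such process" and "stopped" lines are consistent with a liveness check on a background worker that had already exited after "mkdir fails". The test script itself is not shown in this ticket, so the following is only a minimal Python sketch of that pattern, not the code in replay-single.sh. The helper names `worker_alive` and `run_failing_worker` are hypothetical.

```python
import os
import subprocess
import sys


def worker_alive(pid: int) -> bool:
    """Probe a PID with signal 0: no signal is delivered, only existence is checked."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # Same condition the shell reports as "No such process".
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def run_failing_worker() -> int:
    """Start a child that fails immediately, reap it, and return its PID."""
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(1)"])
    child.wait()  # once reaped, the PID no longer refers to a process
    return child.pid


if __name__ == "__main__":
    pid = run_failing_worker()
    if not worker_alive(pid):
        print(f"mkdir/rmdir {pid} stopped")
```

      In this sketch the check fails only because the worker exited early, which would make the kill message a symptom of the mkdir error rather than a separate fault.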
      

      There have been several test failures and timeouts for 70d since 2015-09-02, so I suspect that a patch that landed on that day or the previous day introduced a regression.

      Info required for matching: replay-single 70d

          Activity

            [LU-7117] replay-single test_70d: timeout and mkdir/rmdir stopped

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/20940/
            Subject: LU-7117 osp: set ptlrpc_request::rq_allow_replay properly
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: e3d507eec50fc1ff79acf2a9f93d52d698c887d7

            adilger Andreas Dilger added a comment - edited

            Never mind, I see that http://review.whamcloud.com/21064 "LU-7117 mdt: mdt unlink should lock before lookup" was abandoned because it landed as http://review.whamcloud.com/21088 via LU-8353.

            adilger Andreas Dilger added a comment -

            How do the patches http://review.whamcloud.com/20940 "LU-7117 osp: control RPC to be sent when recovery" and http://review.whamcloud.com/21064 "LU-7117 mdt: mdt unlink should lock before lookup" relate to each other? Are they both needed? Are they different approaches to fixing the same problem, and only one is needed?

            sbuisson Sebastien Buisson (Inactive) added a comment - A lot more occurrences recently, like this one on master: https://testing.hpdd.intel.com/test_sets/4d6f9aea-4a09-11e6-8968-5254006e85c2
            bfaccini Bruno Faccini (Inactive) added a comment - +1 on master at https://testing.hpdd.intel.com/test_sets/79710324-49a3-11e6-a80f-5254006e85c2
            sbuisson Sebastien Buisson (Inactive) added a comment - Another occurrence: https://testing.hpdd.intel.com/test_sets/823971fa-433e-11e6-acf3-5254006e85c2

            gerrit Gerrit Updater added a comment -

            Lai Siyao (lai.siyao@intel.com) uploaded a new patch: http://review.whamcloud.com/21064
            Subject: LU-7117 mdt: mdt unlink should lock before lookup
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0ae3d50ed2745366f677b79586f3fe72645330ee

            bogl Bob Glossman (Inactive) added a comment - another on master: https://testing.hpdd.intel.com/test_sets/6a54094a-3a9e-11e6-a0ce-5254006e85c2

            yong.fan nasf (Inactive) added a comment -

            "no new RPCs (except update log redo) should be sent until the recovery is completed?"

            MDT0 is in recovery, but MDT1 is operating normally, so RPCs from the client to MDT1 are not blocked.
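
            The exchange above turns on the gate being per target: a target in recovery holds new requests, while a healthy target keeps admitting them. The following is a toy Python model of that reasoning only. It is not Lustre code, and `Target`, `RpcKind`, and `admits` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class RpcKind(Enum):
    NEW_REQUEST = "new request"
    UPDATE_LOG_REDO = "update log redo"


@dataclass
class Target:
    name: str
    in_recovery: bool = False

    def admits(self, kind: RpcKind) -> bool:
        # Per-target gate: while this target is recovering, only
        # update log redo traffic goes through.
        if self.in_recovery:
            return kind is RpcKind.UPDATE_LOG_REDO
        # A healthy target has no knowledge of its peers' recovery state.
        return True


# Scenario from the comment: MDT0 is recovering, MDT1 is not.
mdt0 = Target("MDT0", in_recovery=True)
mdt1 = Target("MDT1")

if __name__ == "__main__":
    for target in (mdt0, mdt1):
        for kind in RpcKind:
            verdict = "admitted" if target.admits(kind) else "held"
            print(f"{target.name}: {kind.value} {verdict}")
```

            In this model a client request to MDT1 is admitted even though MDT0 has not finished recovery, which is the situation the comment describes.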

            gerrit Gerrit Updater added a comment -

            Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/20940
            Subject: LU-7117 osp: control RPC to be sent when recovery
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 5fb68c49af3fa08189110a4e980ad792efd7128b

            bzzz Alex Zhuravlev added a comment -

            no new RPCs (except update log redo) should be sent until the recovery is completed?

            People

              Assignee: laisiyao Lai Siyao
              Reporter: maloo Maloo