[LU-7400] top_trans_create() followed by top_trans_stop() gets stuck

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.8.0
    • Affects Version/s: Lustre 2.8.0
    • Labels: None
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Because sub_thandle_register_stop_cb() is called in top_trans_start(), skipping top_trans_start() (which is a valid case) causes top_trans_stop() to wait indefinitely for stop callbacks that were never registered.
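
The hang described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the Lustre code: every `toy_*` name is hypothetical, the "wait" in stop is modelled as a boolean result instead of an actual block, and the fix is modelled as moving callback registration from start to create, as the patch subject suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SUBS 4

/* Hypothetical stand-in for one sub-transaction handle. */
struct toy_sub {
    bool cb_registered; /* has a stop callback been attached? */
    bool stopped;       /* set only when the stop callback runs */
};

/* Hypothetical stand-in for the top-level transaction. */
struct toy_trans {
    struct toy_sub subs[TOY_MAX_SUBS];
    size_t nr_subs;
    bool register_at_create; /* true models the fixed behaviour */
};

static void toy_register_stop_cbs(struct toy_trans *t)
{
    for (size_t i = 0; i < t->nr_subs; i++)
        t->subs[i].cb_registered = true;
}

static void toy_trans_create(struct toy_trans *t, size_t nr_subs,
                             bool register_at_create)
{
    assert(nr_subs <= TOY_MAX_SUBS);
    t->nr_subs = nr_subs;
    t->register_at_create = register_at_create;
    for (size_t i = 0; i < nr_subs; i++) {
        t->subs[i].cb_registered = false;
        t->subs[i].stopped = false;
    }
    if (register_at_create)
        toy_register_stop_cbs(t);
}

/* In the buggy model, callbacks are attached only here. */
static void toy_trans_start(struct toy_trans *t)
{
    if (!t->register_at_create)
        toy_register_stop_cbs(t);
}

/*
 * Returns true if stop would complete, false if it would wait forever
 * because some sub-handle never reports "stopped".
 */
static bool toy_trans_stop(struct toy_trans *t)
{
    for (size_t i = 0; i < t->nr_subs; i++)
        if (t->subs[i].cb_registered)
            t->subs[i].stopped = true; /* callback fires */

    for (size_t i = 0; i < t->nr_subs; i++)
        if (!t->subs[i].stopped)
            return false; /* nobody will ever set this flag */
    return true;
}

/* Run create, optionally start, then stop; report whether stop completes. */
bool toy_run(bool register_at_create, bool call_start)
{
    struct toy_trans t;

    toy_trans_create(&t, 2, register_at_create);
    if (call_start)
        toy_trans_start(&t);
    return toy_trans_stop(&t);
}
```

In this model, `toy_run(false, false)` is the create-then-stop path from the report and is the only combination where stop cannot complete; registering at create makes the result independent of whether start was called.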

        Activity

          jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/17059/
          Subject: LU-7400 lod: register stop callbacks at create
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 082eabdeaa0c2a0f536accf7028e5ab5061c2c46

          di.wang Di Wang added a comment - Since we saw this in the DNE soak test, which we have to pass before release, let's make it a blocker for now.

          di.wang Di Wang added a comment - I think we also have this issue for the commit callback, which I found in the failover soak test. I will update the patch to resolve both together. IMHO, this should get into 2.8, since it will cause endless recovery.

          bzzz Alex Zhuravlev added a comment - For example, if a target fails during recovery and some preparation RPC (e.g. fetching an EA) returns an error, then the original migration process gets stuck. Even after the failed target is back, it stays stuck. I think it makes sense to consider landing.

          adilger Andreas Dilger added a comment - Alex, under what kind of workload is this bug hit, and how easily does that happen? Is this a patch that needs to be landed for 2.8.0?

          gerrit Gerrit Updater added a comment -

          Alex Zhuravlev (alexey.zhuravlev@intel.com) uploaded a new patch: http://review.whamcloud.com/17059
          Subject: LU-7400 lod: register commit callbacks at create
          Project: fs/lustre-release
          Branch: master
          Current Patch Set: 1
          Commit: 58a978451fd7504ecc9ab3bbf841a3499401f0b2

          People

            Assignee: bzzz Alex Zhuravlev
            Reporter: bzzz Alex Zhuravlev
            Votes: 0
            Watchers: 5
