Details

    • Technical task
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • None
    • None
    • 8904

    Description

      When an MDT restarts after a crash, it processes all of the records in its local update llog. It batches the updates that share the same ur_master_index and ur_batchid and sends each batch in an OUT_UPDATE RPC to each of the remote targets that took part in the operation. In the meantime, all other MDTs are notified; they check their own local update logs and send all related records to the failover MDT.
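
      The batching step can be illustrated with a minimal Python sketch. It only models the grouping: records sharing ur_master_index and ur_batchid are collected per remote target, and each group stands in for the contents of one OUT_UPDATE RPC. The UpdateRecord class, the target_index and payload fields, and batch_updates() are hypothetical names for illustration, not Lustre structures or functions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    ur_master_index: int  # index of the MDT that mastered the operation
    ur_batchid: int       # batch id shared by all updates of one operation
    target_index: int     # hypothetical: remote target the update applies to
    payload: str          # hypothetical: opaque description of the update


def batch_updates(records):
    """Group records by (ur_master_index, ur_batchid, target_index).

    Each resulting list models the updates carried by one RPC to one target.
    """
    batches = defaultdict(list)
    for rec in records:
        key = (rec.ur_master_index, rec.ur_batchid, rec.target_index)
        batches[key].append(rec.payload)
    return dict(batches)


# Hypothetical local update llog contents after a restart.
llog = [
    UpdateRecord(0, 7, 1, "insert name entry"),
    UpdateRecord(0, 7, 2, "create object"),
    UpdateRecord(0, 7, 1, "update parent attributes"),
    UpdateRecord(0, 8, 2, "destroy object"),
]

for key, payloads in batch_updates(llog).items():
    print(key, payloads)
```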

      An MDT that receives updates from another MDT checks whether the corresponding updates are already recorded in its local update llog.
      If the update was already committed, the MDT replies with an arbitrary pb_transno < pb_last_committed.

      If the updates do not exist in the update llog, the MDT compares the master transno in the update record with the transno in last_rcvd. If the transno in the update record is smaller than the one in last_rcvd, the master already sent the update to this MDT, the update has already been executed and committed, and the update log record has been deleted, so the MDT again returns an arbitrary smaller transno as above. If the transno in the update record is larger, the MDT replays the update with a new transno.
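
      The receiving MDT's decision described above might be sketched as follows. This is an illustration under stated assumptions, not the Lustre implementation: ReceiverState and handle_remote_update() are hypothetical names, any record found in the local update llog is treated as already committed, a master transno equal to the last_rcvd transno is treated as already applied (the description only covers "smaller" and "larger"), and last_committed is assumed to be greater than zero.

```python
from dataclasses import dataclass


@dataclass
class ReceiverState:
    update_llog: set        # hypothetical: (master_index, batchid) keys still logged
    last_rcvd_transno: int  # transno recorded in last_rcvd
    last_committed: int     # models pb_last_committed, assumed > 0


def handle_remote_update(state, key, master_transno):
    """Return (reply_transno, replayed) for an update received from another MDT."""
    if key in state.update_llog:
        # Already recorded locally (assumed committed here): reply with an
        # arbitrary transno below last_committed.
        return state.last_committed - 1, False
    if master_transno <= state.last_rcvd_transno:
        # Already executed and committed, log record deleted: same reply.
        return state.last_committed - 1, False
    # Not seen before: replay the update under a new transno.
    state.last_rcvd_transno += 1
    state.update_llog.add(key)
    return state.last_rcvd_transno, True


state = ReceiverState(update_llog={(0, 7)}, last_rcvd_transno=100, last_committed=90)
print(handle_remote_update(state, (0, 7), 120))  # found in the update llog
print(handle_remote_update(state, (0, 5), 50))   # older than last_rcvd
print(handle_remote_update(state, (0, 9), 130))  # replayed with a new transno
```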

      In all cases, the MDT replies to the sender with the transno.
      If the sender is the recovering MDT, which is the master for this operation, it builds the in-memory operation state to track the remote updates, and once all of the remote updates have committed, it can cancel the local update record.
      The client then sends its replay/resend requests to the failover MDT.
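
      The master-side tracking described above (cancelling the local update record once every remote update has committed) could look roughly like the sketch below. OperationTracker and its fields are hypothetical; the cancelled flag merely stands in for cancelling the record in the local update llog.

```python
class OperationTracker:
    """In-memory state for one cross-MDT operation on the master MDT."""

    def __init__(self, batchid, remote_targets):
        self.batchid = batchid
        self.pending = set(remote_targets)  # targets whose update is not yet committed
        self.cancelled = False              # stand-in for "local update record cancelled"

    def on_commit(self, target_index):
        """Record that a remote target committed its update.

        Returns True once the local update record can be (and has been) cancelled.
        """
        self.pending.discard(target_index)
        if not self.pending:
            self.cancelled = True
        return self.cancelled


tracker = OperationTracker(batchid=7, remote_targets=[1, 2])
print(tracker.on_commit(1))  # one remote update still outstanding
print(tracker.on_commit(2))  # all remote updates committed
```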

      The master MDT checks whether the request exists in the update log, using the request xid.

      If it does not exist, the MDT compares the request transno with its own transno and replays the request only if the request transno is bigger than the last transno (lcd_last_transno) of this MDT.

      If it does exist, recovery between the MDTs has already handled this case, so the MDT returns an arbitrary smaller transno and the client can then remove the request from its replay list.
      If there are any failures during the above two steps, the lfsck daemon is triggered to fix the filesystem.
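
      The client replay check on the master MDT can be summarised in a short sketch. All names here are hypothetical except lcd_last_transno, which comes from the description. The reply transno is left as None wherever the description does not say what is returned, and last_committed is assumed to be greater than zero.

```python
def handle_client_replay(update_log_xids, lcd_last_transno, last_committed,
                         req_xid, req_transno):
    """Return (action, reply_transno) for a client replay/resend request."""
    if req_xid in update_log_xids:
        # Recovery between MDTs already handled this request: reply with an
        # arbitrary smaller transno so the client can drop it from its replay list.
        return "already_recovered", last_committed - 1
    if req_transno > lcd_last_transno:
        # Not in the update log and newer than anything applied: replay it.
        return "replay", None
    # Not in the update log and not newer: do not replay. The reply for this
    # case is not spelled out in the description.
    return "skip", None


xids = {1001, 1002}  # hypothetical xids found in the update log
print(handle_client_replay(xids, 500, 480, req_xid=1001, req_transno=510))
print(handle_client_replay(xids, 500, 480, req_xid=2001, req_transno=510))
print(handle_client_replay(xids, 500, 480, req_xid=2002, req_transno=450))
```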

      For more details, please refer to the HLD for DNE phase II.

      Attachments

        Activity

          [LU-3540] recovery for cross-MDT operation
          di.wang Di Wang added a comment -

          patches landed to master

          di.wang Di Wang added a comment -

          Oh, this part has been landed.


          simmonsja James A Simmons added a comment -

          Is there work left?

          gerrit Gerrit Updater added a comment -

          Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11737/
          Subject: LU-3540 lod: update recovery thread
          Project: fs/lustre-release
          Branch: master
          Current Patch Set:
          Commit: 4f53536d002c13886210b672b657795baa067144

          People

            di.wang Di Wang
            di.wang Di Wang
            Votes: 0
            Watchers: 4