[LU-3540] recovery for cross-MDT operation - Whamcloud Community JIRA

Details

Type: Technical task
Resolution: Fixed
Priority: Minor
Fix Version/s: Lustre 2.8.0
Affects Version/s: None
Labels:
None

Rank (Obsolete):
8904

Description

When an MDT restarts after a crash, it processes all of the records in its local update llog. It batches the updates that share the same ur_master_index and ur_batchid, and sends each batch in an OUT_UPDATE RPC to each of the remote targets that took part in the operation. In the meantime, all other MDTs are notified; they also check their own local update logs and send all related records to the failover MDT.
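The batching step above can be sketched as follows. This is only an illustration under stated assumptions, not the Lustre implementation: `ur_master_index` and `ur_batchid` come from the description, while `UpdateRecord`, `target_index`, `payload`, `batch_updates` and the rule that updates for the local MDT are not sent over the wire are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdateRecord:
    ur_master_index: int  # index of the master MDT for the operation (from the ticket)
    ur_batchid: int       # batch id of the operation (from the ticket)
    target_index: int     # hypothetical: MDT this update applies to
    payload: str          # hypothetical: opaque update body


def batch_updates(records, local_index):
    """Group update llog records by operation, then by remote target.

    Returns {(ur_master_index, ur_batchid): {target_index: [payload, ...]}}.
    Each inner list stands for the body of one OUT_UPDATE RPC.
    """
    batches = defaultdict(lambda: defaultdict(list))
    for rec in records:
        if rec.target_index == local_index:
            continue  # assumption: updates for the local MDT need no RPC
        key = (rec.ur_master_index, rec.ur_batchid)
        batches[key][rec.target_index].append(rec.payload)
    # Convert nested defaultdicts to plain dicts for the caller.
    return {key: dict(per_target) for key, per_target in batches.items()}
```

Keying on the pair (ur_master_index, ur_batchid) keeps updates from different operations apart even when they go to the same remote target.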

An MDT that receives updates from another MDT checks whether the corresponding updates are already recorded in its local update llog.
If the update was already committed, the MDT replies with an arbitrary pb_transno < pb_last_committed.

If the updates do not exist in the update llog, the MDT compares the master transno in the update record with the transno in last_rcvd. If the transno in the update record is smaller than the one in last_rcvd, the master has already sent the update to this MDT, the update has already been executed and committed, and the update log record has been deleted, so the MDT again returns an arbitrary smaller transno as above. If the transno in the update record is larger, the MDT replays the update with a new transno.
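The receiving MDT's decision can be written down as a small function. This is a sketch under assumptions, not the actual code: `Action` and `decide_remote_update` are hypothetical names, an update found in the local update llog is treated as already committed, and the case of equal transnos (which the description does not cover) is treated as already applied.

```python
from enum import Enum


class Action(Enum):
    REPLY_COMMITTED = "reply with an arbitrary transno below pb_last_committed"
    REPLAY = "replay the update with a new transno"


def decide_remote_update(recorded_in_llog, master_transno, last_rcvd_transno):
    """Decide how a receiving MDT answers an update sent by another MDT.

    recorded_in_llog:   update already present in the local update llog
    master_transno:     master transno carried in the update record
    last_rcvd_transno:  transno stored in last_rcvd for the sender
    """
    if recorded_in_llog:
        # Assumption: a record in the local llog means the update is committed.
        return Action.REPLY_COMMITTED
    if master_transno <= last_rcvd_transno:
        # Already executed and committed; the llog record was deleted since.
        # Assumption: equality is handled like "smaller".
        return Action.REPLY_COMMITTED
    return Action.REPLAY
```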

In all cases, the MDT replies to the sender with the transno.
If the sender is the recovering MDT, which is the master for this operation, it builds the in-memory operation state to track the remote updates; once all of the remote updates have committed, it can cancel the local update record.
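The master's bookkeeping can be pictured as a set of remote targets that still have to report a commit. This is a minimal sketch: `OperationState`, its fields and `on_remote_committed` are hypothetical names, and the real in-memory state in Lustre is certainly richer than this.

```python
class OperationState:
    """Hypothetical in-memory state for one cross-MDT operation on the master."""

    def __init__(self, batchid, remote_targets):
        self.batchid = batchid
        self.pending = set(remote_targets)  # targets whose update is not yet committed
        self.local_record_cancelled = False

    def on_remote_committed(self, target_index):
        """Record that one remote target committed its part of the operation.

        Returns True once the local update record may be (or has been) cancelled.
        """
        self.pending.discard(target_index)
        if not self.pending:
            # All remote updates are committed: the local record is no longer needed.
            self.local_record_cancelled = True
        return self.local_record_cancelled
```

Calling `on_remote_committed` twice for the same target is harmless, which matters if a commit notification is delivered more than once.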
The client then sends its replay/resend requests to the failover MDT.

The master MDT checks whether the request exists in the update log, using the request xid.

If it does not exist, the MDT compares the request transno with its own transno and replays the request only if the request transno is larger than the last transno (lcd_last_transno) of this MDT.

If it does exist, the recovery between MDTs has already handled this case, so the MDT returns an arbitrary smaller transno and the client can remove the request from its replay list.
If any failure occurs during the above two steps, the lfsck daemon is triggered to fix the filesystem.
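The master's handling of a client replay request, as described above, can be sketched like this. `lcd_last_transno` is taken from the description; `ReplayDecision`, `check_client_replay` and modelling the update log as a set of xids are assumptions made for illustration only.

```python
from enum import Enum


class ReplayDecision(Enum):
    ALREADY_RECOVERED = "reply with a smaller transno; client drops the request"
    REPLAY = "replay the request"
    NOT_REPLAYED = "request transno is not above lcd_last_transno"


def check_client_replay(xid, req_transno, update_log_xids, lcd_last_transno):
    """Decide what the master MDT does with one client replay/resend request.

    update_log_xids: hypothetical set of request xids found in the update log
    """
    if xid in update_log_xids:
        # Recovery between MDTs already handled this operation.
        return ReplayDecision.ALREADY_RECOVERED
    if req_transno > lcd_last_transno:
        return ReplayDecision.REPLAY
    return ReplayDecision.NOT_REPLAYED
```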

For more details, please refer to the HLD for DNE phase II.

Attachments

Activity

Di Wang added a comment - 01/Jul/15 4:42 PM

patches landed to master

Di Wang added a comment - 01/Jul/15 4:42 PM

Oh, this part has been landed.

James A Simmons added a comment - 01/Jul/15 4:23 PM

Is there work left?

Gerrit Updater added a comment - 05/Jun/15 8:13 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11737/
Subject: LU-3540 lod: update recovery thread
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 4f53536d002c13886210b672b657795baa067144

People

Assignee: Di Wang

Reporter: Di Wang

Votes: 0

Watchers: 4

Dates

Created: 29/Jun/13 6:12 PM

Updated: 22/Dec/15 3:30 AM

Resolved: 01/Jul/15 4:42 PM