async update cross-MDTs
(LU-3534)
|
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.8.0 |
| Type: | Technical task | Priority: | Minor |
| Reporter: | Di Wang | Assignee: | Di Wang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Rank (Obsolete): | 8904 |
| Description |
|
When an MDT restarts after a crash, it processes all of the records in its local update llog. It batches the updates that share the same ur_master_index and ur_batchid and sends each batch in an OUT_UPDATE RPC to every remote target that took part in the operation. In the meantime, all other MDTs are notified; they check their own local update llogs and send all of the related records to the failover MDT.

An MDT that receives updates from another MDT checks whether the corresponding updates are already recorded in its local update llog. If they are not, it compares the master transno in the update record with the transno in last_rcvd. If the transno in the update record is smaller than the one in last_rcvd, the master already sent the update to this MDT, the update has already been executed and committed, and its update log record has been deleted, so the MDT also returns an arbitrary smaller transno as above. If the transno in the update record is larger, the MDT replays the update with a new transno. In all cases, the MDT replies to the sender with the transno.

For a request replayed by a client, the master MDT checks by request xid whether the request exists in the update log. If it does not exist, the master compares the request transno with its own transno and replays the request only if the request transno is bigger than the last transno (lcd_last_transno) of this MDT. If it does exist, recovery between the MDTs has already handled this case, so the master returns an arbitrary smaller transno and the client can then remove the request from its replay list. For more details, please refer to the HLD for DNE phase II. |
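The two decisions described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not Lustre code: the class and method names (`Mdt`, `UpdateRecord`, `handle_peer_update`, `should_replay_client_request`), the `ALREADY_HANDLED` sentinel standing in for the "arbitrary smaller transno", the choice of "last transno plus one" as the new transno, and the treatment of an update that is already in the local llog or has an equal transno are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Hypothetical sentinel for the "arbitrary smaller transno";
# this model assumes real transnos are always positive.
ALREADY_HANDLED = 0


@dataclass(frozen=True)
class UpdateRecord:
    master_index: int    # models ur_master_index
    batchid: int         # models ur_batchid
    master_transno: int  # transno assigned by the master MDT


@dataclass
class Mdt:
    # Models the transno kept in last_rcvd (lcd_last_transno).
    last_rcvd_transno: int = 0
    # Batches present in the local update llog, keyed by
    # (ur_master_index, ur_batchid).
    logged_batches: Set[Tuple[int, int]] = field(default_factory=set)
    # Request xids present in the update log (master side).
    logged_xids: Set[int] = field(default_factory=set)

    def handle_peer_update(self, rec: UpdateRecord) -> int:
        """Return the transno sent back to the MDT that forwarded rec."""
        key = (rec.master_index, rec.batchid)
        if key in self.logged_batches:
            # Assumption: an update already in the local llog needs no replay.
            return ALREADY_HANDLED
        if rec.master_transno > self.last_rcvd_transno:
            # Not yet applied here: replay it with a new transno.
            # Assumption: the new transno is simply the next local one.
            self.last_rcvd_transno += 1
            self.logged_batches.add(key)
            return self.last_rcvd_transno
        # Smaller transno (equal is treated the same, as an assumption):
        # already executed and committed, and its log record was deleted.
        return ALREADY_HANDLED

    def should_replay_client_request(self, xid: int, req_transno: int) -> bool:
        """Decide whether the master MDT replays a client request."""
        if xid in self.logged_xids:
            # Recovery between MDTs already handled it; the client
            # can drop the request from its replay list.
            return False
        return req_transno > self.last_rcvd_transno
```

In this model, a record with master transno 10 arriving at an MDT whose last_rcvd transno is 5 is replayed once; a second delivery of the same batch returns the sentinel instead of executing the update again.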
| Comments |
| Comment by Gerrit Updater [ 05/Jun/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/11737/ |
| Comment by James A Simmons [ 01/Jul/15 ] |
|
Is there work left? |
| Comment by Di Wang [ 01/Jul/15 ] |
|
Oh, this part has been landed. |
| Comment by Di Wang [ 01/Jul/15 ] |
|
patches landed to master |