-
Task
-
Resolution: Unresolved
-
Medium
The RAFT log may grow large, possibly exceeding the maximum size of an llog catalog, and a large RAFT log takes a long time to load and replicate. The RAFT protocol uses a periodic snapshot mechanism to bound the log size. Since Lustre filesystem snapshots are not always enabled, RAFT snapshots can instead be supported by capturing the complete state to disk:
- add a function to save the state to disk; this includes MGS changes, FLDB and quota
- add a function to load a snapshot from disk
- add functions to send and handle the snapshot RPC; the state may be large, so it is split across multiple RPCs
The snapshot is to be captured periodically; as a first step, it can be captured only at RAFT node startup.
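
The multi-RPC transfer could be sketched as follows. This is only an illustration, loosely modeled on the offset/done fields of RAFT's InstallSnapshot RPC; the names `snap_chunk`, `snap_recv`, `snap_chunk_next` and `snap_recv_chunk` are hypothetical and are not existing Lustre interfaces. It assumes the serialized state is already in a memory buffer and that chunks arrive in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-RPC chunk descriptor (not a Lustre wire format). */
struct snap_chunk {
	uint64_t	 sc_last_index;	/* last log index covered by the snapshot */
	uint64_t	 sc_offset;	/* byte offset of this chunk in the snapshot */
	uint32_t	 sc_len;	/* payload bytes carried by this chunk */
	uint32_t	 sc_done;	/* nonzero on the final chunk */
	const void	*sc_data;	/* payload, points into the sender's buffer */
};

/* Hypothetical receiver-side reassembly state. */
struct snap_recv {
	uint8_t		*sr_buf;	/* destination buffer */
	uint64_t	 sr_cap;	/* capacity of sr_buf in bytes */
	uint64_t	 sr_received;	/* bytes accepted so far */
	uint64_t	 sr_last_index;	/* snapshot being received */
	int		 sr_complete;	/* set once the final chunk is accepted */
};

/*
 * Sender: describe the chunk that starts at @offset, carrying at most
 * @max_len bytes. Returns the offset to use for the next chunk.
 */
static uint64_t snap_chunk_next(const void *snap, uint64_t size,
				uint64_t last_index, uint64_t offset,
				uint32_t max_len, struct snap_chunk *out)
{
	uint64_t left = offset < size ? size - offset : 0;
	uint32_t len = left < max_len ? (uint32_t)left : max_len;

	out->sc_last_index = last_index;
	out->sc_offset = offset;
	out->sc_len = len;
	out->sc_done = (offset + len >= size);
	out->sc_data = (const uint8_t *)snap + offset;
	return offset + len;
}

/*
 * Receiver: append one chunk. A chunk at offset 0 restarts the transfer.
 * Returns 0 on success, -1 if the chunk is out of order, belongs to a
 * different snapshot, or would overflow the destination buffer.
 */
static int snap_recv_chunk(struct snap_recv *r, const struct snap_chunk *c)
{
	if (c->sc_offset == 0) {
		r->sr_received = 0;
		r->sr_complete = 0;
		r->sr_last_index = c->sc_last_index;
	}
	if (c->sc_last_index != r->sr_last_index ||
	    c->sc_offset != r->sr_received)
		return -1;
	if (c->sc_len > r->sr_cap - r->sr_received)
		return -1;

	memcpy(r->sr_buf + r->sr_received, c->sc_data, c->sc_len);
	r->sr_received += c->sc_len;
	if (c->sc_done)
		r->sr_complete = 1;
	return 0;
}
```

In this sketch the sender calls `snap_chunk_next()` in a loop until `sc_done` is set, issuing one RPC per chunk, and the receiver loads the snapshot only after `sr_complete` is set. Rejecting out-of-order chunks keeps the receiver simple at the cost of restarting the transfer from offset 0 after a lost RPC.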