Details
Improvement
Resolution: Fixed
Critical
Lustre 2.16.0
Description
The batched RPC protocol changes the on-disk format of the client reply data file "REPLY_DATA" used for recovery, so compatibility with the new reply data format must be handled carefully during upgrade.
The new format is introduced in https://review.whamcloud.com/#/c/46799/.
The new format is as follows (lines marked with "+" are the added fields):
struct lsd_reply_data {
	__u64 lrd_transno;	/* transaction number */
	__u64 lrd_xid;		/* transmission id */
	__u64 lrd_data;		/* per-operation data */
	__u32 lrd_result;	/* request result */
	__u32 lrd_client_gen;	/* client generation */
+	__u32 lrd_batch_idx;	/* sub request index in a batched RPC */
+	__u32 lrd_padding[7];	/* unused fields. */
};
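To make the size change concrete, here is a minimal user-space sketch of both layouts. It assumes the old record is simply the new one without the two added fields. The name `lsd_reply_data_v1`, the helper `lrd_v1_to_v2()`, and the choice of a zero batch index for converted records are illustrative assumptions, not taken from the patch. Byte-order handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed old-format record: the new struct minus the added fields. */
struct lsd_reply_data_v1 {
	uint64_t lrd_transno;    /* transaction number */
	uint64_t lrd_xid;        /* transmission id */
	uint64_t lrd_data;       /* per-operation data */
	uint32_t lrd_result;     /* request result */
	uint32_t lrd_client_gen; /* client generation */
};

/* New-format record as given in the description. */
struct lsd_reply_data {
	uint64_t lrd_transno;
	uint64_t lrd_xid;
	uint64_t lrd_data;
	uint32_t lrd_result;
	uint32_t lrd_client_gen;
	uint32_t lrd_batch_idx;  /* sub request index in a batched RPC */
	uint32_t lrd_padding[7]; /* unused fields */
};

/* 3 * 8 + 2 * 4 = 32 bytes; the new fields add 4 + 7 * 4 = 32 more. */
static_assert(sizeof(struct lsd_reply_data_v1) == 32, "old record size");
static_assert(sizeof(struct lsd_reply_data) == 64, "new record size");

/*
 * Hypothetical helper: widen one old record into the new layout.
 * The batch index and padding are zeroed (an assumption of this sketch).
 */
static void lrd_v1_to_v2(const struct lsd_reply_data_v1 *old_rec,
			 struct lsd_reply_data *new_rec)
{
	assert(old_rec != NULL && new_rec != NULL);

	memset(new_rec, 0, sizeof(*new_rec));
	new_rec->lrd_transno = old_rec->lrd_transno;
	new_rec->lrd_xid = old_rec->lrd_xid;
	new_rec->lrd_data = old_rec->lrd_data;
	new_rec->lrd_result = old_rec->lrd_result;
	new_rec->lrd_client_gen = old_rec->lrd_client_gen;
}
```

Because the record size doubles under these assumptions, old records cannot be rewritten in place slot by slot, which is consistent with converting from a separate backup copy.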
The proposed solution is as follows:
Add several flags in the magic number field of the reply data header:
LRH_MAGIC_V1: 0xbdabda01 - the magic number of the old format for client reply data.
LRH_MAGIC: 0xbdabda02 - the magic number of the new format for the client reply data.
LRH_FLAG_BACKUP_DONE: 0x00000004 - indicates that the target has finished backing up the old-format "REPLY_DATA" file.
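The three values above can share one 32-bit field because the flag bit does not overlap the low bits that distinguish the two magic numbers. A small sketch of how the field might be decoded follows. The constants are from the description; the helpers `lrh_base_magic()` and `lrh_backup_done()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LRH_MAGIC_V1         0xbdabda01U /* old reply data format */
#define LRH_MAGIC            0xbdabda02U /* new reply data format */
#define LRH_FLAG_BACKUP_DONE 0x00000004U /* old-format file backed up */

/* The flag bit must be clear in both magic numbers for this to work. */
static_assert((LRH_MAGIC_V1 & LRH_FLAG_BACKUP_DONE) == 0, "flag overlaps V1");
static_assert((LRH_MAGIC & LRH_FLAG_BACKUP_DONE) == 0, "flag overlaps V2");

/* Hypothetical helper: magic number with the backup flag masked out. */
static inline uint32_t lrh_base_magic(uint32_t magic)
{
	return magic & ~LRH_FLAG_BACKUP_DONE;
}

/* Hypothetical helper: true once the backup step has been recorded. */
static inline bool lrh_backup_done(uint32_t magic)
{
	return (magic & LRH_FLAG_BACKUP_DONE) != 0;
}
```

For example, `LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE` evaluates to `0xbdabda05`, which still decodes to the old format with the backup flag set.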
During target setup, the reply data is initialized in @tgt_init()->tgt_reply_data_init().
- If the "REPLY_DATA" file is found to be in the old format (that is, the magic number in the reply data header is LRH_MAGIC_V1), the target backs up the "REPLY_DATA" file into the file "REPLY_DATA_BAK".
- After the backup finishes, the target sets the magic number field of the reply data header to LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE and syncs this change to persistent storage.
- The target then converts the old-format reply data from the backup file "REPLY_DATA_BAK" into the original reply data file "REPLY_DATA".
- After the conversion finishes, the target sets @lrh_magic in the reply data header to LRH_MAGIC and @lrh_reply_size to the record size of the new format, and syncs the change to disk. It then deletes the backup file "REPLY_DATA_BAK".
- Finally, the target starts recovery processing as normal with the new-format reply data.
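The steps above amount to a small state machine keyed on the header magic. The sketch below models only that decision logic on an in-memory magic value, with the file operations reduced to comments. The names `rd_next_step()`, `rd_upgrade()`, and the `RD_STEP_*` values are hypothetical, and the real tgt_reply_data_init() may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LRH_MAGIC_V1         0xbdabda01U
#define LRH_MAGIC            0xbdabda02U
#define LRH_FLAG_BACKUP_DONE 0x00000004U

enum rd_upgrade_step {
	RD_STEP_BACKUP,  /* old format, no backup recorded yet */
	RD_STEP_CONVERT, /* backup recorded, conversion pending */
	RD_STEP_RECOVER, /* new format, proceed with recovery */
	RD_STEP_INVALID, /* unrecognized magic */
};

/* Decide what to do next from the persisted magic value alone. */
static enum rd_upgrade_step rd_next_step(uint32_t magic)
{
	switch (magic) {
	case LRH_MAGIC_V1:
		return RD_STEP_BACKUP;
	case LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE:
		return RD_STEP_CONVERT;
	case LRH_MAGIC:
		return RD_STEP_RECOVER;
	default:
		return RD_STEP_INVALID;
	}
}

/*
 * Drive the upgrade to completion. Returns the number of upgrade steps
 * performed, or -1 if the magic is not recognized.
 */
static int rd_upgrade(uint32_t *magic)
{
	int steps = 0;

	assert(magic != NULL);
	for (;;) {
		switch (rd_next_step(*magic)) {
		case RD_STEP_BACKUP:
			/* copy REPLY_DATA to REPLY_DATA_BAK, then sync the flag */
			*magic = LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE;
			break;
		case RD_STEP_CONVERT:
			/* rewrite REPLY_DATA from REPLY_DATA_BAK in the new
			 * format, sync the header, delete REPLY_DATA_BAK */
			*magic = LRH_MAGIC;
			break;
		case RD_STEP_RECOVER:
			return steps;
		default:
			return -1;
		}
		steps++;
	}
}
```

Read this way, the persisted magic tells a restarted target where to resume: a header still at LRH_MAGIC_V1 means the backup has to be redone, while LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE means the backup is complete and the conversion can be rerun from it.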