  1. Lustre
  2. LU-16096

recovery: handle compatibility during upgrade for new replay data format

Details

    • Improvement
    • Resolution: Fixed
    • Critical
    • Lustre 2.16.0
    • Lustre 2.16.0

    Description

      The batched RPC protocol changes the on-disk format of the client reply data file "REPLY_DATA" used for recovery, so compatibility with the old format must be handled carefully during upgrade.

      The new format is introduced in https://review.whamcloud.com/#/c/46799/.

      The new format is as follows (lines marked with + are the added fields):

      struct lsd_reply_data {
              __u64 lrd_transno;     /* transaction number */
              __u64 lrd_xid;         /* transmission id */
              __u64 lrd_data;        /* per-operation data */
              __u32 lrd_result;      /* request result */
              __u32 lrd_client_gen;  /* client generation */
      +       __u32 lrd_batch_idx;   /* sub request index in a batched RPC */
      +       __u32 lrd_padding[7];  /* unused fields */
      };
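
      To make the size change concrete, here is a minimal user-space sketch of the two record layouts and a per-record conversion. It assumes the old record is exactly the new struct without the two + fields, uses <stdint.h> types in place of the kernel __u64/__u32, and ignores on-disk byte order. The names lsd_reply_data_v1 and lrd_v1_to_new() are hypothetical, and setting lrd_batch_idx to 0 for converted records is an assumption, not something stated in this ticket.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed old on-disk record: the new struct minus the "+" fields. */
struct lsd_reply_data_v1 {
	uint64_t lrd_transno;     /* transaction number */
	uint64_t lrd_xid;         /* transmission id */
	uint64_t lrd_data;        /* per-operation data */
	uint32_t lrd_result;      /* request result */
	uint32_t lrd_client_gen;  /* client generation */
};

/* New record as given in the description. */
struct lsd_reply_data {
	uint64_t lrd_transno;
	uint64_t lrd_xid;
	uint64_t lrd_data;
	uint32_t lrd_result;
	uint32_t lrd_client_gen;
	uint32_t lrd_batch_idx;   /* sub request index in a batched RPC */
	uint32_t lrd_padding[7];  /* unused fields */
};

/* The record grows from 32 to 64 bytes, so every slot offset changes. */
static_assert(sizeof(struct lsd_reply_data_v1) == 32, "old record size");
static_assert(sizeof(struct lsd_reply_data) == 64, "new record size");

/* Hypothetical helper: widen one old record into the new layout. */
static void lrd_v1_to_new(const struct lsd_reply_data_v1 *in,
			  struct lsd_reply_data *out)
{
	memset(out, 0, sizeof(*out));  /* zeroes lrd_batch_idx and padding */
	out->lrd_transno = in->lrd_transno;
	out->lrd_xid = in->lrd_xid;
	out->lrd_data = in->lrd_data;
	out->lrd_result = in->lrd_result;
	out->lrd_client_gen = in->lrd_client_gen;
}
```

      Because the record size doubles, the file cannot simply be reinterpreted in place; each record has to be rewritten at a new offset, which is presumably why the proposal below works from a backup copy.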
      

      The proposed solution is as follows:

      Add several flags in the magic number field of the reply data header:

      LRH_MAGIC_V1: 0xbdabda01 - the magic number of the old format for client reply data.

      LRH_MAGIC: 0xbdabda02 - the magic number of the new format for the client reply data.

      LRH_FLAG_BACKUP_DONE: 0x00000004 - indicates that the target has finished backing up the old-format "REPLY_DATA" file.
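
      As an illustration of how these three values combine, the sketch below classifies a header magic into the states the upgrade can observe. The constants are the ones listed above; the enum lrh_state and the function lrh_classify() are hypothetical names introduced only for this example. Note that LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE evaluates to 0xbdabda05, which differs from both plain magic values.

```c
#include <assert.h>
#include <stdint.h>

#define LRH_MAGIC_V1         0xbdabda01U  /* old reply data format */
#define LRH_MAGIC            0xbdabda02U  /* new reply data format */
#define LRH_FLAG_BACKUP_DONE 0x00000004U  /* old REPLY_DATA backed up */

/* The flag bit is clear in both plain magic values, so OR-ing it in
 * yields a third, distinguishable value. */
static_assert((LRH_MAGIC_V1 & LRH_FLAG_BACKUP_DONE) == 0, "flag overlaps V1");
static_assert((LRH_MAGIC & LRH_FLAG_BACKUP_DONE) == 0, "flag overlaps new");

/* Hypothetical classification of the header magic. */
enum lrh_state {
	LRH_STATE_OLD,          /* old format, no backup recorded yet */
	LRH_STATE_BACKUP_DONE,  /* old format, backup completed */
	LRH_STATE_NEW,          /* new format */
	LRH_STATE_INVALID       /* unrecognized magic */
};

static enum lrh_state lrh_classify(uint32_t magic)
{
	switch (magic) {
	case LRH_MAGIC_V1:
		return LRH_STATE_OLD;
	case LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE:
		return LRH_STATE_BACKUP_DONE;
	case LRH_MAGIC:
		return LRH_STATE_NEW;
	default:
		return LRH_STATE_INVALID;
	}
}
```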

       

      During target setup, the target initializes the reply data in @tgt_init()->tgt_reply_data_init():

      1. If the target finds that "REPLY_DATA" is in the old format (the magic number @lrh_magic in the reply data header is LRH_MAGIC_V1), it backs up the "REPLY_DATA" file into the file "REPLY_DATA_BAK".
      2. After the backup finishes, the target sets the magic number field of the reply data header to LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE and syncs this change to persistent storage.
      3. The target then converts the old-format reply data from the backup file "REPLY_DATA_BAK" into the original reply data file "REPLY_DATA".
      4. After the conversion finishes, the target sets @lrh_magic in the reply data header to LRH_MAGIC and @lrh_reply_size to the new record size, and syncs the change to disk. It then deletes the backup file "REPLY_DATA_BAK".
      5. Finally, the target starts recovery processing as normal with the new-format reply data.
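
      The steps above can be sketched as a small state machine driven by the header magic. This is an in-memory model only, not the actual tgt_reply_data_init() code: struct reply_files and reply_data_upgrade() are hypothetical, file I/O and sync calls are reduced to comments, and the behaviour after an interrupted upgrade (redoing the backup, removing a stale backup) is an assumption inferred from the purpose of LRH_FLAG_BACKUP_DONE rather than something this ticket spells out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LRH_MAGIC_V1         0xbdabda01U
#define LRH_MAGIC            0xbdabda02U
#define LRH_FLAG_BACKUP_DONE 0x00000004U

/* The intermediate state must be distinguishable from the final one. */
static_assert((LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE) != LRH_MAGIC,
	      "backup-done state collides with new magic");

/* Hypothetical stand-in for the target's persistent state. */
struct reply_files {
	uint32_t lrh_magic;   /* magic stored in the REPLY_DATA header */
	bool     bak_exists;  /* whether REPLY_DATA_BAK is present */
};

/* Drive the upgrade to completion from whatever state is found.
 * Returns 0 once REPLY_DATA is in the new format, -1 on a bad magic. */
static int reply_data_upgrade(struct reply_files *f)
{
	for (;;) {
		switch (f->lrh_magic) {
		case LRH_MAGIC_V1:
			/* Steps 1-2: copy REPLY_DATA to REPLY_DATA_BAK
			 * (assumed: a partial backup is simply redone),
			 * then record and sync the flag. */
			f->bak_exists = true;
			f->lrh_magic = LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE;
			break;
		case LRH_MAGIC_V1 | LRH_FLAG_BACKUP_DONE:
			/* Steps 3-4: rewrite REPLY_DATA from the backup
			 * in the new format, then switch and sync the
			 * magic. The backup is the source, so a restart
			 * here can repeat the conversion safely. */
			f->lrh_magic = LRH_MAGIC;
			break;
		case LRH_MAGIC:
			/* End of step 4: delete the backup if present
			 * (assumed cleanup for an interrupted delete). */
			f->bak_exists = false;
			return 0;  /* step 5: normal recovery follows */
		default:
			return -1;
		}
	}
}
```

      In this model every persistent state maps to exactly one next action, so setup can resume from any of the three magic values without consulting anything other than the header.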

       

            People

              qian_wc Qian Yingjin