  1. Lustre
  2. LU-13643

FLR3: Immediate file write mirroring



    • Type: New Feature
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None


      FLR currently implements delayed file write mirroring: the initial write goes to a single mirror, and an external tool later synchronizes the data to the other mirror(s) of the file. If the file is ever modified, the mirror component(s) other than the one modified are marked stale and need to be resynchronized by the external tool. This mechanism saves client bandwidth and still provides data availability, unless the file is lost immediately after it is written.

      However, if the file is modified afterward, resyncing the mirrors of the modified component may cause a large amount of write amplification, and if the file is being modified continuously the stale mirrors may never be resync'd at all. It would be preferable to implement immediate file write mirroring, so that the client can submit the same page in multiple RPCs to different OST objects and keep all of the mirrors updated concurrently.
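
      To make the write amplification concern concrete, here is a rough back-of-the-envelope sketch in C. It assumes (this is not stated above) that a delayed resync rewrites the whole modified component once per stale mirror, while an immediate write sends only the modified bytes once per extra mirror. All function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes copied by a delayed resync, ASSUMING the whole modified
 * component is rewritten to every stale mirror.
 */
static uint64_t sk_delayed_resync_bytes(uint64_t component_size,
                                        unsigned int stale_mirrors)
{
    return component_size * stale_mirrors;
}

/*
 * Extra bytes sent by the client with immediate mirroring: the
 * modified bytes are sent once more for every additional mirror.
 */
static uint64_t sk_immediate_extra_bytes(uint64_t bytes_modified,
                                         unsigned int extra_mirrors)
{
    return bytes_modified * extra_mirrors;
}

/* Ratio of bytes copied to bytes the application actually modified. */
static uint64_t sk_amplification(uint64_t bytes_copied,
                                 uint64_t bytes_modified)
{
    assert(bytes_modified > 0);
    return bytes_copied / bytes_modified;
}
```

      Under these assumptions, a 4 KiB overwrite inside a 1 GiB component with one stale mirror resyncs 1 GiB, an amplification of 262144x, whereas immediate mirroring would send only one extra 4 KiB copy.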

      Immediate file write (IFW) mirroring may not be desired for all applications, so it should have an LCME_FL_IMMEDIATE flag stored in the component(s), indicating that clients should keep those copies up to date. Having per-component flags will allow configurations where, e.g., two flash mirrors are written immediately, but a third/fourth disk mirror uses delayed resync for emergency recovery and/or cold storage (e.g. if the flash mirrors are only short-term hot copies of the file).
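
      The per-component behavior described above could be sketched roughly as follows. This is an illustration only: the SK_* bit values are placeholders chosen for the sketch and are not taken from the Lustre headers, and the struct and function names are hypothetical. It also assumes that a mirror which is already stale cannot be brought back in sync by immediate writes alone, so it stays stale until resynced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values for this sketch only. */
#define SK_FL_STALE     0x1u /* stands in for LCME_FL_STALE */
#define SK_FL_IMMEDIATE 0x2u /* stands in for the proposed LCME_FL_IMMEDIATE */

/* Hypothetical, simplified view of one mirror of a file. */
struct sk_mirror {
    uint32_t flags;
};

/*
 * Apply the effect of one client write to a set of mirrors.
 *
 * The primary mirror is always written. Any other mirror that is
 * flagged immediate and not already stale is written too, provided
 * the client is capable of immediate mirroring. Every remaining
 * mirror is marked stale and left for the external resync tool.
 *
 * Returns the number of mirrors the client writes to.
 */
static size_t sk_apply_write(struct sk_mirror *mirrors, size_t count,
                             size_t primary, int ifw_capable)
{
    size_t written = 0;

    assert(primary < count);
    assert(!(mirrors[primary].flags & SK_FL_STALE));

    for (size_t i = 0; i < count; i++) {
        int immediate = (mirrors[i].flags & SK_FL_IMMEDIATE) &&
                        !(mirrors[i].flags & SK_FL_STALE);

        if (i == primary || (ifw_capable && immediate))
            written++;
        else
            mirrors[i].flags |= SK_FL_STALE;
    }
    return written;
}
```

      With two mirrors flagged immediate and two unflagged, an IFW-capable client would write both flagged mirrors and leave the other two stale. A client without IFW support falls back to today's behavior and writes only the primary.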

      The IFW client will need to notify the MDS whether it can keep the mirrors in sync; otherwise, the current behavior of marking all but one mirror LCME_FL_STALE when the client writes to the file must be maintained. At the coarsest granularity, this means the client needs an OBD_CONNECT2_IMMEDIATE connection-time flag, but the client may also be able to indicate its intent at a per-file level (e.g. with the initial write intent) so that it can decide on a case-by-case basis (e.g. remote clients should probably not do double writes).
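
      One possible shape for the two-level decision (connection-time capability plus per-file intent) is sketched below. The flag value, the enum, and the function names are hypothetical. The sketch assumes that the negotiated connect flags are the intersection of what the client and server each advertise, and that a client opts in per file only when the connection supports IFW and the file has at least one immediate component.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit for the proposed OBD_CONNECT2_IMMEDIATE flag. */
#define SK_CONNECT2_IMMEDIATE 0x1ULL

enum sk_write_mode {
    SK_WRITE_DELAYED,   /* write one mirror, mark the others stale */
    SK_WRITE_IMMEDIATE  /* write all immediate mirrors concurrently */
};

/* Negotiated flags: only features both sides advertise survive. */
static uint64_t sk_negotiate(uint64_t client_flags, uint64_t server_flags)
{
    return client_flags & server_flags;
}

/*
 * Choose the write mode for one file.
 *
 * negotiated      - connect flags agreed at connection time
 * has_immediate   - nonzero if the file has an immediate component
 * client_opts_in  - nonzero if the client wants double writes for
 *                   this file (e.g. zero for a remote client)
 */
static enum sk_write_mode sk_choose_mode(uint64_t negotiated,
                                         int has_immediate,
                                         int client_opts_in)
{
    assert((negotiated & ~SK_CONNECT2_IMMEDIATE) == 0);

    if ((negotiated & SK_CONNECT2_IMMEDIATE) && has_immediate &&
        client_opts_in)
        return SK_WRITE_IMMEDIATE;

    return SK_WRITE_DELAYED;
}
```

      In this sketch any missing piece (an old server, a file without immediate components, or a client that declines for this file) falls back to delayed mirroring, which matches the requirement that the existing stale-marking behavior remain the default.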





              wc-triage WC Triage
              adilger Andreas Dilger