
Combining RO-PCC/RW-PCC/HSM with FLR HLD

Details

    • Type: New Feature
    • Resolution: Unresolved
    • Priority: Minor

    Description

      This ticket tracks the high-level design (HLD) for combining RO-PCC, RW-PCC, and HSM with FLR.

          Activity

            [LU-13637] Combining RO-PCC/RW-PCC/HSM with FLR HLD

            I don't have perms for the wiki page, so I'll comment here.
            1. Could/should the hsm_states flags and the lov_comp_md_entry_flags be normalized?

            • HS_DIRTY = LCME_FL_STALE
            • HS_EXISTS = LCME_FL_INIT
            • HS_NOARCHIVE = LCME_FL_NOSYNC
            • HS_LOST = LCME_FL_OFFLINE
            • Remaining HSM flags could just be added to the lcme_flags.

            Why: reduce redundancy; simplify status gathering for a replica; provide common state for any replica type; HS_DIRTY and LCME_FL_STALE need to remain in sync for any HSM replicas.
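The mapping proposed in point 1 can be sketched as a small translation table. This is only an illustration: the bit values below are placeholders generated by `auto()`, not Lustre's real flag values, and `Lcme.ARCHIVED` is a hypothetical example of a "remaining" HSM flag carried over into lcme_flags unchanged.

```python
from enum import Flag, auto


class Hsm(Flag):
    # Placeholder bit values, NOT the real hsm_states values.
    EXISTS = auto()
    DIRTY = auto()
    NOARCHIVE = auto()
    LOST = auto()
    ARCHIVED = auto()  # example of a flag with no existing LCME equivalent


class Lcme(Flag):
    # Placeholder bit values, NOT the real lov_comp_md_entry_flags values.
    INIT = auto()
    STALE = auto()
    NOSYNC = auto()
    OFFLINE = auto()
    ARCHIVED = auto()  # hypothetical: HSM-only flag added to lcme_flags


# The four equivalences from the comment, plus one carried-over flag.
HSM_TO_LCME = {
    Hsm.EXISTS: Lcme.INIT,
    Hsm.DIRTY: Lcme.STALE,
    Hsm.NOARCHIVE: Lcme.NOSYNC,
    Hsm.LOST: Lcme.OFFLINE,
    Hsm.ARCHIVED: Lcme.ARCHIVED,
}


def normalize(hsm_flags: Hsm) -> Lcme:
    """Translate a set of HSM state flags into the unified per-replica flags."""
    out = Lcme(0)
    for hsm_flag, lcme_flag in HSM_TO_LCME.items():
        if hsm_flag in hsm_flags:
            out |= lcme_flag
    return out
```

With a single flag word per replica, HS_DIRTY and LCME_FL_STALE cannot drift apart, because only one of them is stored.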

            2. HS_RELEASED on an HSM replica seems like it needs to imply LCME_FL_STALE on all non-HSM replicas. In fact, the whole concept of "released" is fuzzy; it's a collective state of non-HSM replicas, and maybe is not a very useful concept anymore. It seems that per-replica flags should just indicate if an individual replica exists (so we can free up its space) and is/isn't stale (so we know to re-sync). An "HSM release" command should not exist - it should instead be "lfs mirror delete" for whichever replica you want to remove. "Released" is just the state of not having any OST-based replicas.
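A minimal sketch of point 2, treating "released" as a property derived from per-replica state rather than a stored flag. The `Replica` class, its field names, and the `"ost"`/`"hsm"`/`"pcc"` labels are assumptions made for illustration; `mirror_delete` only models the effect of removing one replica and is not the `lfs mirror delete` implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    kind: str     # hypothetical label: "ost", "hsm", or "pcc"
    exists: bool  # replica currently holds data (space can be freed)
    stale: bool   # replica needs a resync before it can be used


def is_released(replicas: List[Replica]) -> bool:
    """'Released' is just the state of not having any OST-based replica."""
    return not any(r.kind == "ost" and r.exists for r in replicas)


def mirror_delete(replicas: List[Replica], index: int) -> List[Replica]:
    """Return the layout with one replica removed (model only)."""
    return [r for i, r in enumerate(replicas) if i != index]
```

For example, a file with one OST replica and one HSM replica is not released; after removing the OST replica, `is_released` returns true without any separate "HSM release" operation.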

            3. For an HS_PCC_RW file, LCME_FL_STALE need only be set if the data is changed (version is updated); otherwise the implication is that all data has to be copied back. I suppose you could check at Restore time whether the version has changed and, if not, skip the copyback and clear the stale flag (we don't want to increase the layout generation in this case).
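The Restore-time check in point 3 might look like the sketch below. All names (`OstReplica`, `restore`, `copy_back`) are hypothetical, and the assumption that a real copyback bumps the layout generation is ours, added only to contrast with the unchanged-version case.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OstReplica:
    stale: bool
    layout_gen: int


def restore(ost: OstReplica, version_at_attach: int, version_now: int,
            copy_back: Callable[[], None]) -> OstReplica:
    """Bring the OST replica back in sync with the PCC-RW copy."""
    if version_now != version_at_attach:
        # Data changed while cached: copy it back.
        copy_back()
        ost.layout_gen += 1  # assumption: a real resync bumps the generation
    # Unchanged data: skip the copyback and leave the generation alone.
    ost.stale = False
    return ost
```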

            4. "After taking the lease, an empty mirror file is created on the local PCC-RW cache. The file data is then copied from the OST into PCC-RW and the client closes the lease with a layout write with PCC-RW attach intent. ..." This idea seems like the old "lfs migrate" path: create a new file, copy data to it, then attach/extend the new file to the old layout, checking layout generations, etc. I think it should actually go the other way: extend the layout by adding a new HSM replica first, and mark it stale. The copytool on the client then just needs to sync that stale HSM replica, as a normal HSM op. It will start out HS_PCC_RO, or really just HS_ARCHIVED and not HS_DIRTY. Maybe we don't even need the HS_PCC_RO state. Now say we want to write to the PCC file: we call an IOCTL setting the PCC replica as LCME_FL_PREF_RW, and set LCME_FL_STALE on the other replicas (we don't need HS_PCC_RW either). A Lustre read from another client would follow the Restore path (lfs mirror resync strategy), after which the OST replica is marked preferred.
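The layout-first sequence suggested in point 4 can be walked through with a toy model. The `Replica` class and the four helper functions are hypothetical names for illustration; they model only the flag transitions described above, not any MDS or client code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str              # hypothetical label, e.g. "ost" or "pcc"
    stale: bool = False    # models LCME_FL_STALE
    pref_rw: bool = False  # models LCME_FL_PREF_RW


def attach(layout: List[Replica], name: str) -> None:
    """Step 1: extend the layout with a new replica, marked stale."""
    layout.append(Replica(name, stale=True))


def sync(layout: List[Replica], name: str) -> None:
    """Step 2: the copytool syncs the stale replica, as a normal HSM op."""
    for r in layout:
        if r.name == name:
            r.stale = False


def prefer_for_write(layout: List[Replica], name: str) -> None:
    """Step 3: make one replica preferred for RW and mark the others stale."""
    for r in layout:
        r.pref_rw = (r.name == name)
        r.stale = (r.name != name)


def restore(layout: List[Replica], name: str) -> None:
    """Step 4: resync the named replica, then mark it preferred."""
    sync(layout, name)
    for r in layout:
        r.pref_rw = (r.name == name)
```

Calling `attach`, `sync`, `prefer_for_write`, and `restore` in that order on a layout that starts with a single `"ost"` replica reproduces the flow in the comment without separate HS_PCC_RO or HS_PCC_RW states.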

            I think the changes I have suggested above start to treat "external" files as first-class Lustre objects and put them on an even footing (as far as the MDS is concerned) with files stored on OSTs. A layout on OSTs, on HSMs, or on PCC may be authoritative/primary, and we use copytools as needed to do the resync in the required direction.

            nrutman Nathan Rutman added the comment above.
            qian_wc Qian Yingjin added a comment -

            Hi Andreas,

            I have created a wiki page for review: https://wiki.whamcloud.com/pages/viewpage.action?pageId=140382838

            Thanks,

            Qian


            Yingjin, I updated the doc with comments. It might make more sense to use a wiki page for this, since it is easier to edit and comment on a shared document.

            adilger Andreas Dilger added the comment above.

            People

              Assignee: qian_wc Qian Yingjin
              Reporter: qian_wc Qian Yingjin