Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      Most of the time (unless the filesystem is full), RESTORE and REMOVE requests should be processed first, as they have the highest priority from a user's point of view; ARCHIVE requests should have a lower priority.
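
The ordering requested here can be sketched with a small priority queue. This is only an illustration of the idea, not the coordinator's actual code: the `HsmQueue` class, the `PRIORITY` table, and the file identifiers are all hypothetical, and the sketch ignores the "filesystem is full" case in which ARCHIVE might need to come first.

```python
import heapq
import itertools

# Hypothetical priority table: a lower value is served first.
PRIORITY = {"RESTORE": 0, "REMOVE": 0, "ARCHIVE": 1}


class HsmQueue:
    """Toy request queue: priority order first, arrival order within a priority."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that requests with the
        # same priority keep their FIFO order.
        self._seq = itertools.count()

    def submit(self, action, fid):
        heapq.heappush(self._heap, (PRIORITY[action], next(self._seq), action, fid))

    def next_request(self):
        """Return the next (action, fid) pair, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, action, fid = heapq.heappop(self._heap)
        return action, fid


q = HsmQueue()
for action, fid in [("ARCHIVE", "f1"), ("ARCHIVE", "f2"),
                    ("RESTORE", "f3"), ("REMOVE", "f4")]:
    q.submit(action, fid)

# The RESTORE and REMOVE requests overtake the earlier ARCHIVE requests.
order = [q.next_request() for _ in range(4)]
```

A strict scheme like this can starve ARCHIVE requests if RESTORE or REMOVE requests keep arriving, which is one reason a ratio-based or batched policy may be preferable.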

          Activity

            [LU-8324] HSM: prioritize HSM requests

            Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/31723
            Subject: LU-8324 hsm: prioritize one RESTORE once in a while
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b30b036607a2bc4928e13e06462701bf5ba62d3d

            gerrit Gerrit Updater added the comment above.

            This thread looks kind of dead, but we have a desire to see some prioritization mechanism as well.
            Some options:
            1. FIFO (today)
            2. Restore-first. All restore requests are prioritized over archive requests. (Except in-progress archives.)
            3. Archive-first. All archives are prioritized.
            4. Interleaved. Archive and Restore requests are alternated, as long as some of each are waiting.
            5. Tunable. Adjustable ratio of archive:restore processing. Maybe this covers the above 2-4 as well.
            6. Batched. Archives and Restores are grouped into separate batches, potentially resulting in fewer tape swaps.
            7. Time-boxed. A variant of batched; batch ends after a fixed time period.
            Many other options I'm sure...

            Ultimately I'm in agreement with Robert Read's comment above that the prioritization should really be done outside of Lustre, but if the patch here implements #5 that might cover enough of the use cases to make most people happy...

            nrutman Nathan Rutman added the comment above.
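
Option 5 in the list above (a tunable archive:restore ratio) could look roughly like the weighted round-robin below. This is a minimal sketch under stated assumptions, not what any of the linked patches implement: the `schedule` function and its `restore_weight` / `archive_weight` parameters are hypothetical names, and the queues are treated as static snapshots with no new arrivals.

```python
from collections import deque


def schedule(restores, archives, restore_weight=1, archive_weight=1):
    """Yield requests in weighted round-robin order.

    Each round serves up to `restore_weight` restores, then up to
    `archive_weight` archives. When one queue runs dry, the other
    queue simply keeps draining.
    """
    if restore_weight < 1 or archive_weight < 1:
        raise ValueError("weights must be >= 1")
    restores, archives = deque(restores), deque(archives)
    while restores or archives:
        for queue, weight in ((restores, restore_weight), (archives, archive_weight)):
            for _ in range(weight):
                if not queue:
                    break
                yield queue.popleft()


restores = ["R1", "R2"]
archives = ["A1", "A2", "A3", "A4"]

# A 1:1 ratio gives the "interleaved" behaviour of option 4.
interleaved = list(schedule(restores, archives, 1, 1))

# A 1:2 restore:archive ratio favours archives without starving restores.
archive_heavy = list(schedule(restores, archives, 1, 2))
```

With a very large weight on one side, this approximates the restore-first and archive-first policies (options 2 and 3) while still letting the other class make progress once per round.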

            Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/27800
            Subject: LU-8324 hsm: ease the development of a different coordinator
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0ee6e5d71c6f549b48be59607d5f55f28e950f47

            gerrit Gerrit Updater added the comment above.

            Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/27394
            Subject: LU-8324 hsm: prioritize HSM requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f8d7f289866c2219d19a51693ae44cc6c3fdf867

            gerrit Gerrit Updater added the comment above.
            pjones Peter Jones added a comment -

            I think that 2.10.1 is the more likely option at this stage. It seems like there will be some discussions about this area at LUG next week.

            spitzcor Cory Spitz added a comment -

            If not 2.10, it seems that 2.10.1 would be possible.


            Any chance for this patch to make it into 2.10? It is a very useful feature for HSM users and we believe that the patch is mature.

            hdoreau Henri Doreau (Inactive) added the comment above.

            Hello Matt,

            I think the patch is mature enough for you to test it if you are still interested in it.

            bougetq Quentin Bouget (Inactive) added the comment above.

            Hello,
            I would just like to say that we would be very keen on this kind of feature at Cambridge - I've just run into this issue today where a single file restore operation is at the back of the queue behind ~10TB of archive jobs.

            I'd be interested in testing this patch against one of our test filesystems, but I just wanted to add a comment that we would really appreciate having more ability to control the coordinator queue, whether it's in its current state or some future tool as Robert suggests.

            Kind regards,
            Matt Raso-Barnett
            University of Cambridge

            mrb Matt Rásó-Barnett (Inactive) added the comment above.

            Quentin Bouget (quentin.bouget.ocre@cea.fr) uploaded a new patch: http://review.whamcloud.com/21494
            Subject: LU-8324 hsm: prioritize HSM requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0655c8faf7cb7dcdb3b19dd761aad6c06fcda159

            gerrit Gerrit Updater added the comment above.

            We would happily consider a more resilient and distributed mechanism for the coordinator. Nevertheless, I see it as a non-trivial project that should not block improvements of HSM if it targets the mid-term future (I have neither seen any design document nor heard any discussion about it).

            The patch has not been pushed yet, but the solution that Quentin proposes is lightweight and elegant, and I believe that it significantly improves the experience of using HSM in production.
            It is more subjective, but I also find that it improves code quality and makes it easier to reason about the logic of the CDT, which would be helpful for future replacement work.

            hdoreau Henri Doreau (Inactive) added the comment above.

            People

              bougetq Quentin Bouget (Inactive)
              cealustre CEA
              Votes: 2
              Watchers: 17
