Details

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Minor

    Description

Most of the time (unless the filesystem is full), RESTORE and REMOVE requests should be processed first, as they have the highest priority from a user's point of view; ARCHIVE requests should have a lower priority.
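
For illustration only, here is a minimal sketch of that ordering in C; the enum and function below are hypothetical (the names merely echo Lustre's HSM action constants) and are not the coordinator's actual code:

    /* Hypothetical two-level priority for HSM requests; lower value
     * means "serve sooner". Not actual Lustre code. */
    enum hsm_action { HSMA_ARCHIVE, HSMA_RESTORE, HSMA_REMOVE };

    static int hsm_request_priority(enum hsm_action action)
    {
            switch (action) {
            case HSMA_RESTORE:
            case HSMA_REMOVE:
                    return 0;   /* a user is actively waiting on these */
            case HSMA_ARCHIVE:
            default:
                    return 1;   /* background copy-out; can wait */
            }
    }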

Activity

            [LU-8324] HSM: prioritize HSM requests

gerrit Gerrit Updater added a comment -

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/33239
            Subject: LU-8324 hsm: prioritize one RESTORE once in a while
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: 93364e9f3b0c9694904d2c1e2a687af61a980c1f


bougetq Quentin Bouget (Inactive) added a comment -

The patch above is the shortest/simplest hack I could come up with to mitigate LU-8324 until a more definitive fix is developed (it is more of a band-aid than anything else).

The idea is to use the moments when the coordinator traverses its whole llog to "force-schedule" at least one RESTORE request. In practice, this means you should see at least one RESTORE request scheduled every loop_period seconds (the value in /proc/<fsname>/mdt/<mdt-name>/hsm/loop_period).

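For illustration, a minimal sketch of the band-aid's idea in C; the names are hypothetical and this is not the patch itself:

    #include <stdbool.h>

    /* One flag per llog traversal; resetting it at the start of each
     * scan is what bounds a RESTORE's wait to roughly one loop_period. */
    struct scan_state {
            bool restore_forced;    /* one RESTORE already force-scheduled? */
    };

    /* Called for each waiting record during the coordinator's periodic
     * scan; returns true when the record should bypass FIFO order. */
    static bool force_schedule_restore(struct scan_state *scan, bool is_restore)
    {
            if (is_restore && !scan->restore_forced) {
                    scan->restore_forced = true;
                    return true;
            }
            return false;   /* everything else keeps its FIFO position */
    }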

gerrit Gerrit Updater added a comment -

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/31723
            Subject: LU-8324 hsm: prioritize one RESTORE once in a while
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b30b036607a2bc4928e13e06462701bf5ba62d3d


nrutman Nathan Rutman added a comment -

This thread looks kind of dead, but we have a desire to see some prioritization mechanism as well.
            Some options:
            1. FIFO (today)
            2. Restore-first. All restore requests are prioritized over archive requests. (Except in-progress archives.)
            3. Archive-first. All archives are prioritized.
            4. Interleaved. Archive and Restore requests are alternated, as long as some of each are waiting.
5. Tunable. Adjustable ratio of archive:restore processing (a sketch follows below). This may cover options 2-4 as well.
            6. Batched. Archives and Restores are grouped into separate batches, potentially resulting in fewer tape swaps.
            7. Time-boxed. A variant of batched; batch ends after a fixed time period.
            Many other options I'm sure...

            Ultimately I'm in agreement with Robert Read's comment above that the prioritization should really be done outside of Lustre, but if the patch here implements #5 that might cover enough of the use cases to make most people happy...
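
For what it's worth, here is a minimal sketch of what option 5 could look like, assuming both queues are non-empty; none of these names exist in Lustre or in the patch:

    #include <stdbool.h>

    /* Hypothetical tunable scheduler: serve hs_ratio ARCHIVEs for every
     * RESTORE. ratio = 0 degenerates to restore-first (#2), ratio = 1 to
     * strict interleaving (#4), and a very large ratio approximates
     * archive-first (#3). */
    struct hsm_sched {
            unsigned int hs_ratio;          /* archives served per restore */
            unsigned int hs_archives_run;   /* archives since last restore */
    };

    /* Returns true to dequeue a RESTORE next, false to dequeue an ARCHIVE. */
    static bool hsm_pick_restore(struct hsm_sched *sched)
    {
            if (sched->hs_archives_run >= sched->hs_ratio) {
                    sched->hs_archives_run = 0;
                    return true;
            }
            sched->hs_archives_run++;
            return false;
    }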


gerrit Gerrit Updater added a comment -

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/27800
            Subject: LU-8324 hsm: ease the development of a different coordinator
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 0ee6e5d71c6f549b48be59607d5f55f28e950f47


gerrit Gerrit Updater added a comment -

Quentin Bouget (quentin.bouget@cea.fr) uploaded a new patch: https://review.whamcloud.com/27394
            Subject: LU-8324 hsm: prioritize HSM requests
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: f8d7f289866c2219d19a51693ae44cc6c3fdf867

            pjones Peter Jones added a comment -

I think that 2.10.1 is the more likely option at this stage. It seems like there will be some discussions about this area at LUG next week.

            spitzcor Cory Spitz added a comment -

            If not 2.10, it seems that 2.10.1 would be possible.


hdoreau Henri Doreau (Inactive) added a comment -

Any chance for this patch to make it into 2.10? It is a very useful feature for HSM users and we believe that the patch is mature.


bougetq Quentin Bouget (Inactive) added a comment -

Hello Matt,

            I think the patch is mature enough for you to test it if you are still interested in it.


mrb Matt Rásó-Barnett (Inactive) added a comment -

Hello,
I would just like to say that we would be very keen on this kind of feature at Cambridge. I've just run into this issue today: a single-file restore operation is at the back of the queue behind ~10 TB of archive jobs.

I'd be interested in testing this patch against one of our test filesystems, but I just wanted to add that we would really appreciate having more ability to control the coordinator queue, whether in its current state or via some future tool as Robert suggests.

            Kind regards,
            Matt Raso-Barnett
            University of Cambridge


People

  Assignee: Quentin Bouget (Inactive)
  Reporter: CEA
  Votes: 2
  Watchers: 17
