[LU-7988] HSM: high lock contention for cdt_llog_lock Created: 05/Apr/16  Updated: 21/Jan/19  Resolved: 06/Nov/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.7

Type: Bug Priority: Critical
Reporter: Frank Zago (Inactive) Assignee: Frank Zago (Inactive)
Resolution: Fixed Votes: 2
Labels: cea

Attachments: File mdt-cdt-250000.svg    
Issue Links:
Related
is related to LU-9959 hsm: cannot schedule two different re... Open
is related to LU-8626 limit number of items in HSM action q... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

There is a significant lock contention issue around cdt_llog_lock when adding new HSM requests.

# time wc -l /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions
219759 /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions

real    11m45.068s
user    0m0.020s
sys     0m21.372s

Eleven minutes to read the list is far too long; such an operation should take a couple of seconds at most.

The contention appears to come from the coordinator. Every time a new request is posted, the whole list of requests is browsed under that lock. That is not a problem when there are only a handful of requests, but it does not scale when there are hundreds of thousands of them.

I recompiled a CentOS 7 kernel with CONFIG_LOCK_STAT on a VM and ran a test creating 10,000 files and archiving them without a copytool present. Total time was 146 seconds. Lock contention results:

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

[...]

                     &cdt->cdt_llog_lock:          6296           6296          15.45       23074.17    43436574.06          17791          27134          25.09       37745.03   138558199.24
                     -------------------
                     &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
                     -------------------
                     &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
[...]

(Time units are microseconds.)

With waittime-total=43 seconds and holdtime-total=138 seconds, this is a highly contended lock, well above any other lock in Lustre or the rest of the system.
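
For reference, the statistics above can be gathered roughly as follows on a kernel built with CONFIG_LOCK_STAT (a hedged sketch; the mount point and file layout are only illustrative):

# clear old statistics and enable lock_stat collection
echo 0 > /proc/lock_stat
echo 1 > /proc/sys/kernel/lock_stat

# workload: create and archive 10000 small files with no copytool running
for i in $(seq 1 10000); do
        touch /mnt/lustre/f$i
        lfs hsm_archive /mnt/lustre/f$i
done

# stop collection and extract the cdt_llog_lock entry
echo 0 > /proc/sys/kernel/lock_stat
grep -A 4 cdt_llog_lock /proc/lock_stat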

AFAICS, contention is between these mechanisms:

  • adding a new request (lfs hsm_archive, ...)
  • changing a request status (WAITING->STARTED->SUCCEED)
  • removing a request (archive completed)
  • housekeeping (coordinator loop every 10 seconds)
  • dumping the list of actions from /proc

The net result is that when there are a lot of requests, they only trickle down to the copytool, which exacerbates the problem by letting the list grow even larger.



 Comments   
Comment by Gerrit Updater [ 05/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19341
Subject: LU-7988 hsm: execute HSM coordinator once per second at most
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6aa634f1d28e16300dbf428dcec0e38e01b5a2dc

Comment by Frank Zago (Inactive) [ 05/Apr/16 ]

The above patch is only a partial fix. Maybe running the loop once per second is still too often, and the interval should be configurable as well.
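
For reference, the coordinator wakeup interval is exposed as an hsm tunable; a hedged sketch of how it could be inspected or changed (assuming the loop_period parameter is available on this Lustre version):

# read the current coordinator loop interval (seconds)
lctl get_param mdt.*.hsm.loop_period

# lower it to one second on a given MDT
lctl set_param mdt.lustre-MDT0000.hsm.loop_period=1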

Comment by Frank Zago (Inactive) [ 06/Apr/16 ]

250,000 empty files were created on a filesystem and hsm_archive was issued on them. A copytool was then started to perform these archive requests. The MDT is running the patch above.

mdt-cdt-250000.svg shows the number of records completed per second over the duration of the archival. The maximum was 389, reached at the end. The total duration was 1h 29min 39s. The graph shows that record processing gets faster over time as the backlog shrinks.
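
Such a completion rate can be sampled roughly as follows (a hedged sketch; it assumes the status=SUCCEED field in the actions file and that completed records have not yet been purged):

# sample the number of completed records every 10 seconds; the delta
# between samples approximates the completion rate
while sleep 10; do
        echo -n "$(date +%s) "
        grep -c "status=SUCCEED" /proc/fs/lustre/mdt/*/hsm/actions
done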

Comment by Frank Zago (Inactive) [ 06/Apr/16 ]

As a side note, creating these 250,000 files and issuing hsm_archive in batches of 10 took 3160 seconds with this patch. Without the patch, it takes 11822 seconds.

Comment by Frank Zago (Inactive) [ 06/Apr/16 ]

I think the graph shows that things could be improved further. For instance, the MDT should not need to parse the whole backlog of HSM requests, since only the first ones (up to /proc/.../hsm/max_requests?) have been seen by a client. The remaining ones still need to be checked regularly, for instance for timeout or cancellation, but that does not need to happen as frequently.

In my test setup, we should be able to sustain at least 200 requests per second, as seen during the last minute, for the whole duration of the archival (which would be 1250 seconds total), instead of the observed average of 46 per second (5379 seconds total).
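
A quick way to see how small the in-flight window is compared to the backlog (a hedged sketch; parameter names assumed available on this version):

# number of requests the coordinator hands to copytools at once
lctl get_param mdt.*.hsm.max_requests

# size of the backlog the coordinator re-scans on every pass
wc -l /proc/fs/lustre/mdt/*/hsm/actions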

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19585
Subject: LU-7988 hsm: defer all new hals
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 756c24f770e569a240d02d9e8b1e94f9ea2ba0a2

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19584
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 1cec5fa2b8eb264736aa7143418e3df9d5878451

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19578
Subject: LU-7988 hsm: remove hsd.request_sz
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 810bfd58bdf238a7f0f83dc5938f51ad218df921

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19579
Subject: LU-7988 hsm: Fix possible out of bounds reference in message
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 7bf47d3dc16c7554845aef211bb998d79515c84b

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19580
Subject: LU-7988 hsm: improve hsd.request reset
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 10119ceb49ed2724fb529760d4127ac3ff7cc1d6

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19581
Subject: LU-7988 hsm: only browse entries of hsd.request when needed
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: d0d0e913f927c9843d0354fbad89ad2bcdf2586e

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19582
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 6305a604bc97f563e8a084a70f2e14e62121cef4

Comment by Gerrit Updater [ 14/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19583
Subject: LU-7988 hsm: remove compound_id from mdt_hsm_add_actions proto
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: feec067b250099a1f051b6762bab37db3495a1e5

Comment by Frank Zago (Inactive) [ 14/Apr/16 ]

The above series tries to fix the issue by deferring everything that can be deferred: instead of calling cdt_llog_process for everything every time new work arrives, requests are queued and the coordinator processes them once per second. This vastly improves performance.

With the patches:

  • creating 250,000 files and hsm_archive'ing them, without a copytool: 1400s (archiving one by one), or 844s (archiving in batches of 40)
  • starting a copytool to archive the files: 876s
  • stopping the copytool and hsm_release'ing the files, in batches of 40: 948s
  • hsm_restore'ing the files (without a copytool): 1539s
  • starting a copytool to restore the files: 1004s

Tested on a VM, with Cray's copytool.

The last patch ("hsm: defer all new hals") is changing the error path and may introduce some different behaviour when error happen during restoration.

Comment by Andreas Dilger [ 14/Apr/16 ]

Frank, I was wondering what you were looking for when submitting this patch series? All of the patches are submitted with Test-Parameters: forbuildonly, which means that they will only be built but never tested. I would have thought that at least the last patch would not have this keyword, so that you could get some testing on the patch series. Alternatively, you could use Test-Parameters: fortestonly testlist=sanity-hsm to run only the sanity-hsm test on each of the patches?

Also, the patch series was submitted twice, which clogged up the build queue a bit. I killed the remaining builds for the first version once I noticed, and all but the last ones for the second series, since they will never be tested. At least you can verify that the full patch series builds on all the architectures.

Comment by Frank Zago (Inactive) [ 15/Apr/16 ]

Thanks Andreas. I wanted some reviews before starting the testing on these patches, which may clog your test machines. I've added testlist=sanity-hsm to the last one as you suggested.
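
For reference, the resulting commit message trailer would look roughly like this (a hedged sketch of the Gerrit keyword usage described above):

Test-Parameters: fortestonly testlist=sanity-hsm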

Comment by Frank Zago (Inactive) [ 18/Apr/16 ]

Dropping http://review.whamcloud.com/19585 as it cannot work efficiently within the current constraints. It is mostly a problem for restoring files, because a lot of checking is done when the request is submitted and the layout is locked. If we delay that, a program accessing a released file will see an ENODATA failure since the layout is not locked. This is why sanity-hsm tests 10c and 11b failed.

A smaller optimization might be to queue only the archive operations, which are what an HSM system mostly sees in the real world. Every other operation would act as a barrier.

Comment by Gerrit Updater [ 29/Apr/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/19886
Subject: LU-7988 hsm: defer all new hals
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 96b64c150463c52314689af06222b557f7241631

Comment by Frank Zago (Inactive) [ 29/Apr/16 ]

The more I look at the coordinator code, the more I feel that using HALs internally is the wrong way to go. They should only be used for communication between nodes.

When the MDT receives a message from a client, it builds a HAL out of it, which includes one or more HAIs. Eventually most HAIs are allocated again in mdt_agent_record_add() as a struct llog_agent_req_rec with the same information, so we might as well start there.

Comment by Frank Zago (Inactive) [ 03/May/16 ]

The current series can perform 237 archive operations per second. The setup is a single VM with Lustre plus Cray's copytool; zero-sized files are created, hsm_archive'd and archived immediately by the copytool.

The bottleneck is still the processing of the requests in the coordinator, and very likely the root issue is the catalog.

Comment by Vinayak (Inactive) [ 09/May/16 ]

Hello Frank, Andreas,

http://review.whamcloud.com/#/c/19578/: tagged build-only
http://review.whamcloud.com/#/c/19579/: tagged build-only
http://review.whamcloud.com/#/c/19580/: tagged build-only
http://review.whamcloud.com/#/c/19581/: tagged build-only
http://review.whamcloud.com/#/c/19582/: tagged build-only
http://review.whamcloud.com/#/c/19583/: tagged build-only
http://review.whamcloud.com/#/c/19584/: tagged build-only
http://review.whamcloud.com/#/c/19585/: abandoned

http://review.whamcloud.com/#/c/19886/: does not contain the build-only tag

1. The final patch depends on all of the patches listed above as build-only, except the abandoned one. Am I right?

2. Also, please let me know if I can help with the verification process or anything else that would speed up the review.

3. Would it not be a good idea to submit a single aggregated patch, to avoid a series of conflicts?

Thanks,

Comment by Frank Zago (Inactive) [ 09/May/16 ]

Hi Vinayak,

1. The final patch depends on all of the patches listed above as build-only, except the abandoned one. Am I right?

Yes.

2. Also, please let me know if I can help with the verification process or anything else that would speed up the review.

As this series changes the way HSM requests are processed, care should be taken to ensure the changes make sense and no undesired behaviour is introduced.

3. Would it not be a good idea to submit a single aggregated patch, to avoid a series of conflicts?

It is much easier to review a series of patches that each do one thing than a big patch that does many things.

Comment by Vinayak (Inactive) [ 10/May/16 ]

As this series changes the way HSM requests are processed, care should be taken to ensure the changes make sense and no undesired behaviour is introduced.

Okay Frank. I will port the changes to our local branch, create the dependency, run the HSM tests and report back to you.

Comment by Gerrit Updater [ 17/May/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/20272
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: b37bd734053f281a54f8eee6a3e9a589d64823dd

Comment by Vinayak (Inactive) [ 07/Jun/16 ]

Hello Frank,

Sorry, I have been heavily occupied with other work and have not had time to test the patch set.

Thanks,

Comment by Vinayak (Inactive) [ 24/Jun/16 ]

Hello Frank,

I have tried to reproduce this issue on master (without the patch) using the following script, creating 1000 files (I thought 1000 files would take a noticeably long time), but it did not take much time for me; it finished in a fraction of a second. By the way, I am trying it on a single-node setup with 1 MDS and 2 OSTs. Please let me know if I am missing anything.

# time wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions

real 0m0.013s
user 0m0.000s
sys 0m0.012s

        # test body (uses sanity-hsm/test-framework helpers: make_small,
        # path2fid, wait_request_state, $DIR, $LFS)
        numfiles=1000

        echo "Creating $numfiles files"

        for i in $(seq 1 $numfiles); do
                make_small $DIR/f501.sanity-hsm$i
        done

        local fid=$(path2fid $DIR/f501.sanity-hsm$numfiles)

        echo "fid of f501.sanity-hsm$numfiles = $fid"

        echo "Archiving the files"

        for i in $(seq 1 $numfiles); do
                $LFS hsm_archive $DIR/f501.sanity-hsm$i
        done

        sleep 5

        time wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions

        wait_request_state $fid ARCHIVE SUCCEED

Thanks,

Comment by Frank Zago (Inactive) [ 24/Jun/16 ]

Hi Vinayak,

You need a lot more files. Go with at least 100,000. I usually test with 250,000, and have gone up to 1M files. Let the queue fill up, then start the copytool. Then you'll see how slowly the files are processed. Use small files (e.g. 1 byte) so the copytool isn't the choke point.

Here's a script I use:

#!/bin/bash

# Create 250,000 empty files (6250 directories x 40 files) and queue an
# hsm_archive request for each batch of 40, printing the cumulative rate.

mkdir /mnt/lustre/foo

start=$(date "+%s")

echo Start $start

for x in {1..6250}; do
        echo loop $x $(($(date "+%s") - start)) seconds since start $((x * 40 / ($(date "+%s") - start + 1) ))

        mkdir /mnt/lustre/foo/d$x
        cd /mnt/lustre/foo/d$x

        touch f-{1..40}

        lfs hsm_archive f-{1..40}
done

echo Done $start to $(date "+%s")

With that kind of setup, you may run into LU-5228 too.

Comment by Vinayak (Inactive) [ 27/Jun/16 ]

Thanks Frank. I will try that.

Comment by Vinayak (Inactive) [ 29/Jun/16 ]

Hello Frank,

Update:

With the reproducer I was able to observe a delay of 2m 33.750s with 250,000 files.

# time wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
145678 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions

real 2m33.750s
user 0m0.021s
sys 0m0.795s

Trying the same reproducer with patch1 and patch2, I observed a delay of 56.946s, which is much better.

# time wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
163911 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions

real 0m56.946s
user 0m0.026s
sys 0m0.936s

Thanks,

Comment by Frank Zago (Inactive) [ 29/Jun/16 ]

Hi Vinayak,
I dropped the patch2 you mentioned above a while ago because of incorrect assumptions I made. The new patch is http://review.whamcloud.com/#/c/19886

Comment by Vinayak (Inactive) [ 30/Jun/16 ]

Hello Frank,

Sorry for the wrong link tagged in my last comment.

Yes, I am using the same patch you indicated there i.e http://review.whamcloud.com/#/c/19886

Just a thought: would it be better to combine these two patches (patch1, patch2) into one, since these two are the main contributors to the fix, and the remaining patches could be pushed on top of it?

copied from your comment:

It is much easier to review a series of patches that each do one thing than a big patch that does many things.

I agree, but merge conflicts cause the other patches to need rebasing as well, since they depend on each other.

IMO, we can reduce the number of patches, maybe to 3.

Thanks,

Comment by Gerrit Updater [ 06/Jul/16 ]

Jean-Baptiste Riaux (riaux.jb@intel.com) uploaded a new patch: http://review.whamcloud.com/21173
Subject: LU-7988 hsm: defer all new archive hals
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: dd9a0e96a71c69f1862494f32f76da65553b9359

Comment by Gerrit Updater [ 06/Jul/16 ]

Jean-Baptiste Riaux (riaux.jb@intel.com) uploaded a new patch: http://review.whamcloud.com/21174
Subject: LU-7988 hsm: defer all new archive hals
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 37d8a8bcf4865198ef1b8ec932f9ea17d5683f43

Comment by Jean-Baptiste Riaux (Inactive) [ 07/Jul/16 ]

Sorry, I made a mistake while merging patches. Patches http://review.whamcloud.com/21173 and http://review.whamcloud.com/21174 have been abandoned.

Comment by Gerrit Updater [ 01/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19578/
Subject: LU-7988 hsm: remove hsd.request_sz from struct hsm_scan_data
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3061a7cc5a3a3e0c7d73d8f5c308635fce69aa66

Comment by Gerrit Updater [ 11/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19579/
Subject: LU-7988 hsm: Fix possible out of bounds reference in message
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 526ae2f6a383b326da4b7be1623d9fb6563d0d37

Comment by Gerrit Updater [ 11/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19581/
Subject: LU-7988 hsm: only browse entries of hsd.request when needed
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 1a10d9219e966daaf61ea9c42610c22126acd836

Comment by Gerrit Updater [ 11/Aug/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/19583/
Subject: LU-7988 hsm: remove compound_id from mdt_hsm_add_actions proto
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 250e826cd3902cd8e06cb7ee5167f2814219f63f

Comment by Frank Zago (Inactive) [ 11/Aug/16 ]

Series rebased. The current patches, in order, are:

http://review.whamcloud.com/19341
http://review.whamcloud.com/19582
http://review.whamcloud.com/19584
http://review.whamcloud.com/20272
http://review.whamcloud.com/19886

Comment by Frank Zago (Inactive) [ 21/Sep/16 ]

Adding a new patch to the series as the remaining issue seems to be around the shutdown of the coordinator, which the new code appears to trigger.

Comment by Gerrit Updater [ 21/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22667
Subject: LU-7988 hsm: change coordinator start/stop mechanisms
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 13edea48863f849b19f4fd59e462cea794c1b112

Comment by Bruno Faccini (Inactive) [ 22/Sep/16 ]

Frank, shutdown of the CDT is a long-known issue (particularly the fact that CDT structure cleanup can occur from both the MDT umount thread and the CDT thread itself; I never found the time to fix it myself), and your latest patch #22667 seems like the right way to fix it. My only additional concern is whether the cdt_state tests/updates need to be protected to fully avoid races.

Comment by Gerrit Updater [ 23/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22704
Subject: LU-7988 hsm: make mdt_hsm_cdt_start a static function
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e271b1a669cc7c19c8de597c3b2f01b0ab05e0bc

Comment by Gerrit Updater [ 27/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22761
Subject: LU-7988 hsm: don't use a ptlrpc_thread struct for the cdt
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 622e168f23541917f6cdc3e04e40bd78109c79ab

Comment by Gerrit Updater [ 27/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22762
Subject: LU-7988 hsm: mark the cdt as stopped when its thread exits
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: f8ade6d5f931640ea10b1541681e603a0e3c2182

Comment by Gerrit Updater [ 28/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22789
Subject: LU-7988 hsm: change the cdt event flags to a simple boolean
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 25b347477bfb2cd09250dd09c82dbe5fde106b1d

Comment by Gerrit Updater [ 29/Sep/16 ]

Frank Zago (fzago@cray.com) uploaded a new patch: http://review.whamcloud.com/22823
Subject: LU-7988 hsm: replace l_wait_event with wait_event_timeout
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 2bbb447b1b6922bfb6b596d9b1c76b6968e6df6f

Comment by Frank Zago (Inactive) [ 04/Oct/16 ]

This series now passes the tests again.

Comment by Gerrit Updater [ 08/Oct/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22704/
Subject: LU-7988 hsm: make mdt_hsm_cdt_start a static function
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 243880c0c6f380591f71b85847f57d45bd441c2d

Comment by Gerrit Updater [ 23/Dec/16 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/22761/
Subject: LU-7988 hsm: don't use a ptlrpc_thread struct for the cdt
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 9811edb26d3bff66c61acc39a72554461445298f

Comment by Cory Spitz [ 24/Apr/17 ]

Some of these patches have landed for 2.10, while others remain pending. What is the current list of changes still pending?

Comment by Gerrit Updater [ 17/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/22667/
Subject: LU-7988 hsm: change coordinator start/stop mechanisms
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 958198e3938120040e93fc2aa255151fa6a81907

Comment by Peter Jones [ 19/May/17 ]

As discussed in the LWG this week, work in this area is ongoing. Patches ready before the code freeze will still land, but the prospects of everything landing in the remaining time look unlikely, so the rest will end up in a subsequent maintenance release.

Comment by Gerrit Updater [ 24/May/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/22789/
Subject: LU-7988 hsm: change the cdt event flags to a simple boolean
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: d185bf805b71079343b42521fafdd734d52d052b

Comment by Gerrit Updater [ 19/Jul/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/22762/
Subject: LU-7988 hsm: mark the cdt as stopped when its thread exits
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f11a5022fc129fec797adb155e5553331f224ecc

Comment by Gerrit Updater [ 26/Jul/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28219
Subject: LU-7988 hsm: mark the cdt as stopped when its thread exits
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 11cfae630ff68cd2f83af6e38223a8256c718661

Comment by Gerrit Updater [ 01/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/19341/
Subject: LU-7988 hsm: run HSM coordinator once per second at most
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: cc6ef11d2f972ebc440013bddda87a536a09750c

Comment by Gerrit Updater [ 03/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28219/
Subject: LU-7988 hsm: mark the cdt as stopped when its thread exits
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 58da5f41d5db9818ae1a1f148331eaad6fc63918

Comment by Gerrit Updater [ 04/Aug/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28368
Subject: LU-7988 hsm: run HSM coordinator once per second at most
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 39cebbd76c55c9673502a4caa3863840d9f4597d

Comment by Gerrit Updater [ 10/Aug/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28368/
Subject: LU-7988 hsm: run HSM coordinator once per second at most
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 3223073d68647607d53bb6b4c7447648306e14b6

Comment by Gerrit Updater [ 31/Aug/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/19582/
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: afc9ff6caff7d572041cabf0a957dc8749fce49d

Comment by Gerrit Updater [ 08/Sep/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28908
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 10c2083bc68309b64309e922c27d67c814ce00c5

Comment by Gerrit Updater [ 13/Sep/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/19584/
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f3a415289b560b5f422efe2bd08b3b7cff113cf0

Comment by Gerrit Updater [ 13/Sep/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28973
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 207f81b700adff52bc58685efa8724ed534ae0cf

Comment by Gerrit Updater [ 09/Oct/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/20272/
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3c0b677cdffae7d329e6f0ab73784b20af2f11f5

Comment by Gerrit Updater [ 12/Oct/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29583
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 80ce8998119a9bf125c014c94f8fa712a47e0b47

Comment by Gerrit Updater [ 24/Oct/17 ]

Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/29742
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0556c7c9bf289ac34cda099819d23ce0063302a2

Comment by Gerrit Updater [ 26/Oct/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28973/
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 019c48b649d0f668563cf72495d1d1b02f4f69c0

Comment by Gerrit Updater [ 06/Nov/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29742/
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7251fea8dc3c4d29e30c5a3f763c4c33d35f90a7

Comment by Peter Jones [ 06/Nov/17 ]

As we finally seem to be at the end of the patches queued up for this ticket, let's close it and open a new ticket to track any new fixes identified in this area of code in the future.

Comment by Gerrit Updater [ 14/Dec/17 ]

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/30538
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 7121b1d335453f31766f2f59909d20b7cde3e948

Comment by Gerrit Updater [ 19/Dec/17 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/30538/
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: d5a7563a46acf15252929b6a045eef8c58022314

Comment by Gerrit Updater [ 26/Feb/18 ]

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/29583/
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 03fc1a015f5869b68d21eee69531a201045532e1

Comment by Aurelien Degremont (Inactive) [ 14/Dec/18 ]

It seems this ticket was wrongly flagged for 2.10.6.

The last patch from this series to land on b2_10 landed for 2.10.4.

There is still one backport which was not landed and is interesting: https://review.whamcloud.com/#/c/28908/2

Comment by Gerrit Updater [ 19/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/28908/
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 387dca70723affc0b45b87730e994395b9627eef
