[LU-7988] HSM: high lock contention for cdt_llog_lock - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.11.0, Lustre 2.10.4, Lustre 2.10.7
Affects Version/s: None
Labels:
- cea

Severity: 3

Description

There is a significant lock contention issue around cdt_llog_lock when adding new HSM requests.

# time wc -l /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions
219759 /proc/fs/lustre/mdt/snx11133-MDT0000/hsm/actions

real    11m45.068s
user    0m0.020s
sys     0m21.372s

Nearly 12 minutes to read the list is far too long. Such an operation should take a couple of seconds at most.

The contention appears to come from the coordinator. Every time a new request is posted, the whole list of requests is browsed under that lock. That's not a problem when there are only a handful of requests, but it doesn't scale when there are hundreds of thousands of them.
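To illustrate why a full scan per new request stops scaling, here is a minimal sketch. It is not Lustre code: the function names and the cost model (one "visit" per existing record for every new request) are assumptions made only to show the quadratic growth described above.

```python
def visits_by_simulation(n_requests):
    """Count record visits when each new request first walks the whole list.

    Hypothetical model: 'actions' stands in for the HSM action list, and
    every walk is assumed to happen while the list lock is held.
    """
    visits = 0
    actions = []
    for request_id in range(n_requests):
        for _record in actions:  # full scan of everything queued so far
            visits += 1
        actions.append(request_id)
    return visits


def visits_closed_form(n_requests):
    """Same count without looping: 0 + 1 + ... + (n - 1)."""
    return n_requests * (n_requests - 1) // 2


if __name__ == "__main__":
    for n in (10, 1_000, 10_000, 219_759):
        print(n, visits_closed_form(n))
```

Under this model, 10000 requests cost 49,995,000 record visits, and a list the size of the one shown above (219759 entries) costs on the order of 24 billion, all serialized behind a single lock.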

I recompiled a CentOS 7 kernel with CONFIG_LOCK_STAT on a VM, then ran a test that created 10000 files and archived them without a copytool present. Total time was 146 seconds. Lock contention results:

lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

[...]

                     &cdt->cdt_llog_lock:          6296           6296          15.45       23074.17    43436574.06          17791          27134          25.09       37745.03   138558199.24
                     -------------------
                     &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
                     -------------------
                     &cdt->cdt_llog_lock           6296          [<ffffffffa0fb096d>] cdt_llog_process+0x9d/0x3a0 [mdt]
[...]

(Time units are microseconds.)

With waittime-total at about 43 seconds and holdtime-total at about 138 seconds for a 146-second test, this is a very contentious lock, way above the other locks in Lustre or the rest of the system.
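The conversion from the lock_stat figures to seconds, plus a few derived ratios, can be sketched as follows. The helper name is hypothetical, and the sketch assumes the lock_stat counters cover only the 146-second test run.

```python
US_PER_SECOND = 1_000_000


def summarize_lock_stat(contentions, acquisitions,
                        waittime_total_us, holdtime_total_us, run_time_s):
    """Derive human-readable figures from one lock_stat row (times in us)."""
    hold_total_s = holdtime_total_us / US_PER_SECOND
    return {
        "wait_total_s": waittime_total_us / US_PER_SECOND,
        "hold_total_s": hold_total_s,
        # Share of acquisitions that had to wait for the lock.
        "contended_fraction": contentions / acquisitions,
        "avg_wait_us": waittime_total_us / contentions,
        "avg_hold_us": holdtime_total_us / acquisitions,
        # Assumes the counters were collected over this run only.
        "hold_fraction_of_run": hold_total_s / run_time_s,
    }


# Values copied from the cdt_llog_lock row above.
summary = summarize_lock_stat(
    contentions=6296,
    acquisitions=27134,
    waittime_total_us=43436574.06,
    holdtime_total_us=138558199.24,
    run_time_s=146,
)

if __name__ == "__main__":
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```

With these inputs the lock is held for roughly 95% of the run, about 23% of acquisitions are contended, and the average hold time is roughly 5 ms.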

AFAICS, contention is between these mechanisms:

- adding a new request (lfs hsm_archive, ...)
- changing a request status (WAITING->STARTED->SUCCEED)
- removing a request (archive completed)
- housekeeping (coordinator loop every 10 seconds)
- dumping the list of actions from /proc

The net result is that when there are a lot of requests, they only trickle down to the copytool, which exacerbates the problem because the number of requests in the list keeps increasing.

Attachments

mdt-cdt-250000.svg
141 kB
06/Apr/16 8:20 PM

Issue Links

is related to

LU-9959 hsm: cannot schedule two different requests on the same fid

Open

LU-8626 limit number of items in HSM action queue

Reopened

Activity

Peter Jones added a comment - 06/Nov/17 1:28 PM

As we finally seem to be at an end of patches queued up for this ticket let's close it and open a new ticket to track any new fixes identified in this area of code in the future.

Gerrit Updater added a comment - 06/Nov/17 3:42 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/29742/
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 7251fea8dc3c4d29e30c5a3f763c4c33d35f90a7

Gerrit Updater added a comment - 26/Oct/17 4:08 PM

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28973/
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 019c48b649d0f668563cf72495d1d1b02f4f69c0

Gerrit Updater added a comment - 24/Oct/17 3:38 PM

Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/29742
Subject: LU-7988 hsm: wake up cdt when requests are empty
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 0556c7c9bf289ac34cda099819d23ce0063302a2

Gerrit Updater added a comment - 12/Oct/17 2:40 PM

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/29583
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 80ce8998119a9bf125c014c94f8fa712a47e0b47

Gerrit Updater added a comment - 09/Oct/17 3:49 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/20272/
Subject: LU-7988 hsm: split mdt_hsm_add_actions()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 3c0b677cdffae7d329e6f0ab73784b20af2f11f5

Gerrit Updater added a comment - 13/Sep/17 3:24 PM

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28973
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 207f81b700adff52bc58685efa8724ed534ae0cf

Gerrit Updater added a comment - 13/Sep/17 3:36 AM

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/19584/
Subject: LU-7988 hsm: update many cookie status at once
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: f3a415289b560b5f422efe2bd08b3b7cff113cf0

Gerrit Updater added a comment - 08/Sep/17 2:49 PM

Minh Diep (minh.diep@intel.com) uploaded a new patch: https://review.whamcloud.com/28908
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: 10c2083bc68309b64309e922c27d67c814ce00c5

Gerrit Updater added a comment - 31/Aug/17 7:15 PM

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/19582/
Subject: LU-7988 hsm: added coordinator housekeeping flag
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: afc9ff6caff7d572041cabf0a957dc8749fce49d

Gerrit Updater added a comment - 10/Aug/17 4:26 PM

John L. Hammond (john.hammond@intel.com) merged in patch https://review.whamcloud.com/28368/
Subject: LU-7988 hsm: run HSM coordinator once per second at most
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: 3223073d68647607d53bb6b4c7447648306e14b6

People

Assignee: Frank Zago (Inactive)

Reporter: Frank Zago (Inactive)

Votes: 2

Watchers: 30

Dates

Created: 05/Apr/16 6:20 PM

Updated: 21/Jan/19 3:32 PM

Resolved: 06/Nov/17 1:28 PM