Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Description
We have seen some performance results which make us believe that the LRU cache slot reclaiming might need some improvement.
In one case, running a specific kind of application with more memory devoted to the Lustre cache (64 GB vs. 8 GB) made the application's performance worse, which should not happen if cache reclaiming works well.
In some other benchmarks, we found performance is very good while the cache is not full, but it drops immediately once the cache fills up. That should not happen either, because those benchmarks only do sequential reads and never read data back.
These results show that cache reclaiming needs improvement, especially when there is a large amount of cache memory. One possible cause is that as memory grows, osc_lru_reclaim() needs more time to scan the whole list to free slots. Note that even if the caller of osc_lru_reclaim() needs only one slot, osc_lru_reclaim() will try to reclaim cl_max_pages_per_rpc slots. And since the caller of osc_lru_reclaim() is always an I/O thread, reclaiming a batch of slots introduces overhead directly into the application, and that overhead grows as memory grows.
There may be a different cause, but we are testing a patch, and I am going to push it.
Is this app single threaded? Or is each thread somehow working on a different OSC...?
It’s hard to see how reducing the batch size could improve performance, even in the single threaded case, though. It will just result in more overhead for each page freed. Do you have any benchmarks showing that it helps?
Also, I agree that hitting the LRU limit has a performance cost, but that’s expected. Freeing pages is not free... Your description of it getting worse with more memory makes me wonder if we’re walking the list in the wrong order, but I don’t see how that could be the case.
I’d be interested in seeing basic perf traces of the application here. This is certainly a case that could be improved, but my experience with it suggests this is not the way to do that.