[LU-3018] small reads occur during bulk writes hurting overall performance Created: 22/Mar/13 Updated: 09/May/14 Resolved: 09/May/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Andrew Perepechko | Assignee: | Cliff White (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 7341 |
| Description |
|
During an obdfilter-survey run on the OSS node, or an IOR write test from a client, small reads occur during bulk writes and hurt overall OST performance. A patch will be uploaded shortly. |
| Comments |
| Comment by Andrew Perepechko [ 22/Mar/13 ] |
|
The logs showing how bitmap pages are evicted:

[<ffffffff81135c28>] ? __remove_mapping+0xd8/0x160
[<ffffffff81136b7d>] ? shrink_page_list.clone.0+0x47d/0x5e0
[<ffffffff81136fd0>] ? shrink_inactive_list+0x2f0/0x730
[<ffffffffa04e90fd>] ? cfs_hash_rw_unlock+0x1d/0x30 [libcfs]
[<ffffffffa04e7ac4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
[<ffffffff8113821f>] ? shrink_zone+0x38f/0x510
[<ffffffff8109cc99>] ? ktime_get_ts+0xa9/0xe0
[<ffffffff8113849e>] ? do_try_to_free_pages+0xfe/0x520
[<ffffffff81138abf>] ? try_to_free_pages+0x9f/0x130
[<ffffffff81139c40>] ? isolate_pages_global+0x0/0x380
[<ffffffff811301a7>] ? __alloc_pages_nodemask+0x447/0x920
[<ffffffff81164e2a>] ? alloc_pages_current+0xaa/0x110
[<ffffffff8111ccf7>] ? __page_cache_alloc+0x87/0x90
[<ffffffff8111db0f>] ? find_or_create_page+0x4f/0xb0

[<ffffffff81135c28>] ? __remove_mapping+0xd8/0x160
[<ffffffff81135cc6>] ? remove_mapping+0x16/0x30
[<ffffffff81134bf2>] ? invalidate_inode_page+0x82/0xb0
[<ffffffff81134efa>] ? invalidate_mapping_pages+0xda/0x150
[<ffffffff814fb8eb>] ? _spin_unlock+0x2b/0x40
[<ffffffff811a2ec0>] ? shrink_icache_memory+0x1c0/0x2e0
[<ffffffff811a2f95>] ? shrink_icache_memory+0x295/0x2e0
[<ffffffff811362ed>] ? shrink_slab+0x14d/0x1b0
[<ffffffff8113963d>] ? balance_pgdat+0x5ad/0x810
[<ffffffff81139c40>] ? isolate_pages_global+0x0/0x380
[<ffffffff811399e4>] ? kswapd+0x144/0x3a0

[<ffffffff81139deb>] isolate_pages_global+0x1ab/0x380
[<ffffffff81136d99>] ? shrink_inactive_list+0xb9/0x730
[<ffffffff81136e42>] shrink_inactive_list+0x162/0x730
[<ffffffffa04e90fd>] ? cfs_hash_rw_unlock+0x1d/0x30 [libcfs]
[<ffffffffa04e7ac4>] ? cfs_hash_dual_bd_unlock+0x34/0x60 [libcfs]
[<ffffffffa04e9c12>] ? cfs_hash_lookup+0x82/0xa0 [libcfs]
[<ffffffffa06a20f5>] ? cl_env_fetch+0x25/0x80 [obdclass]
[<ffffffff8113821f>] shrink_zone+0x38f/0x510
[<ffffffff811397a9>] balance_pgdat+0x719/0x810
[<ffffffff81139c40>] ? isolate_pages_global+0x0/0x380
[<ffffffff811399e4>] kswapd+0x144/0x3a0

Note they are not passing through shrink_active_list(); I_NEW is needed to avoid the following code path:

[<ffffffff81134efa>] ? invalidate_mapping_pages+0xda/0x150
[<ffffffff814fb8eb>] ? _spin_unlock+0x2b/0x40
[<ffffffff811a2ec0>] ? shrink_icache_memory+0x1c0/0x2e0 |
| Comment by Andrew Perepechko [ 22/Mar/13 ] |
|
Xyratex-bug-id: MRP-691 |
| Comment by James A Simmons [ 22/Mar/13 ] |
|
Would you mind if I update the patch to support SLES11 SP2 as well? |
| Comment by Andrew Perepechko [ 22/Mar/13 ] |
|
Hello James! |
| Comment by Andrew Perepechko [ 23/Mar/13 ] |
|
Using mark_page_accessed() is not enough to avoid page eviction. find_or_create_page() allocates a page and links it to the corresponding per-cpu buffer:

struct page *find_or_create_page(struct address_space *mapping,
pgoff_t index, gfp_t gfp_mask)
{
struct page *page;
int err;
repeat:
page = find_lock_page(mapping, index);
if (!page) {
page = __page_cache_alloc(gfp_mask);
if (!page)
return NULL;
/*
* We want a regular kernel memory (not highmem or DMA etc)
* allocation for the radix tree nodes, but we need to honour
* the context-specific requirements the caller has asked for.
* GFP_RECLAIM_MASK collects those requirements.
*/
err = add_to_page_cache_lru(page, mapping, index,
(gfp_mask & GFP_RECLAIM_MASK));
...
}
void __lru_cache_add(struct page *page, enum lru_list lru)
{
	struct pagevec *pvec = &get_cpu_var(lru_add_pvecs)[lru];

	page_cache_get(page);
	if (!pagevec_add(pvec, page))
		____pagevec_lru_add(pvec, lru);
	put_cpu_var(lru_add_pvecs);
}

Note that ____pagevec_lru_add(), which calls SetPageLRU(), is only performed when the cpu buffer is full. mark_page_accessed() activates the page only if it is on the LRU; otherwise, the page is merely marked or kept referenced:

void mark_page_accessed(struct page *page)
{
if (!PageActive(page) && !PageUnevictable(page) &&
PageReferenced(page) && PageLRU(page)) {
activate_page(page);
ClearPageReferenced(page);
} else if (!PageReferenced(page)) {
SetPageReferenced(page);
}
}
shrink_inactive_list() drains the buffer and evicts the pages even if mark_page_accessed() was called repeatedly:

static unsigned long shrink_inactive_list(unsigned long max_scan,
			struct zone *zone, struct scan_control *sc,
			int priority, int file)
{
	LIST_HEAD(page_list);
	struct pagevec pvec;
	unsigned long nr_scanned = 0;
	unsigned long nr_reclaimed = 0;
	unsigned long nr_dirty = 0;
	unsigned long nr_writeback = 0;
	struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc);

	while (unlikely(too_many_isolated(zone, file, sc))) {
		congestion_wait(BLK_RW_ASYNC, HZ/10);

		/* We are about to die and free our memory. Return now. */
		if (fatal_signal_pending(current))
			return SWAP_CLUSTER_MAX;
	}

	pagevec_init(&pvec, 1);
	lru_add_drain();
	...
} |
| Comment by Keith Mannthey (Inactive) [ 25/Mar/13 ] |
|
Can you please post any detailed performance data you have? What environment are you testing in and what results do you see? |
| Comment by Keith Mannthey (Inactive) [ 21/May/13 ] |
|
Andrew Perepechko, Any update? |
| Comment by Andrew Perepechko [ 22/May/13 ] |
|
Keith Mannthey, there is a lot of ongoing activity on LKML. |
| Comment by Keith Mannthey (Inactive) [ 22/May/13 ] |
|
That is excellent news. |
| Comment by Andrew Perepechko [ 27/Sep/13 ] |
|
This ticket should be closed and the long-term solution backported from the vanilla kernel. |
| Comment by Cliff White (Inactive) [ 09/May/14 ] |
|
Closing ticket per Andrew |