[LU-14541] Memory reclaim caused a stale data read Created: 22/Mar/21 Updated: 20/May/23 Resolved: 20/May/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.15.0, Lustre 2.15.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexey Lyashkov | Assignee: | Patrick Farrell |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: | |
| Issue Links: | |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
The kernel calls back into the filesystem whenever a page is removed from the page cache; an extra reference held on the vmpage very likely caused the failure. |
| Comments |
| Comment by Wang Shilong (Inactive) [ 28/Apr/21 ] |
|
I think the problem exists, but it might be a bit different. In ll_releasepage() we do:

    if (!cl_page_in_use(page)) {
        result = 1;
        cl_page_delete(env, page);
    }

At this point the cl_page reference count is 2; cl_page_delete() is called to move the page to CPS_FREEING, and the cl_page reference drops to 1. The final cl_page_put(env, page) then releases the cl_page and the osc page from memory.

But as the comments say, it is still possible that the vmpage cannot be released (because the page refcount != 2); the vmpage refcount is a bit tricky here. I think we could potentially fix this issue by clearing the page's uptodate flag in vvp_page_delete():

    diff --git a/lustre/llite/vvp_page.c b/lustre/llite/vvp_page.c
    index d0e274c..87ec2a8 100644
    --- a/lustre/llite/vvp_page.c
    +++ b/lustre/llite/vvp_page.c
    @@ -178,6 +178,12 @@ static void vvp_page_delete(const struct lu_env *env,
     	ClearPagePrivate(vmpage);
     	vmpage->private = 0;
    +
    +	/**
    +	 * Vmpage might not be released due page refcount != 2,
    +	 * clear Page uptodate here to avoid stale data.
    +	 */
    +	ClearPageUptodate(vmpage);
     	/*
     	 * Reference from vmpage to cl_page is removed, but the reference back
|
| Comment by Wang Shilong (Inactive) [ 28/Apr/21 ] |
|
Wang Shilong (wshilong@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/43476 |
| Comment by Alexey Lyashkov [ 28/Apr/21 ] |
|
This patch has already been tested - it solves the problem, but some issues remain. I have a different patch that changes the cl_page state transitions to avoid owning CPS_FREEING pages, but no resources to verify it. In our case the bug reproduced with overstriping (5000 stripes) and `sysctl -w vm.drop_caches=3` run on the client nodes in parallel with IOR. |
| Comment by Andreas Dilger [ 28/Apr/21 ] |
|
Shadow, can you please clarify your comment. It sounds like you made a similar patch, but it isn't clear if you are currently using that patch or if it had so many problems that it wasn't applied? It sounds like there are some theoretical problems but they may not actually be seen in real use? Do you have any idea why there are extra references on the page? Is that because of a bug in llite or is some other kernel process actually holding a reference on the page at the time we are trying to drop it (e.g. kswapd) and it never gets cleaned up afterward? |
| Comment by Alexey Lyashkov [ 28/Apr/21 ] |
|
Andreas, I cooked up a similar patch while debugging this issue. But during internal discussion, Panda pointed to a place in the kernel where EIO can be returned when a page is in the mapping but not uptodate. That was the reason to decline this approach, in addition to the extra memory usage once the patch is applied. As for the extra reference, this is very likely a plain kswapd vs. lock-cancel race, and it looks like osc_gang_lookup() takes an extra reference while doing its work. |
| Comment by Andreas Dilger [ 28/Apr/21 ] |
|
I guess the main question is how often the EIO case can happen (e.g. only in rare cases, or commonly, and only if the file is being modified, or even for files that are not changed), and whether it is better to return EIO (safe, but annoying) or stale data (unsafe, but usually correct?) to the application? |
| Comment by Andreas Dilger [ 03/Jun/21 ] |
It looks like cl_page_delete() will prevent new references from finding a page, but doesn't handle old references. What about adding a refcount check in osc_gang_lookup() that detects whether it is the last refcount holder and drops the page from the mapping there? |
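The suggested behaviour can be sketched as a minimal userspace model. All names here (toy_page, toy_lookup, toy_put) are hypothetical, not Lustre code: lookups refuse pages already marked as freeing, and the last reference holder also unhooks a freeing page from its mapping slot, so a stale slot can never be found again.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a page-cache slot with a refcounted page. */
struct toy_page {
	int refcount;
	int freeing;            /* analogue of CPS_FREEING */
	struct toy_page **slot; /* where the mapping points at us */
};

/* Lookup takes a reference, but refuses pages already being freed --
 * this is what cl_page_delete() already gives us for new references. */
static struct toy_page *toy_lookup(struct toy_page **slot)
{
	struct toy_page *pg = *slot;

	if (pg == NULL || pg->freeing)
		return NULL;
	pg->refcount++;
	return pg;
}

/* Drop a reference; the last holder of a page marked freeing also
 * unhooks it from the mapping -- the extra step suggested for
 * osc_gang_lookup() to handle old references. */
static void toy_put(struct toy_page *pg)
{
	assert(pg->refcount > 0);
	if (--pg->refcount == 0 && pg->freeing && pg->slot != NULL) {
		*pg->slot = NULL;
		pg->slot = NULL;
	}
}
```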
| Comment by Gerrit Updater [ 22/Jul/21 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/43476/ |
| Comment by Peter Jones [ 22/Jul/21 ] |
|
Landed for 2.15 |
| Comment by Cory Spitz [ 23/Jul/21 ] |
|
Are there any performance test results with this patch? |
| Comment by Patrick Farrell [ 23/Jul/21 ] |
|
No, but it would be very surprising if it had any performance implications - the direct effect is a single bit operation on page flags, and then making a page which is being destroyed inaccessible slightly earlier. (Which is correct because the data in it was not up to date.) |
| Comment by Cory Spitz [ 23/Jul/21 ] |
|
Thanks, Patrick. It is funny because we have been dragging our feet since March on this because of the potential perf impact. We're in the process of gathering numbers now and will post them when available. |
| Comment by Patrick Farrell [ 23/Jul/21 ] |
|
Why do folks think there would be a perf impact? This is a page which is being deleted. |
| Comment by Andreas Dilger [ 25/Jul/21 ] |
|
Cory, could you please clarify your comment about potential performance impact? Shadow had some concerns about the provided patch not fixing all of the corner cases, but no patch was forthcoming from him. The landed patch fixes a regular crash in the regression tests, and if there is an additional patch that addresses other issues it can be applied on top of the one that was landed. |
| Comment by Cory Spitz [ 02/Aug/21 ] |
|
adilger, sorry, I was out on PTO and wasn't able to clarify sooner. Yes, we had some performance concerns because clearly there could be a buffer cache impact with a change such as this. Fortunately, we've done some performance testing of the fix and the long and short of it is that the performance is fine. We can share more if necessary. As for the EIO potential, that investigation and testing is still on-going. So far, so good. We can report back on that topic shortly. |
| Comment by Petros Koutoupis [ 02/Aug/21 ] |
|
I uploaded the performance results. |
| Comment by Patrick Farrell [ 02/Aug/21 ] |
|
I still really don't understand why it was thought this might have a performance impact; perhaps that concern was about another approach to fixing this? This one just makes pages which are currently being deleted unreadable. |
| Comment by Alexey Lyashkov [ 03/Aug/21 ] |
|
@Patrick, any change should be tested for performance impact. As an example of a non-obvious issue: https://review.whamcloud.com/#/c/35610/ caused random reads where reads were previously sequential. That is because pages covered by a lock can be aged out and removed in any order, while lock cancellation guarantees that a contiguous area is freed. For this particular case a different approach existed, but it appears to have issues, and it costs about -5% in IOPS. So this patch should be tested for correctness (LTP, IOR, ...) and checked for performance impact. |
| Comment by Cory Spitz [ 05/Aug/21 ] |
|
Additional context: as koutoupis wrote, when we ran the fix on the benchmarking platform we saw no regressions. |
| Comment by Cory Spitz [ 05/Aug/21 ] |
|
> 2) a page without the uptodate flag may cause EIO in some cases, especially with splice. Not sure, but possible.
We've completed runs of an LTP-heavy mix and found no issues with EIO. |
| Comment by Gerrit Updater [ 14/Dec/21 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/45848 |
| Comment by John Hammond [ 04/May/22 ] |
|
Attached a reproducer, rw_seq_cst_vs_drop_caches.c. Running with the fix reverted:

# lctl set_param llite.*.fast_read=1
llite.lustre-ffff9d20ca304088.fast_read=1
llite.lustre-ffff9d21cc919028.fast_read=1
# ./rw_seq_cst_vs_drop_caches /mnt/lustre/f0 /mnt/lustre2/f0
u = 2, v = { 2, 1 }
Aborted (core dumped)
# lctl set_param llite.*.fast_read=0
llite.lustre-ffff9d20ca304088.fast_read=0
llite.lustre-ffff9d21cc919028.fast_read=0
# ./rw_seq_cst_vs_drop_caches /mnt/lustre/f0 /mnt/lustre2/f0
u = 2, v = { 2, 1 }
Aborted (core dumped)
|
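The shape of the check the reproducer performs can be sketched in userspace C. This is a simplified, single-process model with hypothetical names (run_check, read_counter), not the attached reproducer itself (which exercises two separate Lustre mounts concurrently): a counter is bumped through one descriptor and read back twice through another, and the check fails if the value ever appears to move backwards, as in the `u = 2, v = { 2, 1 }` output above.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read the 4-byte counter stored at offset 0 of fd. */
static int read_counter(int fd)
{
	int v = -1;

	if (pread(fd, &v, sizeof(v), 0) != (ssize_t)sizeof(v))
		return -1;
	return v;
}

/* Bump a counter through one descriptor and re-read it twice through
 * another; a sequentially consistent filesystem must never let the
 * second read return an older value than the first.
 * Returns 0 if every pair of reads was consistent. */
static int run_check(const char *path, int iters)
{
	int wfd = open(path, O_CREAT | O_RDWR, 0644);
	int rfd = open(path, O_RDONLY);
	int rc = 0;

	if (wfd < 0 || rfd < 0)
		return -1;

	for (int u = 1; u <= iters; u++) {
		if (pwrite(wfd, &u, sizeof(u), 0) != (ssize_t)sizeof(u)) {
			rc = -1;
			break;
		}
		int v0 = read_counter(rfd);
		int v1 = read_counter(rfd);

		if (v1 < v0) {	/* stale read: the counter went backwards */
			rc = 1;
			break;
		}
	}
	close(wfd);
	close(rfd);
	return rc;
}
```

On a local filesystem this trivially passes; the bug only shows up when the two descriptors are served by different client page caches and one of them returns a stale page.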
| Comment by Gerrit Updater [ 06/May/22 ] |
|
"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47243 |
| Comment by Patrick Farrell [ 09/May/22 ] |
|
Looking at the readpage handling in filemap_fault(), we can see clearly that it's not OK to clear PageUptodate at an arbitrary time. The page comes back unlocked from ->readpage, and the code requires it to be uptodate or it gives an error (resulting in the SIGBUS we see). But if we leave the page uptodate, we hit this issue where we can see stale data. |
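The constraint described here can be illustrated with a toy model (hypothetical names, not the kernel's actual fault path): once ->readpage returns, the fault handler demands an uptodate page, so clearing the flag at an arbitrary time turns a racing fault into SIGBUS for the application.

```c
/* Toy model of the filemap_fault() requirement: after ->readpage
 * completes, the page must be uptodate or the fault fails. */
enum fault_result { FAULT_OK, FAULT_SIGBUS };

struct toy_vmpage { int uptodate; };

/* Stand-in readpage that completes the read normally. */
static void toy_readpage_ok(struct toy_vmpage *pg)
{
	pg->uptodate = 1;
}

/* Stand-in readpage whose uptodate flag was cleared underneath it,
 * modelling an ill-timed ClearPageUptodate() from another thread. */
static void toy_readpage_raced(struct toy_vmpage *pg)
{
	pg->uptodate = 0;
}

/* If the page is not uptodate, run readpage; the fault only succeeds
 * when the page ends up uptodate, otherwise the caller gets SIGBUS. */
static enum fault_result toy_filemap_fault(struct toy_vmpage *pg,
					   void (*readpage)(struct toy_vmpage *))
{
	if (!pg->uptodate)
		readpage(pg);
	return pg->uptodate ? FAULT_OK : FAULT_SIGBUS;
}
```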
| Comment by Gerrit Updater [ 09/May/22 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47262 |
| Comment by Gerrit Updater [ 09/May/22 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47266 |
| Comment by Gerrit Updater [ 11/May/22 ] |
|
"John L. Hammond <jhammond@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47298 |
| Comment by Gerrit Updater [ 19/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47298/ |
| Comment by Gerrit Updater [ 19/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47243/ |
| Comment by Gerrit Updater [ 20/May/22 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/47262/ |
| Comment by Peter Jones [ 20/May/22 ] |
|
Looks like the latest patches have landed |
| Comment by Gerrit Updater [ 23/Aug/22 ] |
|
"Etienne AUJAMES <eaujames@ddn.com>" uploaded a new patch: https://review.whamcloud.com/48311 |
| Comment by Andreas Dilger [ 21/Oct/22 ] |
|
sanity test_277 is being skipped on master because of this ticket, but it should really be referencing LU-12587. |
| Comment by Gerrit Updater [ 17/Jan/23 ] |
|
"Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49653 |
| Comment by Gerrit Updater [ 17/Jan/23 ] |
|
"Patrick Farrell <farr0186@gmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/49654 |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50598 |
| Comment by Gerrit Updater [ 11/Apr/23 ] |
|
"Patrick Farrell <pfarrell@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50599 |
| Comment by Gerrit Updater [ 26/Apr/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49653/ |
| Comment by Gerrit Updater [ 09/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/49654/ |
| Comment by Gerrit Updater [ 20/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50598/ |
| Comment by Gerrit Updater [ 20/May/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/50599/ |