[LU-15819] Executables run from Lustre may receive spurious SIGBUS signals Created: 04/May/22  Updated: 05/May/22  Resolved: 04/May/22

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Upstream
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Andrew Perepechko Assignee: Andrew Perepechko
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-15815 fast_read/stale data/reclaim workroun... Resolved
Related
is related to LU-14541 Memory reclaim caused a stale data read Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

We received several reports of applications (IOR and other unrelated user-provided programs) run from Lustre receiving SIGBUS signals.

 

We were able to reproduce the issue with IOR and a RHEL7 kernel on the client.

 

It appears to be caused by LU-14541, and the mechanism is as follows:

 

1) a major page fault occurs in the IOR code

2) ll_fault() -> ... -> filemap_fault()

3) ll_readpage() is issued from filemap_fault()

4) wait_on_page_locked() is issued from filemap_fault()

5) the uptodate check in filemap_fault() fails due to a parallel ClearPageUptodate() called from a blocking AST handler

6) VM_FAULT_SIGBUS is returned, as sketched below
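
For reference, the slow path in a RHEL7-era (3.10-based) filemap_fault() looks roughly like this condensed sketch (annotated to match the steps above; not verbatim kernel source). The race window is between wait_on_page_locked() and the PageUptodate() recheck:

page_not_uptodate:
        ClearPageError(page);
        /* step 3: ->readpage is ll_readpage() on Lustre */
        error = mapping->a_ops->readpage(file, page);
        if (!error) {
                /* step 4: wait for the read to complete */
                wait_on_page_locked(page);
                /* step 5: a blocking AST handler can call
                 * ClearPageUptodate() between read completion and
                 * this recheck, so the check fails even though the
                 * read itself succeeded */
                if (!PageUptodate(page))
                        error = -EIO;
        }
        page_cache_release(page);

        if (!error || error == AOP_TRUNCATED_PAGE)
                goto retry_find;

        /* step 6: the transient !PageUptodate state is reported to
         * the application as a fatal fault */
        return VM_FAULT_SIGBUS;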



 Comments   
Comment by Peter Jones [ 04/May/22 ]

Panda

Isn't this a duplicate of LU-15815?

Peter

Comment by Andrew Perepechko [ 04/May/22 ]

Peter, yes, it is. Sorry, I searched for similar bugs some time ago but forgot to do it right before opening the ticket. We can close this one.

Comment by Andreas Dilger [ 04/May/22 ]

While this definitely seems like a duplicate of LU-15815, is anyone from HPE currently looking at this issue? There were comments in LU-14541 indicating that Shadow was looking into this same issue:

This patch is already tested - it solves the problem, but some issues remain.
1) the page isn't freed and stays in memory for a long time, until the page cache LRU flushes it.
2) a page without the uptodate flag may cause an EIO in some cases, especially with splice. Not sure, but possible.

I have a different patch that changes the cl_page state transitions to avoid owning CPS_FREED pages, but no resources to verify it.
In our case, it reproduced with overstriping with 5000 stripes and sysctl -w vm.drop_caches=3 on the client nodes in parallel to the IOR run.

It looks like John Hammond has an MPI reproducer on LU-15815, and an even simpler reproducer on LU-14541 that can be run on a single client, so that should allow testing potential fixes much more easily.

This issue is one of the few 2.15.0 blockers that does not have a fix yet.

Comment by Andrew Perepechko [ 04/May/22 ]

We discussed a few possible solutions specifically for this (LU-15815/LU-15819) issue.

For example, we could add a page->mapping == NULL check, as the kernel does in do_generic_file_read(), to work around invalidate_mapping_pages() (see the sketch below). However, we want to avoid copying the filemap_fault() implementation for every supported kernel.
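
A hedged sketch of what that check could look like inside a per-kernel copy of filemap_fault() (hypothetical, modelled on the page_not_up_to_date_locked handling in do_generic_file_read(); this is exactly the copied implementation we would rather avoid maintaining):

        if (!error) {
                wait_on_page_locked(page);
                if (!PageUptodate(page)) {
                        /* If the page was removed from the mapping
                         * (e.g. by invalidate_mapping_pages() from a
                         * blocking AST), leave error at 0 so we fall
                         * through to retry_find instead of SIGBUS */
                        if (page->mapping != NULL)
                                error = -EIO;
                }
        }
        page_cache_release(page);

        if (!error || error == AOP_TRUNCATED_PAGE)
                goto retry_find;        /* retry the whole fault */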

 

As for alternative fixes for LU-14541, such as the one mentioned by alyashkov, I believe that kind of fix was considered too complicated and was abandoned in favour of ClearPageUptodate(). While it was understood that calling ClearPageUptodate() was not fully legitimate, the page fault path was not considered and not properly tested. I asked him to add a comment explaining why the patch was abandoned and whether it can be restored and finished.
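
For context, the other half of the race has roughly the following shape (a simplified, hypothetical sketch; lustre_discard_page() is an illustrative name, not an actual Lustre symbol). When a DLM blocking AST cancels an extent lock, the client discards its cached pages under that lock, and the LU-14541 approach clears the uptodate bit on pages that cannot simply be dropped:

static void lustre_discard_page(struct page *page)
{
        /* Pages still mapped into a process cannot be removed from
         * the page cache outright, so the stale-data fix marks them
         * not uptodate instead ... */
        ClearPageUptodate(page);
        /* ... which is the transient state that filemap_fault() can
         * observe after wait_on_page_locked() and turn into
         * VM_FAULT_SIGBUS */
}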

Comment by John Hammond [ 05/May/22 ]

> It looks like John Hammond has an MPI reproducer on LU-15815, and an even simpler reproducer on LU-14541 that can be run on a single client, so that should allow testing potential fixes much more easily.

The MPI reproducer does not require multiple clients. It only requires 2 procs, and you can use oversubscribe to place them both on the same node. It's unlikely that the bug has much to do with MPI; it's just that the application generates the right memory map access pattern to trigger it. I did try a bit to make something simpler but was not successful. It would be good to have a simplified reproducer.
