Details
- Bug
- Resolution: Fixed
- Major
Description
osc_queue_sync_pages() adds an osc_extent to the osc_object's IO extent list without taking LDLM locks, and then calls osc_io_unplug_async() to queue the IO work for the client.
I think the IO extent should hold LDLM locks while it waits in the IO work queue.
Issue Links
- is related to
  - LU-16401 various crashes with cl_page_discard vs readhead race (Open)
  - LU-16224 rw_seq_cst_vs_drop_caches dies with SIGBUS (Resolved)
- is related to
  - LU-16156 stale read during IOR test due LU-14541 (Open)
  - LU-19254 Kernel NFS exported Lustre can give spurious EOF (Open)
  - LU-14541 Memory reclaim caused a stale data read (Resolved)
  - LU-15815 fast_read/stale data/reclaim workround causes SIGBUS (Resolved)
The original corruption problem is described in LU-14541, though it is hard to read. LU-14541 adds the fix that removes uptodate. That fix was reverted due to SIGBUS/EIO, which was fixed in LU-16160, and LU-16160 puts the removing-uptodate fix back.
Here is the data corruption case:
The reason we need to remove uptodate is as follows.
Consider this sequence:
READ_1_START <-- NODE1
release_page() called
WRITE[START,END] <-- NODE2
READ_2_START <-- NODE1
READ_1_END
READ_2_END
(The order of READ_1_END and READ_2_END is not important.)
So READ_1 is concurrent with WRITE, and READ_2 is concurrent with READ_1 but starts after WRITE has ended.
It is READ_2 that gets stale data, not seeing the write. (READ_1 also does not see the write, but that is OK since the write and READ_1 overlap in time.)
Here's what happens:
Memory pressure flushes the page (calling releasepage) while READ_1 is pending.
Lustre must remove its knowledge of the page, since the page is about to be destroyed, so it calls cl_page_delete().
But the page is still in the mapping. (We cannot remove pages from the mapping in releasepage: the kernel removes the page a moment later, and it assumes and asserts that the page is still in the mapping.)
Now the page is no longer tracked by Lustre, but it is still findable in the mapping until it is removed, which happens later.
At this point, the write starts on another node.
The Lustre write does not flush this page, because Lustre is no longer responsible for the page and cannot find it.
The write completes (READ_1 is still in the gap between releasepage and remove_from_mapping).
READ_2 starts and finds the pages kept alive by READ_1.
READ_1 finishes without data from the write (OK).
READ_2 finishes without data from the write (not OK).
In short, READ_2 starts after the write ends, and reads from pages kept alive by READ_1.
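The sequence above can be replayed in a small user-space toy model. This is only a sketch under simplifying assumptions: ToyClient, remote_write and run_sequence are hypothetical names invented for illustration, not Lustre or kernel APIs, and a dict stands in for the page cache mapping. It shows only the ordering problem: a page that is untracked but still in the mapping survives a conflicting write.

```python
class ToyClient:
    """Hypothetical stand-in for one client node; not a Lustre API."""

    def __init__(self, server):
        self.server = server   # dict: page index -> data held by the "server"
        self.mapping = {}      # stand-in for the kernel page cache mapping
        self.tracked = set()   # pages the toy Lustre layer still knows about

    def read(self, index):
        if index not in self.mapping:        # cache miss: fetch and track
            self.mapping[index] = self.server[index]
            self.tracked.add(index)
        return self.mapping[index]           # cache hit: the page is trusted

    def release_page(self, index):
        # Models releasepage -> cl_page_delete(): tracking is dropped, but the
        # page stays in the mapping until the kernel removes it later.
        self.tracked.discard(index)

    def remove_from_mapping(self, index):
        # Models the kernel removing the page some time after releasepage.
        self.mapping.pop(index, None)

    def flush_for_remote_write(self, index):
        # Only pages that are still tracked can be found and dropped.
        if index in self.tracked:
            self.tracked.discard(index)
            self.mapping.pop(index, None)


def remote_write(server, clients, index, data):
    """A write from another node: flush tracked cached copies, then store."""
    for client in clients:
        client.flush_for_remote_write(index)
    server[index] = data


def run_sequence(reclaim_races=True):
    server = {0: "old"}
    node1 = ToyClient(server)
    read1 = node1.read(0)                    # READ_1 brings the page in
    if reclaim_races:
        node1.release_page(0)                # memory pressure during READ_1
    remote_write(server, [node1], 0, "new")  # WRITE from NODE2
    read2 = node1.read(0)                    # READ_2 starts after the write
    node1.remove_from_mapping(0)             # kernel removes the page, too late
    read3 = node1.read(0)                    # a later read refetches
    return read1, read2, read3
```

With the reclaim in the gap, run_sequence() returns ("old", "old", "new"): the second read is stale. Without it, run_sequence(reclaim_races=False) returns ("old", "new", "new"), because the tracked page is found and dropped when the write arrives.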
If cl_page_delete removes uptodate, this problem is avoided. However, read_iter and mmap_fault return errors (EIO or SIGBUS) if a !uptodate page is found. That is why there is the seqlock wrapper to retry those
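As an illustration of the retry idea, here is a minimal sketch of a generic sequence-counter retry loop. It is not Lustre's implementation: SeqCount, read_with_retry and demo are hypothetical names, and the assumption is simply that page invalidation bumps a counter, so a reader that hits EIO can tell whether an invalidation raced with it and try again.

```python
import errno


class SeqCount:
    """Minimal sequence counter: even is stable, odd is an update in progress."""

    def __init__(self):
        self.seq = 0

    def write_begin(self):
        self.seq += 1

    def write_end(self):
        self.seq += 1

    def read_begin(self):
        return self.seq

    def read_retry(self, start):
        # Retry if an update was in progress at the start or happened since.
        return start % 2 == 1 or self.seq != start


def read_with_retry(seq, attempt, max_tries=8):
    """Run attempt(); retry on EIO only if an invalidation raced with it."""
    for _ in range(max_tries):
        start = seq.read_begin()
        try:
            return attempt()
        except OSError as err:
            if err.errno != errno.EIO or not seq.read_retry(start):
                raise                        # a genuine error: do not hide it
    raise OSError(errno.EIO, "retries exhausted")


def demo():
    server = {0: "new"}
    mapping = {0: {"data": "old", "uptodate": True}}
    seq = SeqCount()
    attempts = []

    def invalidate():
        # Assumed model of cl_page_delete() removing uptodate under the counter.
        seq.write_begin()
        mapping[0]["uptodate"] = False
        seq.write_end()

    def attempt():
        attempts.append(1)
        if len(attempts) == 1:
            invalidate()                     # reclaim races with the first try
        page = mapping.get(0)
        if page is None:                     # page gone: fetch fresh data
            mapping[0] = {"data": server[0], "uptodate": True}
            return server[0]
        if not page["uptodate"]:
            del mapping[0]                   # models the kernel's later removal
            raise OSError(errno.EIO, "page not uptodate")
        return page["data"]

    return read_with_retry(seq, attempt), len(attempts)
```

In this sketch demo() returns ("new", 2): the first attempt fails with EIO on the !uptodate page, the counter shows that an invalidation raced with it, and the second attempt fetches fresh data. An EIO with no counter change is re-raised, so real I/O errors still reach the caller.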