Details
-
Bug
-
Resolution: Fixed
-
Minor
-
None
-
None
-
3
-
9223372036854775807
Description
When three processes are keeping doing I/O on a same file on a client:
- A: buffered write
- B: buffered read
- C: direct I/O read
Process B will return -EIO on the newer kernel such as Ubuntu 2204.
After investigate the bug, the problem should be that:
The client has two overlapped PR DLM extent locks:
- L1 = <PR, [1M, 4M - 1]>
- L2 = <PR, [3M, 5M - 1]>
A reader process holds L1 and read data in range [3M, 4M -1].
L2 is being revoking due to the conflict access.
Then in the ->readpage() call:
It will unlock the page after the page is in Uptodate state.
And lock blocking callback for L2 will clear the Uptodate state and delete the page from cache.
Then in the old kernel, it will:
/* Start the actual read. The read will unlock the page. */ error = mapping->a_ops->readpage(filp, page); if (unlikely(error)) { if (error == AOP_TRUNCATED_PAGE) { put_page(page); error = 0; goto find_page; } goto readpage_error; } if (!PageUptodate(page)) { error = lock_page_killable(page); if (unlikely(error)) goto readpage_error; if (!PageUptodate(page)) { if (page->mapping == NULL) { =====> /* * invalidate_mapping_pages got it */ unlock_page(page); put_page(page); goto find_page; } unlock_page(page); shrink_readahead_size_eio(filp, ra); error = -EIO; goto readpage_error; } unlock_page(page); } goto page_ok;
It will check whether the page was truncated and deleted from page cache via page->mapping, if so, it will retry the page read.
But in the newer kernel, it deletes this check and retry.
It seems there is no any simple solution for this bug.
static int filemap_read_page(struct file *file, struct address_space *mapping, struct page *page) { int error; /* * A previous I/O error may have been due to temporary failures, * eg. multipath errors. PG_error will be set again if readpage * fails. */ ClearPageError(page); /* Start the actual read. The read will unlock the page. */ error = mapping->a_ops->readpage(file, page); if (error) return error; error = wait_on_page_locked_killable(page); if (error) return error; if (PageUptodate(page)) return 0; shrink_readahead_size_eio(&file->f_ra); return -EIO; }
According to the Linux Documents: Documentations/filkesystems/locking.rst:
Locking rules: All except set_page_dirty and freepage may block ====================== ======================== ========= =============== ops PageLocked(page) i_rwsem invalidate_lock ====================== ======================== ========= =============== writepage: yes, unlocks (see below) readpage: yes, unlocks shared writepages: set_page_dirty no readahead: yes, unlocks shared readpages: no shared write_begin: locks the page exclusive write_end: yes, unlocks exclusive bmap: invalidatepage: yes exclusive releasepage: yes freepage: yes direct_IO: isolate_page: yes migratepage: yes (both) putback_page: yes launder_page: yes is_partially_uptodate: yes error_remove_page: yes swap_activate: no swap_deactivate: no ====================== ======================== ========= =============== The filesystem must exclusively acquire invalidate_lock before invalidating page cache in truncate / hole punch path (and thus calling into ->invalidatepage) to block races between page cache invalidation and page cache filling functions (fault, read, ...).
The solution should be that:
we must hold exclusive invalidate_lock (mapping->invalidate_lock) (in the newer kernel) when remove pages from page cache caused by the revoking of the DLM extent lock protecting them.
Attachments
Issue Links
- is related to
-
LU-16030 sanity-pcc test_45: attach more than 2 time: 0
- Open