Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-16651

hold invalidate_lock when invalidating page cache under kernel 5.15+

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0
    • None
    • None
    • 3
    • 9223372036854775807

    Description

      When three processes are keeping doing I/O on a same file on a client:

      • A: buffered write
      • B: buffered read
      • C: direct I/O read
        Process B will return -EIO on the newer kernel such as Ubuntu 2204.

      After investigate the bug, the problem should be that:
      The client has two overlapped PR DLM extent locks:

      • L1 = <PR, [1M, 4M - 1]>
      • L2 = <PR, [3M, 5M - 1]>
        A reader process holds L1 and read data in range [3M, 4M -1].
        L2 is being revoking due to the conflict access.
        Then in the ->readpage() call:
        It will unlock the page after the page is in Uptodate state.
        And lock blocking callback for L2 will clear the Uptodate state and delete the page from cache.
        Then in the old kernel, it will:
      /* Start the actual read. The read will unlock the page. */
      		error = mapping->a_ops->readpage(filp, page);
      
      		if (unlikely(error)) {
      			if (error == AOP_TRUNCATED_PAGE) {
      				put_page(page);
      				error = 0;
      				goto find_page;
      			}
      			goto readpage_error;
      		}
      
      		if (!PageUptodate(page)) {
      			error = lock_page_killable(page);
      			if (unlikely(error))
      				goto readpage_error;
      			if (!PageUptodate(page)) {
      				if (page->mapping == NULL) { =====>
      					/*
      					 * invalidate_mapping_pages got it
      					 */
      					unlock_page(page);
      					put_page(page);
      					goto find_page;
      				}
      				unlock_page(page);
      				shrink_readahead_size_eio(filp, ra);
      				error = -EIO;
      				goto readpage_error;
      			}
      			unlock_page(page);
      		}
      
      		goto page_ok;
      

      It will check whether the page was truncated and deleted from page cache via page->mapping, if so, it will retry the page read.

      But in the newer kernel, it deletes this check and retry.
      It seems there is no any simple solution for this bug.

      static int filemap_read_page(struct file *file, struct address_space *mapping,
      		struct page *page)
      {
      	int error;
      
      	/*
      	 * A previous I/O error may have been due to temporary failures,
      	 * eg. multipath errors.  PG_error will be set again if readpage
      	 * fails.
      	 */
      	ClearPageError(page);
      	/* Start the actual read. The read will unlock the page. */
      	error = mapping->a_ops->readpage(file, page);
      	if (error)
      		return error;
      
      	error = wait_on_page_locked_killable(page);
      	if (error)
      		return error;
      	if (PageUptodate(page))
      		return 0;
      
      	shrink_readahead_size_eio(&file->f_ra);
      	return -EIO;
      }
      

      According to the Linux Documents: Documentations/filkesystems/locking.rst:

      Locking rules:
      	All except set_page_dirty and freepage may block
      
      ======================	======================== =========	===============
      ops			PageLocked(page)	 i_rwsem	invalidate_lock
      ======================	======================== =========	===============
      writepage:		yes, unlocks (see below)
      readpage:		yes, unlocks				shared
      writepages:
      set_page_dirty		no
      readahead:		yes, unlocks				shared
      readpages:		no					shared
      write_begin:		locks the page		 exclusive
      write_end:		yes, unlocks		 exclusive
      bmap:
      invalidatepage:		yes					exclusive
      releasepage:		yes
      freepage:		yes
      direct_IO:
      isolate_page:		yes
      migratepage:		yes (both)
      putback_page:		yes
      launder_page:		yes
      is_partially_uptodate:	yes
      error_remove_page:	yes
      swap_activate:		no
      swap_deactivate:	no
      ======================	======================== =========	===============
      The filesystem must exclusively acquire invalidate_lock before invalidating page cache in truncate / hole punch path (and thus calling into ->invalidatepage) to block races between page cache invalidation and page cache filling functions (fault, read, ...).
      

      The solution should be that:
      we must hold exclusive invalidate_lock (mapping->invalidate_lock) (in the newer kernel) when remove pages from page cache caused by the revoking of the DLM extent lock protecting them.

      Attachments

        Issue Links

          Activity

            People

              qian_wc Qian Yingjin
              qian_wc Qian Yingjin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: