  1. Lustre
  2. LU-20115

ll_readpage() leaks internal -ENODATA to external a_ops->read_folio callers, breaking EROFS file-backed mode


Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.17.0
    • 2

    Description

      ```

      Summary

      Lustre's ll_readpage() (the a_ops->read_folio implementation) returns -ENODATA when called without a pre-established cl_io context and the requested page is not already in the page cache. This -ENODATA is an internal signal used by Lustre's "fast read" mechanism, intended to be caught by ll_do_fast_read() and converted to a fallback to the slow I/O path.

      However, when external kernel subsystems call a_ops->read_folio directly (bypassing Lustre's f_op->read_iter), this internal error leaks out and causes failures. The most prominent affected use case is EROFS file-backed mode (introduced in Linux 6.14), which calls read_mapping_folio() → a_ops->read_folio to read metadata from the backing file.

      This bug is also present in the current Lustre master branch: the io == NULL branch in ll_readpage() still has the same logic.

      Steps to Reproduce

      1. Store an EROFS image file on a Lustre filesystem (e.g., /mnt/lustre/images/layer.img)
      2. Attempt to mount it using EROFS file-backed mode:
        int ctx_fd = fsopen("erofs", 0);
        fsconfig(ctx_fd, FSCONFIG_SET_STRING, "source", "/mnt/lustre/images/layer.img", 0);
        fsconfig(ctx_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);  // ← fails with ENODATA
        
      3. Observe that fsconfig(FSCONFIG_CMD_CREATE) fails with errno set to ENODATA (61).

      Interesting observation: If you first run md5sum /mnt/lustre/images/layer.img (which populates the page cache via the normal read() path), then the EROFS mount succeeds. After echo 3 > /proc/sys/vm/drop_caches, it fails again. This confirms the issue is related to page cache state.

      Root Cause Analysis

      The call chain that fails

      EROFS file-backed mode reads metadata via read_mapping_folio(), which directly invokes a_ops->read_folio:

      erofs_fc_fill_super()                          // fs/erofs/super.c
        → erofs_read_superblock()
          → erofs_read_metabuf()
            → erofs_bread()                          // fs/erofs/data.c
              → read_mapping_folio(mapping, index, file)   // include/linux/pagemap.h
                → read_cache_folio()                 // mm/filemap.c
                  → filemap_read_folio()             // mm/filemap.c
                    → a_ops->read_folio(file, folio)
                      → ll_readpage(file, vmpage)    // lustre/llite/rw.c
      

      The problematic code in ll_readpage()

      In lustre/llite/rw.c, the ll_readpage() function:

      int ll_readpage(struct file *file, struct page *vmpage)
      {
          // ...
          lcc = ll_cl_find(inode);   // Search for cl_io context on current task
          if (lcc != NULL) {
              env = lcc->lcc_env;
              io  = lcc->lcc_io;
          }
      
          if (io == NULL) { /* fast read */
              result = -ENODATA;
      
              page = cl_vmpage_page(vmpage, clob);
              if (page == NULL) {
                  unlock_page(vmpage);
                  // *** BUG: returns -ENODATA to ANY caller, including external ones ***
                  RETURN(result);
              }
              // ... only handles pages already in cl_page cache (fast read hit) ...
          }
      
          // io != NULL: full slow read path with cl_io context
          // ...
      }
      

      The cl_io context is established by ll_cl_add() inside ll_file_io_generic(), which is called from ll_file_read_iter() (i.e., f_op->read_iter). When a_ops->read_folio is called directly by an external consumer (like EROFS's read_mapping_folio()), no cl_io context exists, so ll_cl_find() returns NULL, io is NULL, and if the page is not already cached, -ENODATA is returned.

      Why -ENODATA is an internal signal, not a proper error

      In Lustre's normal read path, -ENODATA from ll_readpage() is caught by ll_do_fast_read() in lustre/llite/file.c:

      static ssize_t ll_do_fast_read(struct kiocb *iocb, struct iov_iter *iter)
      {
          ssize_t result;

          // ...
          result = generic_file_read_iter(iocb, iter);
          if (result == -ENODATA)
              result = 0;  // Convert to 0, causing fallback to ll_file_io_generic()
          // ...
          return result;
      }
      

      This works for Lustre's own read path because ll_do_fast_read() intercepts the -ENODATA sentinel before it reaches the caller. External callers such as EROFS have no knowledge of this convention and propagate -ENODATA as a real error.

      The architectural issue

      Lustre's address_space_operations are not self-contained. They depend on a cl_io context that is established at the file_operations level:

      ┌─────────────────────────────────────────────────────────────────┐
      │  file_operations (ll_file_read_iter)                            │
      │    → ll_file_io_generic()                                       │
      │      → ll_cl_add()          ← establishes cl_io context        │
      │      → cl_io_loop()                                             │
      │        → vvp_io_read_start()                                    │
      │          → generic_file_read_iter()                             │
      │            ↓                                                    │
      │  address_space_operations (ll_readpage)                          │
      │    → ll_cl_find()           ← requires pre-established context  │
      │    → if no context: return -ENODATA (fast read signal)          │
      └─────────────────────────────────────────────────────────────────┘
      

      Any caller that invokes a_ops->read_folio directly (bypassing f_op->read_iter) will hit this issue. This includes:

      • EROFS file-backed mode (read_mapping_folio() in fs/erofs/data.c)
      • Kernel modules that call read_mapping_folio() on a Lustre file
      • Any future in-kernel consumer of Lustre file page cache

      Impact

      1. EROFS file-backed mode (Linux 6.14+) cannot mount images stored on Lustre.
      2. Any in-kernel subsystem that calls read_mapping_folio() or read_cache_folio() on a Lustre file will get unexpected -ENODATA errors when the page is not cached.
      3. The workaround of pre-populating page cache (e.g., via md5sum) is fragile and breaks after drop_caches or memory pressure.

      Proposed Fix

      Make ll_readpage() self-contained when called without a cl_io context, similar to how ll_writepage() handles the writeback path (which also lacks a pre-established cl_io).

      ll_writepage() already demonstrates the pattern of creating a temporary cl_io context:

      // lustre/llite/rw.c - ll_writepage() (existing code)
      int ll_writepage(struct page *vmpage, struct writeback_control *wbc)
      {
          // ...
          env = cl_env_get(&refcheck);
          io = vvp_env_thread_io(env);
          io->ci_obj = clob;
          io->ci_ignore_layout = 1;
          result = cl_io_init(env, io, CIT_MISC, clob);
          if (result == 0) {
              page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
              // ... perform I/O with temporary cl_io context ...
          }
          cl_io_fini(env, io);
          cl_env_put(env, &refcheck);
          // ...
      }
      

      The same approach should be applied to ll_readpage() in the io == NULL && page == NULL branch — create a temporary cl_io context to service the read request, so that external callers get a proper result instead of the internal -ENODATA signal.

      Considerations

      • Performance: This path is only taken when io == NULL AND the page is not cached. In Lustre's normal read path, ll_do_fast_read() catches -ENODATA and retries via ll_file_io_generic(). So this new code path would primarily be exercised by external callers.
      • Locking: CIT_MISC with ci_ignore_layout = 1 avoids layout lock acquisition, matching the ll_writepage() pattern.
      • Backward compatibility: The existing -ENODATA fast read signal path for Lustre's own I/O is unchanged.

      Related: Secondary issue with loop device

      The same root cause also manifests when using a loop device (buffered I/O mode) on top of a Lustre file. After drop_caches, the -ENODATA from ll_readpage() during the page_cache_sync_ra() phase can leave non-uptodate folios in the page cache, which may cause subtle issues in subsequent reads.
      ```
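
The sentinel leak described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not Lustre code: every name in it (model_file, model_io, model_read_folio, and so on) is hypothetical, the "page cache" is an array of flags, and the "slow path" simply marks a page as cached. It shows why a wrapper that knows the convention succeeds while a direct caller sees -ENODATA, and how falling back to a temporary context when none exists would hide the sentinel from direct callers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 4

/* Hypothetical stand-in for a file's page cache: one flag per page. */
struct model_file {
    bool cached[NPAGES];
};

/* Hypothetical stand-in for a cl_io context; NULL means "no context". */
struct model_io {
    int unused;
};

/* Models the slow path: "fetch" the page and mark it cached. */
static int model_slow_read(struct model_file *f, size_t idx)
{
    f->cached[idx] = true;
    return 0;
}

/* Models the behaviour described in the report: with no context,
 * an uncached page yields the internal -ENODATA sentinel. */
static int model_read_folio(struct model_file *f, size_t idx,
                            const struct model_io *io)
{
    assert(idx < NPAGES);
    if (io == NULL)
        return f->cached[idx] ? 0 : -ENODATA;
    return model_slow_read(f, idx);
}

/* Models the filesystem's own read entry point: try the fast path,
 * then retry with a context when the sentinel comes back. */
static int model_file_read(struct model_file *f, size_t idx)
{
    int rc = model_read_folio(f, idx, NULL);

    if (rc == -ENODATA) {
        struct model_io io = {0};

        rc = model_read_folio(f, idx, &io);
    }
    return rc;
}

/* Models an external caller: no context, no knowledge of the sentinel. */
static int model_external_read(struct model_file *f, size_t idx)
{
    return model_read_folio(f, idx, NULL);
}

/* Models the proposed direction: when there is no context and the page
 * is not cached, build a temporary context instead of returning the
 * sentinel. */
static int model_read_folio_fixed(struct model_file *f, size_t idx,
                                  const struct model_io *io)
{
    assert(idx < NPAGES);
    if (io == NULL && !f->cached[idx]) {
        struct model_io tmp = {0};

        return model_read_folio(f, idx, &tmp);
    }
    return model_read_folio(f, idx, io);
}
```

In this model, model_external_read() on a cold page returns -ENODATA, succeeds once model_file_read() has populated the page (the md5sum observation), and fails again if the flag is cleared (the drop_caches observation). How a real fix would tell Lustre's own fast read apart from an external caller is not modeled here.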


          People

            wc-triage WC Triage
            chenhui Hui Chen