Details
Type: Bug
Resolution: Unresolved
Priority: Major
None
Lustre 2.17.0
Kernel: 6.14.0 (Ubuntu linux-hwe-6.14, also reproducible on mainline 6.14+)
Lustre: master branch (lustre/llite/rw.c, ll_readpage function)
Backing filesystem: EROFS image file stored on a Lustre mount
EROFS mount method: file-backed mode (available since Linux 6.12), mounted via the fsopen()/fsconfig() syscalls
Test node OS: Ubuntu 24.04 with HWE kernel 6.14
2
Description
```
Summary
Lustre's ll_readpage() (the a_ops->read_folio implementation) returns -ENODATA when called without a pre-established cl_io context and the requested page is not already in the page cache. This -ENODATA is an internal signal used by Lustre's "fast read" mechanism, intended to be caught by ll_do_fast_read() and converted to a fallback to the slow I/O path.
However, when external kernel subsystems call a_ops->read_folio directly (bypassing Lustre's f_op->read_iter), this internal error leaks out and causes failures. The most prominent affected use case is EROFS file-backed mode (introduced in Linux 6.12), which calls read_mapping_folio() → a_ops->read_folio to read metadata from the backing file.
This bug exists in the current Lustre master branch — the io == NULL branch in ll_readpage() has identical logic.
Steps to Reproduce
- Store an EROFS image file on a Lustre filesystem (e.g., /mnt/lustre/images/layer.img)
- Attempt to mount it using EROFS file-backed mode:
    int ctx_fd = fsopen("erofs", 0);
    fsconfig(ctx_fd, FSCONFIG_SET_STRING, "source", "/mnt/lustre/images/layer.img", 0);
    fsconfig(ctx_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);   // ← fails with ENODATA
- Observe that fsconfig(FSCONFIG_CMD_CREATE) fails with errno ENODATA (61).
Interesting observation: if you first run md5sum /mnt/lustre/images/layer.img (which populates the page cache via the normal read() path), the EROFS mount succeeds. After echo 3 > /proc/sys/vm/drop_caches, it fails again. This shows that the failure depends on page cache state: read_mapping_folio() only calls ->read_folio for pages that are not already uptodate in the cache, so a warm cache never reaches ll_readpage().
Root Cause Analysis
The call chain that fails
EROFS file-backed mode reads metadata via read_mapping_folio(), which directly invokes a_ops->read_folio:
    erofs_fc_fill_super()                              // fs/erofs/super.c
      → erofs_read_superblock()
        → erofs_read_metabuf()
          → erofs_bread()                              // fs/erofs/data.c
            → read_mapping_folio(mapping, index, file) // include/linux/pagemap.h
              → read_cache_folio()                     // mm/filemap.c
                → do_read_cache_folio()
                  → filemap_read_folio()
                    → a_ops->read_folio(file, folio)
                      → ll_readpage(file, vmpage)      // lustre/llite/rw.c
The problematic code in ll_readpage()
In lustre/llite/rw.c, the ll_readpage() function:
    int ll_readpage(struct file *file, struct page *vmpage)
    {
        // ...
        lcc = ll_cl_find(inode);    // Search for cl_io context on current task
        if (lcc != NULL) {
            env = lcc->lcc_env;
            io = lcc->lcc_io;
        }

        if (io == NULL) { /* fast read */
            result = -ENODATA;
            page = cl_vmpage_page(vmpage, clob);
            if (page == NULL) {
                unlock_page(vmpage);
                // *** BUG: returns -ENODATA to ANY caller, including external ones ***
                RETURN(result);
            }
            // ... only handles pages already in cl_page cache (fast read hit) ...
        }

        // io != NULL: full slow read path with cl_io context
        // ...
    }
The cl_io context is established by ll_cl_add() inside ll_file_io_generic(), which is called from ll_file_read_iter() (i.e., f_op->read_iter). When a_ops->read_folio is called directly by an external consumer (like EROFS's read_mapping_folio()), no cl_io context exists, so ll_cl_find() returns NULL, io is NULL, and if the page is not already cached, -ENODATA is returned.
Why -ENODATA is an internal signal, not a proper error
In Lustre's normal read path, -ENODATA from ll_readpage() is caught by ll_do_fast_read() in lustre/llite/file.c:
    static ssize_t ll_do_fast_read(struct kiocb *iocb, struct iov_iter *iter)
    {
        ssize_t result;

        // ...
        result = generic_file_read_iter(iocb, iter);
        if (result == -ENODATA)
            result = 0;    // Convert to 0, causing fallback to ll_file_io_generic()

        return result;
    }
This works for Lustre's own read path because ll_do_fast_read() intercepts the sentinel before it reaches user space. External callers such as EROFS know nothing of this convention and propagate -ENODATA as a real error.
The architectural issue
Lustre's address_space_operations are not self-contained. They depend on a cl_io context that is established at the file_operations level:
    ┌──────────────────────────────────────────────────────────────────┐
    │ file_operations (ll_file_read_iter)                              │
    │   → ll_file_io_generic()                                         │
    │       → ll_cl_add()          ← establishes cl_io context         │
    │       → cl_io_loop()                                             │
    │           → vvp_io_read_start()                                  │
    │               → generic_file_read_iter()                         │
    │                   ↓                                              │
    │ address_space_operations (ll_readpage)                           │
    │   → ll_cl_find()             ← requires pre-established context  │
    │   → if no context: return -ENODATA (fast read signal)            │
    └──────────────────────────────────────────────────────────────────┘
Any caller that invokes a_ops->read_folio directly (bypassing f_op->read_iter) will hit this issue whenever the requested page is not already cached. This includes:
- EROFS file-backed mode (read_mapping_folio() in fs/erofs/data.c)
- Other kernel modules that call read_mapping_folio() on a Lustre file
- Any future in-kernel consumer of Lustre file page cache
Impact
- EROFS file-backed mode cannot mount images stored on Lustre (observed on Linux 6.14 and later).
- Any in-kernel subsystem that calls read_mapping_folio() or read_cache_folio() on a Lustre file will get unexpected -ENODATA errors when the page is not cached.
- The workaround of pre-populating page cache (e.g., via md5sum) is fragile and breaks after drop_caches or memory pressure.
Proposed Fix
Make ll_readpage() self-contained when called without a cl_io context, similar to how ll_writepage() handles the writeback path (which also lacks a pre-established cl_io).
ll_writepage() already demonstrates the pattern of creating a temporary cl_io context:
    // lustre/llite/rw.c - ll_writepage() (existing code)
    int ll_writepage(struct page *vmpage, struct writeback_control *wbc)
    {
        // ...
        env = cl_env_get(&refcheck);
        io = vvp_env_thread_io(env);
        io->ci_obj = clob;
        io->ci_ignore_layout = 1;
        result = cl_io_init(env, io, CIT_MISC, clob);
        if (result == 0) {
            page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
            // ... perform I/O with temporary cl_io context ...
        }
        cl_io_fini(env, io);
        cl_env_put(env, &refcheck);
        // ...
    }
The same approach should be applied to ll_readpage() in the io == NULL && page == NULL branch: create a temporary cl_io context to service the read, so that external callers receive either the page data or a genuine I/O error instead of the internal -ENODATA signal.
Considerations
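- Performance: This path is only taken when io == NULL AND the page is not cached. Note that ll_do_fast_read() also reaches this branch without a cl_io (that is where the -ENODATA signal comes from), so the fix needs a way to keep returning -ENODATA to that caller and build the temporary cl_io only for others. Lustre's normal read path then still catches -ENODATA and retries via ll_file_io_generic(), and the new code path is exercised primarily by external callers.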
- Locking: CIT_MISC with ci_ignore_layout = 1 avoids layout lock acquisition, matching the ll_writepage() pattern.
- Backward compatibility: The existing -ENODATA fast read signal path for Lustre's own I/O is unchanged.
Related: Secondary issue with loop device
The same root cause also manifests when using a loop device (buffered I/O mode) on top of a Lustre file. After drop_caches, the -ENODATA from ll_readpage() during the page_cache_sync_ra() phase can leave non-uptodate folios in the page cache, which may cause subtle issues in subsequent reads.
```
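The reproduction steps in the description can be packaged as a small C helper. This is a sketch that assumes kernel headers recent enough to provide SYS_fsopen, SYS_fsconfig and the FSCONFIG_* constants in <linux/mount.h>. It calls the syscalls through syscall(2) because older C libraries lack wrappers. The helper name erofs_try_create() is hypothetical. It returns 0 or a negative errno, so on an affected client it should report -ENODATA for an image whose pages are not cached.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* syscall() declaration in strict builds */
#endif
#include <errno.h>
#include <stddef.h>
#include <linux/mount.h>       /* FSCONFIG_SET_STRING, FSCONFIG_CMD_CREATE */
#include <sys/syscall.h>       /* SYS_fsopen, SYS_fsconfig */
#include <unistd.h>            /* syscall(), close() */

/*
 * Hypothetical helper: create (but do not attach) an EROFS superblock
 * backed by the image file `image`, using the fsopen()/fsconfig()
 * sequence from the report. Returns 0 on success or a negative errno.
 */
int erofs_try_create(const char *image)
{
    int fd = (int)syscall(SYS_fsopen, "erofs", 0);
    if (fd < 0)
        return -errno;

    int ret = 0;
    if (syscall(SYS_fsconfig, fd, FSCONFIG_SET_STRING, "source", image, 0) < 0 ||
        syscall(SYS_fsconfig, fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
        ret = -errno;          /* captured before close() can change errno */

    close(fd);
    return ret;
}
```

The helper needs root privileges. Call it once after `echo 3 > /proc/sys/vm/drop_caches` and again after `md5sum` on the image, and the two outcomes from the report should appear. It stops after FSCONFIG_CMD_CREATE. Actually attaching the mount would additionally require fsmount() and move_mount().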
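The page-cache dependence noted in the reproduction steps can be checked directly before each mount attempt. The sketch below (the helper name count_resident_pages() is hypothetical) maps the image read-only and asks mincore(2) which of its pages are resident. Mapping alone does not fault pages in, so the check does not disturb the state it measures. Recent kernels may only report full page-cache residency for files the caller owns or may write to, so it is best run as root or as the file's owner.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE        /* mincore() declaration */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of `path` are in the page
 * cache. Stores the file's page count in *total and returns the resident
 * count, or -1 on error.
 */
long count_resident_pages(const char *path, long *total)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    long psz = sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    long npages = (long)((len + (size_t)psz - 1) / (size_t)psz);
    *total = npages;
    if (npages == 0) {                 /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                         /* the mapping outlives the fd */
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc((size_t)npages);
    long resident = -1;
    if (vec != NULL && mincore(map, len, vec) == 0) {
        resident = 0;
        for (long i = 0; i < npages; i++)
            resident += vec[i] & 1;    /* bit 0 set: page is resident */
    }
    free(vec);
    munmap(map, len);
    return resident;
}
```

Given the failure mode described above, one would expect a count of 0 after drop_caches (mount fails) and a count equal to the total after md5sum (mount succeeds). Strictly, only the pages EROFS reads need to be resident, such as the superblock near the start of the image.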
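A small user-space model makes the sentinel contract described under Root Cause Analysis concrete. This is a hypothetical sketch, not Lustre code. The model_* functions and the has_cl_page/uptodate fields are invented for illustration and only mirror ll_readpage(), ll_do_fast_read() and read_mapping_folio() as the report describes them. The model reproduces both observations. A cold page read through read_mapping_folio() fails with -ENODATA. The same page succeeds once Lustre's own read path has populated it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names only mirror the kernel functions. */
struct model_io { int unused; };          /* stands in for a cl_io */

struct model_page {
    bool has_cl_page;                     /* set by an earlier Lustre read */
    bool uptodate;
};

/* ll_readpage(): without a cl_io, only pages with a cl_page are served. */
int model_readpage(const struct model_io *io, struct model_page *pg)
{
    if (io == NULL && !pg->has_cl_page)
        return -ENODATA;                  /* internal "use the slow path" signal */
    pg->has_cl_page = true;               /* fast-read hit, or slow path fetched it */
    pg->uptodate = true;
    return 0;
}

/* ll_file_read_iter(): fast read first; -ENODATA means "retry with a cl_io". */
int model_file_read_iter(struct model_page *pg)
{
    int rc = model_readpage(NULL, pg);    /* ll_do_fast_read(): no context */
    if (rc == -ENODATA) {
        struct model_io io = { 0 };       /* ll_file_io_generic() + ll_cl_add() */
        rc = model_readpage(&io, pg);
    }
    return rc;
}

/* read_mapping_folio() as used by EROFS: calls ->read_folio only for
 * pages that are not uptodate, and treats any error as real. */
int model_read_mapping_folio(struct model_page *pg)
{
    if (pg->uptodate)
        return 0;
    return model_readpage(NULL, pg);
}
```

The model folds two steps into a single retry: ll_do_fast_read() returning 0, and ll_file_read_iter() then continuing into ll_file_io_generic(). That retry is the part of the behavior that matters for the contract.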
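The proposed fix can be modeled the same way. This hypothetical sketch (not Lustre code) checks two properties the Considerations section relies on. First, an external caller never sees -ENODATA, only success or a genuine error. Second, the temporary context is released on every path, including a failed cl_io_init(). Counters stand in for cl_env_get()/cl_env_put() and cl_io_init()/cl_io_fini(), and the error codes are simulated. The `fast_read_caller` flag is an assumption. It stands in for whatever mechanism lets the fix keep returning -ENODATA to ll_do_fast_read(), as the backward-compatibility point requires.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the fixed io == NULL && page == NULL branch. */
static int envs_held;                   /* cl_env_get() minus cl_env_put() */
static int ios_live;                    /* cl_io_init() minus cl_io_fini() */

struct fix_page { bool has_cl_page; bool uptodate; };

static int fix_io_init(bool fail)       /* cl_io_init(env, io, CIT_MISC, clob) */
{
    ios_live++;
    return fail ? -ENOMEM : 0;          /* simulated init failure */
}

static void fix_io_fini(void)           /* cl_io_fini(env, io) */
{
    ios_live--;
}

int fixed_readpage_no_ctx(struct fix_page *pg, bool fast_read_caller,
                          bool init_fails, bool read_fails)
{
    if (pg->has_cl_page) {              /* fast-read hit: unchanged */
        pg->uptodate = true;
        return 0;
    }
    if (fast_read_caller)
        return -ENODATA;                /* keep Lustre's own fallback signal */

    envs_held++;                        /* cl_env_get() */
    int rc = fix_io_init(init_fails);
    if (rc == 0) {
        rc = read_fails ? -EIO : 0;     /* cl_page_find() + read (simulated) */
        if (rc == 0) {
            pg->has_cl_page = true;
            pg->uptodate = true;
        }
    }
    fix_io_fini();                      /* runs even if init failed */
    envs_held--;                        /* cl_env_put() */
    return rc;
}

/* Temporary contexts still held; should always be 0 between calls. */
int fixed_model_outstanding(void)
{
    return envs_held + ios_live;
}
```

This releases the context even when cl_io_init() fails. That mirrors the ll_writepage() snippet quoted in the proposal, where cl_io_fini() and cl_env_put() run unconditionally.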