Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: Lustre 2.17.0
Labels:
- erofs
- file-backed-mode
- llite
- readpage
Environment:

Kernel: 6.14.0 (Ubuntu linux-hwe-6.14, also reproducible on mainline 6.14+)
Lustre: master branch (lustre/llite/rw.c, ll_readpage function)
Backing filesystem: EROFS image file stored on Lustre mount
EROFS mount method: file-backed mode (new in kernel 6.14, via fsconfig() syscalls)
Test node OS: Ubuntu 24.04 with HWE kernel 6.14

Severity:
2

```

Summary

Lustre's ll_readpage() (the a_ops->read_folio implementation) returns -ENODATA when it is called without a pre-established cl_io context and the page being read has no cl_page attached yet, which is the normal state for a page whose data is not already cached. This -ENODATA is an internal signal used by Lustre's "fast read" mechanism: ll_do_fast_read() is expected to catch it and fall back to the slow I/O path.

However, when external kernel subsystems call a_ops->read_folio directly (bypassing Lustre's f_op->read_iter), this internal error leaks out and causes failures. The most prominent affected use case is EROFS file-backed mode (introduced in Linux 6.14), which calls read_mapping_folio() → a_ops->read_folio to read metadata from the backing file.

This bug is also present on the current Lustre master branch, where the io == NULL branch of ll_readpage() contains the same logic.

Steps to Reproduce

1. Store an EROFS image file on a Lustre filesystem (e.g., /mnt/lustre/images/layer.img).

2. Attempt to mount it using EROFS file-backed mode:

   int ctx_fd = fsopen("erofs", 0);
   fsconfig(ctx_fd, FSCONFIG_SET_STRING, "source", "/mnt/lustre/images/layer.img", 0);
   fsconfig(ctx_fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);  // ← fails with ENODATA

3. Observe that fsconfig(FSCONFIG_CMD_CREATE) fails with ENODATA (errno 61).

Interesting observation: if you first run md5sum /mnt/lustre/images/layer.img (which populates the page cache through the normal read() path), the EROFS mount succeeds. After echo 3 > /proc/sys/vm/drop_caches, it fails again. This points to page cache state: read_mapping_folio() only calls a_ops->read_folio for folios that are not already uptodate, so with a warm cache ll_readpage() is never reached.

Root Cause Analysis

The call chain that fails

EROFS file-backed mode reads metadata via read_mapping_folio(), which directly invokes a_ops->read_folio:

erofs_fc_fill_super()                          // fs/erofs/super.c
  → erofs_read_superblock()
    → erofs_read_metabuf()
      → erofs_bread()                          // fs/erofs/data.c
        → read_mapping_folio(mapping, index, file)  // include/linux/pagemap.h
          → read_cache_folio()                 // mm/filemap.c
            → filemap_read_folio()             // mm/filemap.c
              → a_ops->read_folio(file, folio)
                → ll_readpage(file, vmpage)    // lustre/llite/rw.c

The problematic code in ll_readpage()

In lustre/llite/rw.c, the ll_readpage() function:

int ll_readpage(struct file *file, struct page *vmpage)
{
    // ...
    lcc = ll_cl_find(inode);   // Search for cl_io context on current task
    if (lcc != NULL) {
        env = lcc->lcc_env;
        io  = lcc->lcc_io;
    }

    if (io == NULL) { /* fast read */
        result = -ENODATA;

        page = cl_vmpage_page(vmpage, clob);
        if (page == NULL) {
            unlock_page(vmpage);
            // *** BUG: returns -ENODATA to ANY caller, including external ones ***
            RETURN(result);
        }
        // ... only handles pages already in cl_page cache (fast read hit) ...
    }

    // io != NULL: full slow read path with cl_io context
    // ...
}

The cl_io context is established by ll_cl_add() inside ll_file_io_generic(), which is called from ll_file_read_iter() (i.e., f_op->read_iter). When a_ops->read_folio is called directly by an external consumer (such as EROFS, via read_mapping_folio()), no cl_io context exists, so ll_cl_find() returns NULL, io is NULL, and if the page is not already cached, -ENODATA is returned.

Why -ENODATA is an internal signal, not a proper error

In Lustre's normal read path, -ENODATA from ll_readpage() is caught by ll_do_fast_read() in lustre/llite/file.c:

static ssize_t ll_do_fast_read(struct kiocb *iocb, struct iov_iter *iter)
{
    ssize_t result;

    // ... fast-read eligibility checks elided ...
    result = generic_file_read_iter(iocb, iter);
    if (result == -ENODATA)
        result = 0;  // Convert to 0, causing fallback to ll_file_io_generic()
    return result;
}

This works for Lustre's own read path because ll_do_fast_read() acts as a sentinel. But external callers like EROFS have no knowledge of this convention and propagate -ENODATA as a real error.

The architectural issue

Lustre's address_space_operations are not self-contained. They depend on a cl_io context that is established at the file_operations level:

┌─────────────────────────────────────────────────────────────────┐
│  file_operations (ll_file_read_iter)                            │
│    → ll_file_io_generic()                                       │
│      → ll_cl_add()          ← establishes cl_io context         │
│      → cl_io_loop()                                             │
│        → vvp_io_read_start()                                    │
│          → generic_file_read_iter()                             │
│            ↓                                                    │
│  address_space_operations (ll_readpage)                         │
│    → ll_cl_find()           ← requires pre-established context  │
│    → if no context: return -ENODATA (fast read signal)          │
└─────────────────────────────────────────────────────────────────┘

Any caller that invokes a_ops->read_folio directly (bypassing f_op->read_iter) on a page that is not already cached will hit this issue. This includes:

- EROFS file-backed mode (read_mapping_folio() in fs/erofs/data.c)
- Kernel modules that call read_mapping_folio() on Lustre files
- Any future in-kernel consumer of the Lustre file page cache

Impact

- EROFS file-backed mode (Linux 6.14+) cannot mount images stored on Lustre.
- Any in-kernel subsystem that calls read_mapping_folio() or read_cache_folio() on a Lustre file will get unexpected -ENODATA errors when the page is not cached.
- The workaround of pre-populating the page cache (e.g., via md5sum) is fragile and breaks after drop_caches or under memory pressure.

Proposed Fix

Make ll_readpage() self-contained when called without a cl_io context, similar to how ll_writepage() handles the writeback path (which also lacks a pre-established cl_io).

ll_writepage() already demonstrates the pattern of creating a temporary cl_io context:

// lustre/llite/rw.c - ll_writepage() (existing code)
int ll_writepage(struct page *vmpage, struct writeback_control *wbc)
{
    // ...
    env = cl_env_get(&refcheck);
    io = vvp_env_thread_io(env);
    io->ci_obj = clob;
    io->ci_ignore_layout = 1;
    result = cl_io_init(env, io, CIT_MISC, clob);
    if (result == 0) {
        page = cl_page_find(env, clob, vmpage->index, vmpage, CPT_CACHEABLE);
        // ... perform I/O with temporary cl_io context ...
    }
    cl_io_fini(env, io);
    cl_env_put(env, &refcheck);
    // ...
}

The same approach should be applied to ll_readpage() in the io == NULL && page == NULL branch: create a temporary cl_io context to service the read request, so that external callers get a proper result instead of the internal -ENODATA signal.

Considerations

- Performance: This path is only taken when io == NULL AND the page is not cached. In Lustre's normal read path, ll_do_fast_read() catches -ENODATA and retries via ll_file_io_generic(), so this new code path would primarily be exercised by external callers.
- Locking: CIT_MISC with ci_ignore_layout = 1 avoids layout lock acquisition, matching the ll_writepage() pattern.
- Backward compatibility: The existing -ENODATA fast read signal for Lustre's own I/O must stay unchanged. Because ll_do_fast_read() also reaches ll_readpage() with io == NULL and an uncached page, the new branch needs a way to tell that caller apart from external ones; the exact mechanism is still open.

Related: Secondary issue with loop device

The same root cause also manifests when using a loop device (buffered I/O mode) on top of a Lustre file. After drop_caches, the -ENODATA from ll_readpage() during the page_cache_sync_ra() phase can leave non-uptodate folios in the page cache, which may cause subtle issues in subsequent reads.
```
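
The reproduction steps above can be packaged as a small C helper. This is a sketch under stated assumptions: it invokes fsopen() and fsconfig() through syscall(2) instead of the glibc 2.36+ wrappers, it assumes kernel UAPI headers that provide FSCONFIG_SET_STRING and FSCONFIG_CMD_CREATE in <linux/mount.h>, it needs CAP_SYS_ADMIN to get past fsopen(), and the function name try_erofs_file_backed_create() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/mount.h>   /* FSCONFIG_SET_STRING, FSCONFIG_CMD_CREATE */
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_fsopen, SYS_fsconfig */
#include <unistd.h>

/*
 * Hypothetical helper: performs the fsopen()/fsconfig() sequence from the
 * reproduction steps and returns 0 on success or a negative errno value.
 * On an affected Lustre client with a cold page cache, the reported failure
 * is -ENODATA from FSCONFIG_CMD_CREATE.
 */
static int try_erofs_file_backed_create(const char *image)
{
	int fd, err = 0;

	fd = (int)syscall(SYS_fsopen, "erofs", 0);
	if (fd < 0)
		return -errno;  /* e.g. EPERM without CAP_SYS_ADMIN, ENODEV without EROFS */

	if (syscall(SYS_fsconfig, fd, FSCONFIG_SET_STRING, "source", image, 0) < 0)
		err = -errno;
	else if (syscall(SYS_fsconfig, fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0)
		err = -errno;   /* -ENODATA here matches this ticket */

	close(fd);          /* releases the fs context; nothing is mounted */
	return err;
}
```

A caller would pass the image path (for example /mnt/lustre/images/layer.img) and print strerror(-rc). Going by the observations in this ticket, on an affected client it is expected to report ENODATA after echo 3 > /proc/sys/vm/drop_caches and to succeed right after an md5sum of the image. A return of 0 only means FSCONFIG_CMD_CREATE succeeded; the helper does not go on to fsmount() anything. Other negative values (EPERM, ENODEV, ENOENT) point to a setup problem rather than this bug.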

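The control flow in the root-cause analysis can be modelled in userspace C to show why the same -ENODATA is harmless on Lustre's own read path but fatal for an external caller. Everything here is hypothetical: model_page, current_io and the model_* functions are stand-ins invented for illustration, not Lustre or kernel code. The model reduces cl_vmpage_page() to a has_cl_page flag, ll_cl_find() to a global pointer, and read_mapping_folio() to "call read_folio only if the folio is not uptodate".

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not Lustre or kernel types. */
struct model_page {
	bool has_cl_page;   /* models cl_vmpage_page(vmpage, clob) != NULL */
	bool uptodate;      /* models the folio's uptodate flag */
};

struct model_io { int dummy; };          /* models a registered cl_io */

static struct model_io *current_io;      /* models what ll_cl_find() returns */

/* Models the slow read that a cl_io context makes possible. */
static int model_slow_read(struct model_page *pg)
{
	pg->has_cl_page = true;
	pg->uptodate = true;
	return 0;
}

/* Models ll_readpage(): the io == NULL branch is the "fast read". */
static int model_readpage(struct model_page *pg)
{
	if (current_io == NULL) {
		if (!pg->has_cl_page)
			return -ENODATA;     /* internal fast-read signal */
		pg->uptodate = true;     /* fast-read hit */
		return 0;
	}
	return model_slow_read(pg);
}

/*
 * Models Lustre's own read path: the fast read may get -ENODATA, which
 * ll_do_fast_read() turns into 0; the read is then retried through
 * ll_file_io_generic(), which registers a cl_io via ll_cl_add().
 */
static int model_lustre_read(struct model_page *pg)
{
	struct model_io io = { 0 };
	int rc = pg->uptodate ? 0 : model_readpage(pg);

	if (rc != -ENODATA)
		return rc;
	current_io = &io;            /* models ll_cl_add() */
	rc = model_readpage(pg);
	current_io = NULL;           /* models ll_cl_remove() */
	return rc;
}

/* Models read_mapping_folio(): read_folio runs only for a folio that is not
 * uptodate, and its error is propagated to the caller unchanged. */
static int model_external_read(struct model_page *pg)
{
	return pg->uptodate ? 0 : model_readpage(pg);
}
```

For a fresh { false, false } page, model_external_read() returns -ENODATA, while model_lustre_read() on the same kind of page returns 0 and leaves it uptodate; after that, model_external_read() on that page returns 0 as well. That last step corresponds to the md5sum workaround.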

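The proposed fix can be sketched in the same kind of model. Because ll_do_fast_read() also reaches ll_readpage() with io == NULL, keeping its -ENODATA signal unchanged (as the Backward compatibility item requires) means the new branch must tell that caller apart from external ones. Here that is done with a hypothetical in_fast_read flag; the ticket does not specify a mechanism. The model_env_get/model_io_init/model_io_fini/model_env_put helpers are also hypothetical and only mirror the cl_env_get(), cl_io_init(), cl_io_fini(), cl_env_put() ordering of the ll_writepage() excerpt.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not Lustre or kernel types. */
struct model_page { bool has_cl_page; bool uptodate; };
struct model_io { bool initialized; };

static struct model_io *current_io;   /* models what ll_cl_find() returns */
static bool in_fast_read;             /* hypothetical fast-read marker */
static int envs_held;                 /* counts outstanding model env references */

static void model_env_get(void) { envs_held++; }   /* ~ cl_env_get() */
static void model_env_put(void) { envs_held--; }   /* ~ cl_env_put() */

static int model_io_init(struct model_io *io)      /* ~ cl_io_init(CIT_MISC) */
{
	io->initialized = true;
	return 0;
}

static void model_io_fini(struct model_io *io)     /* ~ cl_io_fini() */
{
	io->initialized = false;
}

static int model_slow_read(struct model_io *io, struct model_page *pg)
{
	if (io == NULL || !io->initialized)
		return -EINVAL;
	pg->has_cl_page = true;   /* ~ cl_page_find(..., CPT_CACHEABLE) */
	pg->uptodate = true;
	return 0;
}

/* Sketch of the proposed io == NULL handling in ll_readpage(). */
static int model_readpage_fixed(struct model_page *pg)
{
	struct model_io tmp;
	int rc;

	if (current_io != NULL)
		return model_slow_read(current_io, pg);

	if (pg->has_cl_page) {    /* fast-read hit: unchanged */
		pg->uptodate = true;
		return 0;
	}
	if (in_fast_read)         /* keep the internal signal for ll_do_fast_read() */
		return -ENODATA;

	/* External caller: build a temporary context, as ll_writepage() does. */
	model_env_get();
	rc = model_io_init(&tmp);
	if (rc == 0)
		rc = model_slow_read(&tmp, pg);
	model_io_fini(&tmp);      /* called regardless of init result, like ll_writepage() */
	model_env_put();
	return rc;
}
```

With in_fast_read set, a page without a cl_page still yields -ENODATA, so Lustre's own fallback keeps working; without it, an external caller gets 0 and an uptodate page, and the env reference count returns to zero. A real patch would also need the CIT_MISC / ci_ignore_layout = 1 setup and real cl_page handling from the ll_writepage() excerpt; the model keeps only the get/init/fini/put ordering.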