Lustre / LU-20212

FLR-EC: verify non-write RPCs do not block client


Details

    Description

      The goal of FLR-EC is to keep a file readable when one of its OST objects is inaccessible (for whatever reason), by reconstructing the missing data from a parity object on another OST.
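      As a minimal illustration of reconstructing data from parity, the sketch below assumes the simplest erasure code: one XOR parity stripe computed over equal-sized data stripes, each stored on a different OST (RAID-5 style). FLR-EC is not assumed to use this particular code, and the helper names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(stripes):
    """Compute a single parity stripe over all data stripes (hypothetical)."""
    return xor_blocks(stripes)


def reconstruct(stripes, parity):
    """Rebuild the one stripe marked None (its OST is inaccessible).

    With XOR parity, the missing stripe equals the XOR of every surviving
    data stripe and the parity stripe.
    """
    missing = stripes.index(None)
    survivors = [s for s in stripes if s is not None]
    rebuilt = list(stripes)
    rebuilt[missing] = xor_blocks(survivors + [parity])
    return rebuilt


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]  # three data objects, one per OST
    parity = make_parity(data)          # parity object on a fourth OST
    # Pretend the OST holding the second object is unreachable.
    print(reconstruct([b"AAAA", None, b"CCCC"], parity)[1])  # b'BBBB'
```

      Single XOR parity tolerates the loss of exactly one stripe per group; codes with more parity objects (e.g. Reed-Solomon) tolerate more failures at a higher computational cost.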

      If the file data can be reconstructed from parity in limited cases, but some other common operation (e.g. open() or stat()) is still blocked by the inaccessible OST object, then the operational benefits of FLR-EC are lost.

      Testing should use a variety of read-only workloads to check that they are not blocked by RPC timeouts and retries. For example, scanning and reading files with "[lfs] find /mnt/testfs -size +1G -print0 | xargs -0 md5sum" or "grep -Rq test /mnt/testfs", or running "/mnt/testfs/executable", should not hang, and ideally should not pause for more than a few seconds before EC read reconstruction is activated.
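      Such a workload check could be scripted roughly as follows. This is a hypothetical Python harness, not part of the Lustre test suite: it walks a directory read-only, times an lstat() and a full sequential read (as md5sum would do) of every regular file, and reports any operation slower than a limit. "A few seconds" is assumed here to mean 5 s.

```python
import hashlib
import os
import stat
import time

# Assumption: "a few seconds" from the ticket is taken to mean 5 seconds.
DEFAULT_LIMIT_S = 5.0


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = fn(*args)
    return result, time.monotonic() - start


def md5_file(path, chunk=1 << 20):
    """Read the whole file sequentially, as md5sum would, and return its MD5."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def scan_tree(root, limit_s=DEFAULT_LIMIT_S):
    """Walk root read-only; stat and fully read every regular file.

    Returns [(path, operation, seconds)] for each operation slower than
    limit_s. A call that hangs outright never returns, so the caller should
    also impose an external timeout.
    """
    slow = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st, secs = timed(os.lstat, path)
            if secs > limit_s:
                slow.append((path, "stat", secs))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, FIFOs, etc.
            _digest, secs = timed(md5_file, path)
            if secs > limit_s:
                slow.append((path, "read", secs))
    return slow
```

      For example, scan_tree("/mnt/testfs") could be run while one OST is unavailable. Because a truly hung call never returns, the script itself should be run under an external timeout such as timeout(1).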

      stat() calls should be answered from the SOM (Size-on-MDS) xattr on the MDS, which provides the file size and blocks, instead of sending RPCs to the OSTs to fetch these attributes.
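      The intended stat() behavior can be modelled as a small decision function. This is a hypothetical sketch, not Lustre code: SomState, SomAttr, stat_size(), and the rule that only a strict SOM value may be returned without contacting the OSTs are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SomState(Enum):
    """Hypothetical validity states for the size/blocks cached on the MDS."""
    UNKNOWN = "unknown"  # never recorded
    STRICT = "strict"    # known to be exact
    STALE = "stale"      # known to be out of date


@dataclass
class SomAttr:
    state: SomState
    size: int
    blocks: int


def stat_size(som, layout_osts, down_osts, glimpse):
    """Return (size, blocks, source), avoiding OST RPCs whenever possible.

    som:         SomAttr from the MDS, or None if absent.
    layout_osts: set of OST indexes holding the file's objects.
    down_osts:   set of OST indexes currently unreachable.
    glimpse:     callable(osts) -> (size, blocks); stands in for OST RPCs.
    """
    if som is not None and som.state is SomState.STRICT:
        return som.size, som.blocks, "mds"  # no OST RPC at all
    if down_osts.isdisjoint(layout_osts):
        size, blocks = glimpse(layout_osts)
        return size, blocks, "ost"
    # SOM is unusable and an OST is down: the case that would otherwise
    # wait on RPC timeout and retry.
    raise TimeoutError("size needs an RPC to an unreachable OST")
```

      In this model the OST query is only a fallback, so an unreachable OST matters only when the MDS cannot vouch for the size.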

      LU-20211 contains a proposal to handle open(O_TRUNC) and truncate(0) in a way that would allow "read-only" EC to remove inaccessible OST objects from the file completely.

      statfs()/df already has a "lazy" mechanism that should time out if an OST is not responsive. LU-20200 proposes that "lfs df" also send OST_STATFS RPCs in parallel to avoid long sequential waits, though that is not critical functionality for most workloads.
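      The benefit of sending OST_STATFS RPCs in parallel is that the total wait is bounded by a single timeout rather than by the sum of per-OST timeouts. A hypothetical Python sketch of that pattern follows, where query() stands in for one OST_STATFS RPC; it is not how "lfs df" is actually implemented.

```python
import concurrent.futures as cf
import time


def statfs_all(targets, query, timeout_s):
    """Query every target in parallel, waiting at most timeout_s in total.

    query(target) returns that target's statfs result and may block on an
    unresponsive OST. Returns {target: result or None}; None marks a target
    that failed or did not answer in time, so it is skipped instead of
    stalling the caller.
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(targets)))
    futures = {pool.submit(query, t): t for t in targets}
    done, _not_done = cf.wait(futures, timeout=timeout_s)
    results = {t: None for t in targets}
    for fut in done:
        if fut.exception() is None:
            results[futures[fut]] = fut.result()
    # Do not wait for stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return results
```

      With one unresponsive OST, a sequential loop spends a full timeout on it before moving on, whereas here the caller returns after timeout_s with that OST marked None and the responsive OSTs already reported.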

      Other file and filesystem access calls should be systematically reviewed and tested to ensure that they do not block; any that do should be fixed, or their wait minimized, wherever possible.

      Ideally, only write(), truncate(), and maybe fallocate() to missing OST objects would block access waiting on OST recovery.

            People

              wc-triage WC Triage
              adilger Andreas Dilger
               Votes: 0
               Watchers: 2