Loading...

XML

Word

Printable

Details

Type: Epic
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: None
Labels:
- FLR2

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

Overview

Erasure coding provides a more space-efficient method for adding data redundancy than mirroring, at a somewhat higher computational cost. This would typically be used for adding redundancy for large and longer-lived files to minimize space overhead. For example, RAID-6 10+2 adds only 20% space overhead while allowing two OST failures, compared to mirroring which adds 100% overhead for single-failure redundancy or 200% overhead for double-failure redundancy. Erasure coding can add redundancy for an arbitrary number of drive failures (e.g. any 3 drives in a group of 16) with a fraction of the overhead.

It would be possible to implement delayed erasure coding on striped files in a similar manner to Phase 1 mirrored files, by storing the parity stripes in a separate component in the file, having a layout that indicates the erasure coding algorithm, number of data and parity stripes, stripe_size (should probably match file stripe size), etc. The encoding would be similar to RAID-4, with specific "data" stripes (the traditional Lustre RAID-0 file layout) in the primary component, and one or more "parity" stripes stored in a separate parity component, unlike RAID-5/6 that have the parity interleaved. For widely-striped files, there could be separate parity stripes for different sets of file stripes (e.g. 10x 12+3 for a 120-stripe file), so that data+parity would be able to use all of the OSTs in the filesystem without having double failures within a single parity group. For very large files, it would be possible to split the parity component into smaller extents to reduce the parity reconstruction overhead for sub-file overwrites. Erasure coding could also be added after-the-fact to existing RAID-0 striped files, after the initial file write, or when migrating a file from an active storage tier to an archive tier.

Reads from an erasure-coded file would normally use only the primary RAID-0 component (unless data verification on read was also desired), as with non-redundant files. If a stripe in the primary component for the file fails, the client would read the data stripes and one or more parity stripes component and reconstruct the data from parity on the fly, and/or depend on the resync tool to reconstruct the failed stripe from parity.

Writes to an erasure-coded file would mark the parity component stale matching the extent of the data component that was modified, as with a regular mirrored file, and writes would continue on the primary RAID-0 striped file. The main difference from an FLR mirrored file is that the writes would always need to go to the primary data component, and the parity component would always be marked stale. It would not be possible to write to an erasure-coded file that has a failure in a primary stripe without first reconstructing it from parity.

Space Efficient Data Redundancy

Erasure coding will add the ability to add full redundancy of large files or whole filesystems, rather than using full mirroring. This will allow striped Lustre files to store redundancy in parity components that allow recovery from a specified number of OST failures (e.g. 3 OST failures per 12 stripes, or 4 OST failures per 24 stripes) in a manner similar to RAID-4 with fixed parity stripes.

Required Lustre Functionality

Erasure Coded File Read

The actual parity generation will be done with the lfs mirror resync tool in userspace. The Lustre client will do normal reads from the RAID-0 data component, unless there is an OST failure or other error reading from a data stripe. Add support for data reconstruction from the data and parity components, leveraging existing functionality for reading mirrored files.

Erasure Coded File Write

To avoid losing redundancy on erasure-coded files that are modified, the Mirrored File Writes functionality could be used during writes to such files. Changes would be merged into the erasure coded component after the file is closed, using the Phase 1 ChangeLog consumer, and then the mirror component can be dropped.

External Components

Erasure Coded Resync Tool

The lfs mirror resync tool needs to be updated to generate the erasure code for the file striped file, storing the parity in a separate component from the main RAID-0 striped file. There are CPU-optimized implementations of the erasure coding algorithms available, so the majority of the work would be integrating these optimized routines into the Lustre kernel modules and userspace tools, rather than actually developing the encoding algorithms.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

Erasure Coding HDL.docx
16/Apr/19 8:49 AM
57 kB
Zhenyu Xu

Issue Links

is related to

LU-12649 Tracker for ongoing FLR improvements

Open

LU-13643 FLR3-IM: Immediate file write mirroring

Open

LU-19562 FLR-EC: Add connect flag support and enable/disable

In Progress

LUDOC-463 Add feature documentation for Erasure Coding

Open

LU-19298 Improve PFL handling in LOD / LOV layer

Open

LU-19066 FLR2: identify rack/PSU failure domains for servers

In Progress

LU-16837 interop: client skip unknown component mirror

Resolved

is related to

LU-9961 FLR-EC: Relocating individual objects to a new OST

Open

LU-19100 FLR-EC: restripe PFL components when adding EC

Open

(2 is related to, 2 is related to )

Sub-Tasks

Progress

1.	FLR-EC: add necessary structure to adopt erasure coding layout	Resolved	Zhenyu Xu
2.	FLR-EC: erasure coding layout handling	Open	Zhenyu Xu
3.	FLR-EC: Parity stripe count from data stripe count	Open	Ronnie Sahlberg
4.	FLR-EC: Basic do-no-harm lov IO support	Open	Zhenyu Xu
5.	FLR-EC: Implement FLR state transition logic for EC files	Resolved	Patrick Farrell
6.	FLR-EC: Direct EC component read/write	Resolved	Patrick Farrell
7.	FLR-EC: Add lfs ec resync and lfs ec verify commands	Resolved	WC Triage
8.	FLR-EC: import ISA-L library in Lustre build	Resolved	James A Simmons
9.	FLR-EC: resync parity components	In Progress	Ronnie Sahlberg
10.	FLR-EC: recover data from parity code	Open	Zhenyu Xu
11.	FLR-EC: Add/modify conf-sanity test_32 for erasure coding	Open	WC Triage
12.	FLR-EC: Don't read parity components on old clients	Open	Zhenyu Xu
13.	FLR-EC: Prevent stranding of parity mirror	Open	Zhenyu Xu
14.	FLR-EC: Never select EC mirror as primary for write or allow setting PREFER flags	In Progress	Patrick Farrell
15.	FLR-EC: Test FLR state transitions and mirror read/write for EC files	In Progress	Patrick Farrell
16.	FLR-EC: Tight binding between erasure code and parity mirrors	Open	Zhenyu Xu
17.	FLR-EC: lfs setstripe support for erasure coding	In Progress	Patrick Farrell
18.	FLR-EC: support for other lfs mirror commands	Open	WC Triage
19.	FLR-EC: Add connect flag support and enable/disable	In Progress	Patrick Farrell
20.	FLR-EC: mark EC OST objects for LFSCK, rebuild EC components	Open	WC Triage

Activity

People

Assignee:: Patrick Farrell

Reporter:: Andreas Dilger

Votes:: 0 Vote for this issue

Watchers:: 22 Start watching this issue

Dates

Created:: 13/Apr/18 8:19 AM

Updated:: 3 days ago 4:57 PM