Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
Description
HPC burst buffers are a fast storage layer positioned between the compute nodes and the backend storage system.
There are two representative burst buffer architectures: remote shared burst buffers and node-local burst buffers. DataWarp and Infinite Memory Engine belong to the former. In the remote shared architecture, the SSD storage resides in I/O nodes positioned between the compute nodes and the backend storage, so data movement between compute nodes and the burst buffer has to go through the network. Placing burst buffers in I/O nodes facilitates their independent development, deployment, and maintenance. In the node-local architecture, the SSDs are installed on the compute nodes themselves, so the aggregate bandwidth grows linearly with the number of compute nodes. Node-local burst buffers also require scalable metadata management to maintain a global namespace across all nodes.
RW-PCC provides an elegant way to couple node-local burst buffers with Lustre. The metadata is managed by Lustre and stored on the MDTs, so it becomes part of the global Lustre namespace. Moreover, the file data can be migrated from the LPCC cache to the Lustre OSTs via file restores, transparently to the application. Furthermore, various cache strategies can be customized, and cache isolation can be provided according to file attributes.
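As a point of reference, the existing node-local PCC already expresses such policies through attach rules given when a backend is registered on a client. A minimal sketch follows; the paths, archive ID, and rule values are only examples, and the exact --param rule syntax may differ between Lustre versions:

```
# Register a node-local PCC backend on a client; cache only files belonging
# to project 500 or owned by uid 1001, using HSM archive ID 2 for RW-PCC.
lctl pcc add /mnt/lustre /mnt/pcc --param "projid={500},uid={1001} rwid=2"

# List the PCC backends configured for this Lustre mount point.
lctl pcc list /mnt/lustre
```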
Although node-local PCC consumes almost no network resources when performing data I/O, its capacity is limited by the storage media on that single client.
We propose a novel remote shared PCC for Lustre, which can be used as a remote shared burst buffer on a shared PCC backend fs. This shared PCC backend fs could be a high-speed networked filesystem (e.g. another Lustre filesystem) built on NVMe or SSD devices, while the current Lustre filesystem mainly uses slower HDDs.
In this way, all Lustre clients can use the shared PCC backend fs, which has a larger capacity, and a single Lustre filesystem gains four storage tiers (a possible setup is sketched after this list):
- OST storage tier
- Original node-local PCC
- Remote shared PCC on a shared backend fs
- Traditional Lustre HSM solution
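On each client, setting up the remote shared tier might look roughly like the following. This is only a sketch under the assumption that the shared backend is another (NVMe/SSD-based) Lustre filesystem and that the registration interface mirrors today's node-local PCC; the mount source, paths, archive ID, and rule are all placeholders.

```
# Mount the fast NVMe/SSD Lustre filesystem that will act as the shared
# PCC backend fs on every client (mount source and target are examples).
mount -t lustre flash-mgs@o2ib:/flash /mnt/flash_pcc

# Register it as a PCC backend of the slow HDD-based Lustre filesystem,
# e.g. caching files of project 1000; the actual interface for a shared
# backend would be defined as part of this work.
lctl pcc add /mnt/lustre /mnt/flash_pcc --param "projid={1000} rwid=3"
```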
The implementation of remote shared PCC can reuse the foundation and framework of the current node-local PCC.
Moreover, under remote shared RO-PCC, once a file is attached into the shared PCC backend fs, it can be read from PCC by all clients.
Remote shared RW-PCC works as it does today: a file can only be read or written by a single client.
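The per-file interface could stay the same as the existing node-local PCC commands, for example (the archive ID and file name are placeholders, and how a read-only shared attach would be requested is an open design point, not an existing option):

```
# Attach a file into PCC (RW-PCC today): only this client can read/write it.
lfs pcc attach -i 2 /mnt/lustre/output.dat

# State query and detach would work the same way; with a remote shared
# RO-PCC backend, the attached copy would additionally be readable by
# every client.
lfs pcc state /mnt/lustre/output.dat
lfs pcc detach /mnt/lustre/output.dat
```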
Issue Links
- is related to: LU-10606 HSM info as part of LOV layout xattr (Open)

Comments
> If there are mixed SSD and HDD OSTs in the same Lustre namespace, isn't FLR a better way of writing data into the SSD layer first and then migrating it to HDD OSTs with an FLR mirror?
Compared with FLR, PCC can:
1) Transparently restore data from PCC into the Lustre OSTs when it hits an -ENOSPC or -EDQUOT error; FLR on SSD cannot tolerate this kind of failure, I think.
2) Customize various cache strategies and provide cache isolation according to file attributes.
For example, PCC can provide cache isolation mechanisms for administrators to manage how much PCC storage capacity each user/group/project can use.
FLR cannot customize how much SSD space each user/group/project can use.
Pool-based quota maybe, but not in the user/group/project dimension, I think.
Moreover, PCC can implement a job-based quota via project quota on the PCC backend fs, I think.
We just need to add a mapping between the job identifier (e.g. job name) and a dedicated project ID (e.g. 100) on the PCC backend fs (see the sketch below):
1) Before the job starts, set this mapping and the project quota enforcement on the PCC backend fs.
2) While the job runs, at the time of attaching a file into PCC, set the project ID (100) on the PCC copy to achieve the job-based quota.
3) When the job finishes, remove the mapping between the job identifier and the project ID (100) and remove the project quota enforcement associated with this project ID on the PCC backend fs.
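Assuming the PCC backend fs is itself a Lustre (or another project-quota capable) filesystem, the three steps could be scripted roughly as follows. The job name, project ID, quota limit, paths, and the mapping file are all illustrative assumptions, and the location of the PCC copy on the backend fs is shown only schematically.

```
JOBID="climate_run_42"   # job identifier (example)
PROJID=100               # dedicated project ID on the PCC backend fs (example)

# 1) Before the job starts: record the mapping and enforce a project quota
#    (here a 500 GiB block hard limit) on the PCC backend fs.
echo "${JOBID} ${PROJID}" >> /etc/pcc_job_project.map
lfs setquota -p ${PROJID} -B 500G /mnt/flash_pcc

# 2) While the job runs: when a file is attached into PCC, tag the PCC copy
#    with the project ID so its space counts against the job's quota.
lfs pcc attach -i 3 /mnt/lustre/job_output.dat
lfs project -p ${PROJID} /mnt/flash_pcc/path/to/pcc/copy/of/job_output   # path is schematic

# 3) After the job finishes: drop the mapping and the quota enforcement.
sed -i "/^${JOBID} /d" /etc/pcc_job_project.map
lfs setquota -p ${PROJID} -B 0 /mnt/flash_pcc
```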