Details
- Type: Improvement
- Resolution: Unresolved
- Priority: Minor
Description
HSM restore performance is often degraded by too many archive requests in the coordinator (CDT) llog or the copytool (CT) pipeline. Offer upcalls for archive and remove, invoked on the MDT, that allow bypassing the coordinator and better scheduling of archives and removes.
From the commit message on https://review.whamcloud.com/32212:
This change provides an HSM upcall facility to optionally bypass the
HSM coordinator (CDT) for archive and remove requests. (Release
already bypasses the CDT and restore bypass is not supported by this
change.)
Requires updated MDT and a worker client. OSTs, compute nodes, and
copytool nodes need not be updated.
lctl set_param mdt.*.hsm.upcall_mask='ARCHIVE' # or 'ARCHIVE REMOVE', 'REMOVE', ''
lctl set_param mdt.*.hsm.upcall_path=/.../lhsm_mdt_upcall # Full path.
HSM requests whose action is set in the upcall_mask parameter will be
diverted from the coordinator and handled by the executable specified
by upcall_path. By default upcall_mask is empty, which gives the normal
HSM coordinator handling behavior.
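The routing rule described above can be modelled in a few lines. This is only an illustrative sketch: the real check is done inside the MDT, and the function names `parse_upcall_mask` and `route` are hypothetical. It assumes the mask is a space-separated list of action names, as in the `lctl set_param` example.

```python
# Illustrative model only; the real decision is made inside the MDT.
# Function names are hypothetical and not part of Lustre.

def parse_upcall_mask(mask):
    """Split a space-separated mask such as 'ARCHIVE REMOVE' into a set."""
    return frozenset(mask.split())


def route(action, mask):
    """Return 'upcall' if the action is listed in the mask, else 'coordinator'."""
    return "upcall" if action in parse_upcall_mask(mask) else "coordinator"
```

With an empty mask every action yields 'coordinator', which matches the default behavior described above.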
The upcall (to be supplied by the site) will be invoked by the MDT RPC
handler (it runs on the MDT as a root-privileged process with an empty
environment). Invocation will be of the form:
/.../lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...
with one or more FIDs, each as a separate argument. The upcall_path
parameter can be set to the path of an arbitrary (site-supplied)
executable as long as it does the right thing. The RPC handler will
block until the upcall completes, so for safety and liveness the upcall
should not access Lustre. Instead, the upcall should put the request in
an off-Lustre persistent queue or database and then exit. The actions
could be submitted to a job scheduler, but care must be taken to ensure
that this does not entail any Lustre operations. See comments in
mdt_hsm_upcall().
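As a rough illustration of the "enqueue and exit" pattern, a site-supplied upcall might look like the sketch below. Everything here is an assumption rather than part of the change: the interpreter path, the queue location `/var/spool/lhsm/queue.db`, the SQLite table layout, and the function names are all hypothetical. The only requirement is that the queue lives on a local, non-Lustre filesystem.

```python
#!/usr/bin/python3
# Hypothetical upcall sketch: record the request in a local queue and exit.
# An absolute interpreter path is used because the upcall runs with an
# empty environment (no PATH).
import json
import sqlite3
import sys

# Assumed queue location; it must NOT be on Lustre.
QUEUE_DB = "/var/spool/lhsm/queue.db"


def parse_args(argv):
    """Split ACTION FSNAME ARCHIVE_ID FLAGS DATA FID... into a dict."""
    if len(argv) < 6:
        raise ValueError("expected ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...")
    action, fsname, archive_id, flags, data = argv[:5]
    return {"action": action, "fsname": fsname, "archive_id": archive_id,
            "flags": flags, "data": data, "fids": list(argv[5:])}


def enqueue(request, db_path=QUEUE_DB):
    """Store the request as one row; the commit happens before we return."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS requests ("
                         "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                         "body TEXT NOT NULL)")
            conn.execute("INSERT INTO requests (body) VALUES (?)",
                         (json.dumps(request),))
    finally:
        conn.close()


def main(argv):
    """Return 0 once the request is queued, 1 otherwise."""
    try:
        enqueue(parse_args(argv))
    except (ValueError, sqlite3.Error) as exc:
        print(f"lhsm upcall failed: {exc}", file=sys.stderr)
        return 1
    return 0

# When installed as the upcall, finish with: sys.exit(main(sys.argv[1:]))
```

The sketch does no Lustre I/O and returns as soon as the row is committed, so the blocked RPC handler is released quickly. How the MDT treats a nonzero exit status is not covered here.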
A separate process (called a "worker" and also to be supplied by the
site) should read from that persistent queue and perform the
actions. The worker process does what a copytool does, but instead of
listening on a KUC pipe for actions it reads from the queue. Like
existing copytools, it must interact with Lustre and with the
archive. The main difference (on the Lustre side) is that it uses
slightly modified ioctls to handle the upcalled requests. To make this
easier I added a new command ('lfs hsm_upcall') that manages the
Lustre half of an upcalled action, and a sample script,
lustre/utils/lhsm_worker_posix, that handles the archive side (assuming
a lhsmtool_posix archive layout). The idea is that 'lfs hsm_upcall'
knows about Lustre and lhsm_worker_posix knows about the
archive. Running
lfs hsm_upcall lhsm_worker_posix ARCHIVE FSNAME ARCHIVE_ID FLAGS DATA FID...
will do the following for each FID:
1. Open the Lustre file to be archived specified by FID.
2. Send an RPC (which bypasses the CDT) to the MDT to say that ARCHIVE is starting.
3. Invoke
lhsm_worker_posix ACTION FSNAME ARCHIVE_ID FLAGS DATA FID
with stdin opened to the file to be archived.
4. Wait for lhsm_worker_posix and send an ARCHIVE completion RPC
(with the exit status of lhsm_worker_posix) to the MDT.
5. Close the file to be archived.
Remove is handled similarly but without the open or close.
See comments in lustre/utils/lhsm_worker_posix and lfs_hsm_upcall().
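To show how a site worker might drive the command above, here is a minimal sketch that turns queued requests into 'lfs hsm_upcall' invocations. It assumes each request is a dict with the keys `action`, `fsname`, `archive_id`, `flags`, `data`, and `fids`; that layout and the function names `build_command` and `process` are hypothetical. The steps numbered above are performed by 'lfs hsm_upcall' itself, not by this code.

```python
# Hypothetical worker sketch: run `lfs hsm_upcall` once per queued request.
import subprocess


def build_command(request, worker="lhsm_worker_posix"):
    """Build: lfs hsm_upcall WORKER ACTION FSNAME ARCHIVE_ID FLAGS DATA FID..."""
    if not request["fids"]:
        raise ValueError("request has no FIDs")
    return ["lfs", "hsm_upcall", worker,
            request["action"], request["fsname"],
            str(request["archive_id"]), str(request["flags"]),
            request["data"], *request["fids"]]


def process(requests, run=subprocess.run):
    """Run each request in order and return (request, exit status) pairs."""
    results = []
    for request in requests:
        completed = run(build_command(request), check=False)
        results.append((request, completed.returncode))
    return results
```

The `run` parameter exists so the loop can be exercised without a Lustre client. What to do with a nonzero status (retry, requeue, alert) is left to the site.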
This may seem like a lot of moving parts but internally HSM has a lot
of parts and this was the cleanest way to decompose it that would
offer the flexibility needed.
Issue Links
- is related to
  - LU-13384 HSM copytool API for external coordinator (Open)
  - LU-8324 HSM: prioritize HSM requests (Open)
  - LU-9680 Improve the user land to kernel space interface for lustre (In Progress)
  - LU-7659 Replace KUC by more standard mechanisms (Reopened)
  - LU-6081 hsm: add file migrate support (Open)