Details
Type: Improvement
Resolution: Unresolved
Priority: Minor
Description
HSM restore performance is often degraded when too many archive requests sit in the coordinator (CDT) llog or the copytool (CT) pipeline. Offer archive and remove upcalls, invoked on the MDT, that allow bypassing the coordinator and better scheduling of archives and removes.
From the commit message on https://review.whamcloud.com/32212:
This change provides an HSM upcall facility to optionally bypass the HSM coordinator (CDT) for archive and remove requests. (Release already bypasses the CDT, and restore bypass is not supported by this change.) It requires an updated MDT and a worker client; OSTs, compute nodes, and copytool nodes need not be updated.

    lctl set_param mdt.*.hsm.upcall_mask='ARCHIVE' # or 'ARCHIVE RESTORE', 'RESTORE', ''
    lctl set_param mdt.*.hsm.upcall_path=/.../lhsm_mdt_upcall # Full path.

HSM requests whose action is set in the upcall_mask parameter will be diverted from the coordinator and handled by the executable specified by upcall_path. By default upcall_mask is empty, which gives the normal HSM coordinator handling behavior.

The upcall (to be supplied by the site) will be invoked by the MDT RPC handler; it runs on the MDT as a root-privileged process with an empty environment. Invocation will be of the form:

    /.../lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...

with one or more FIDs, each as a separate argument. The upcall_path parameter can be set to the path of an arbitrary (site-supplied) executable as long as it does the right thing (DTRT).

The RPC handler will block until the upcall completes, so for safety and liveness the upcall should not access Lustre. Instead, it should put the request in an off-Lustre persistent queue or database and then exit. The actions could be submitted to a job scheduler, but care must be taken to ensure that this does not entail any Lustre operations. See comments in mdt_hsm_upcall().

A separate process (called a "worker", also to be supplied by the site) should read from that persistent queue and perform the actions. The worker does what a copytool does, but instead of listening on a KUC pipe for actions it reads from the queue. Like existing copytools, it must interact with both Lustre and the archive. The main difference on the Lustre side is that it uses slightly modified ioctls to handle the upcalled requests.
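The upcall/worker split described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the site scripts or mdt_hsm_upcall(): it assumes SQLite as the off-Lustre persistent queue, and the database path, table layout, and the functions enqueue(), pending() and complete() are all hypothetical. The upcall side only parses `ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...` and records one row per FID, so it returns quickly and never touches Lustre; the worker side reads the oldest requests and removes each one only after it has been carried out.

```python
import sqlite3

# Hypothetical queue location; it must NOT be on a Lustre file system,
# because the MDT RPC handler blocks until the upcall exits.
QUEUE_DB = "/var/lib/lhsm/queue.db"

SCHEMA = """CREATE TABLE IF NOT EXISTS hsm_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    action TEXT NOT NULL,
    fsname TEXT NOT NULL,
    archive_id TEXT NOT NULL,
    flags TEXT NOT NULL,
    data TEXT NOT NULL,
    fid TEXT NOT NULL)"""


def _connect(db_path):
    # Wait up to 30 s for the lock if another upcall or the worker holds it.
    conn = sqlite3.connect(db_path, timeout=30)
    conn.execute(SCHEMA)
    return conn


def enqueue(db_path, argv):
    """Upcall side: parse 'ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...' and
    store one row per FID. Returns the number of FIDs queued."""
    if len(argv) < 6:
        raise ValueError("usage: ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...")
    action, fsname, archive_id, flags, data = argv[:5]
    rows = [(action, fsname, archive_id, flags, data, fid) for fid in argv[5:]]
    conn = _connect(db_path)
    try:
        with conn:  # commit all rows at once, or none on error
            conn.executemany(
                "INSERT INTO hsm_queue"
                " (action, fsname, archive_id, flags, data, fid)"
                " VALUES (?, ?, ?, ?, ?, ?)", rows)
    finally:
        conn.close()
    return len(rows)


def pending(db_path, limit=100):
    """Worker side: return up to `limit` queued requests, oldest first."""
    conn = _connect(db_path)
    try:
        return conn.execute(
            "SELECT id, action, fsname, archive_id, flags, data, fid"
            " FROM hsm_queue ORDER BY id LIMIT ?", (limit,)).fetchall()
    finally:
        conn.close()


def complete(db_path, row_id):
    """Worker side: drop a request only after it has been carried out."""
    conn = _connect(db_path)
    try:
        with conn:
            conn.execute("DELETE FROM hsm_queue WHERE id = ?", (row_id,))
    finally:
        conn.close()
```

To serve as the upcall executable, the file would also need an entry point that calls enqueue(QUEUE_DB, sys.argv[1:]) and exits non-zero on error, started through an absolute interpreter path (for example /usr/bin/python3) rather than relying on PATH, since the upcall runs with an empty environment. Deleting a row only after the action completes means a worker crash leaves the request queued for retry instead of losing it.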
To make this easier I added a new command ('lfs hsm_upcall') that manages the Lustre half of an upcalled action, and a sample script, lustre/utils/lhsm_worker_posix, that handles the archive side (assuming an lhsmtool_posix archive layout). The idea is that 'lfs hsm_upcall' knows about Lustre and lhsm_worker_posix knows about the archive. Running

    lfs hsm_upcall lhsm_worker_posix ARCHIVE FSNAME ARCHIVE_ID FLAGS DATA FID...

will do the following for each FID:

1. Open the Lustre file to be archived, specified by FID.
2. Send an RPC (which bypasses the CDT) to the MDT to say that ARCHIVE is starting.
3. Invoke lhsm_worker_posix ACTION FSNAME ARCHIVE_ID FLAGS DATA FID with stdin opened to the file to be archived.
4. Wait for lhsm_worker_posix to exit, then send an ARCHIVE completion RPC, carrying the exit status of lhsm_worker_posix, to the MDT.
5. Close the file to be archived.

Remove is handled similarly but without the open or close. See comments in lustre/utils/lhsm_worker_posix and lfs_hsm_upcall(). This may seem like a lot of moving parts, but HSM internally has a lot of parts, and this was the cleanest decomposition that offered the needed flexibility.
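The per-FID sequence of 'lfs hsm_upcall' can be modelled as below. This is a Python sketch of the control flow only, not lfs_hsm_upcall(): the Lustre-side steps (open by FID, the start and completion RPCs, close) go through modified ioctls that are not described here, so they are represented by hypothetical callbacks in `LustreOps`. The names `LustreOps` and `run_upcalled_action`, giving REMOVE a /dev/null stdin, and reporting status 127 when the worker cannot be started are all assumptions.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LustreOps:
    """Hypothetical stand-ins for the Lustre-side steps that the real
    'lfs hsm_upcall' performs through modified ioctls."""
    open_fid: Callable[[str, str], int]              # (fsname, fid) -> fd
    send_start: Callable[[str, str], None]           # (action, fid)
    send_complete: Callable[[str, str, int], None]   # (action, fid, status)
    close_fd: Callable[[int], None]                  # fd


def run_upcalled_action(ops: LustreOps, worker_cmd: List[str], action: str,
                        fsname: str, archive_id: str, flags: str, data: str,
                        fids: List[str]) -> Dict[str, int]:
    """Model of the per-FID sequence; returns each FID's worker exit status."""
    if action not in ("ARCHIVE", "REMOVE"):
        raise ValueError(f"unsupported upcalled action: {action}")
    statuses: Dict[str, int] = {}
    for fid in fids:
        # Step 1: only ARCHIVE opens the file; REMOVE skips open and close.
        fd: Optional[int] = ops.open_fid(fsname, fid) if action == "ARCHIVE" else None
        try:
            # Step 2: tell the MDT (bypassing the CDT) that the action is starting.
            ops.send_start(action, fid)
            # Step 3: run the archive-side worker with stdin on the file
            # (assumption: /dev/null for REMOVE). subprocess.run waits for it.
            argv = worker_cmd + [action, fsname, archive_id, flags, data, fid]
            try:
                status = subprocess.run(
                    argv, stdin=fd if fd is not None else subprocess.DEVNULL
                ).returncode
            except OSError:
                status = 127  # assumption: worker could not be started
            # Step 4: report completion, with the worker's exit status, to the MDT.
            ops.send_complete(action, fid, status)
            statuses[fid] = status
        finally:
            # Step 5: close the archived file.
            if fd is not None:
                ops.close_fd(fd)
    return statuses
```

Passing the worker as a command list lets a site substitute its own archive-side program for lhsm_worker_posix, and replacing the Lustre hooks with recorders makes it easy to check that REMOVE never opens or closes a file.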
Issue Links
is related to
- LU-13384 HSM copytool API for external coordinator (Open)
- LU-8324 HSM: prioritize HSM requests (Open)
- LU-9680 Improve the user land to kernel space interface for lustre (In Progress)
- LU-7659 Replace KUC by more standard mechanisms (Reopened)
- LU-6081 hsm: add file migrate support (Open)