Lustre / LU-10968

add coordinator bypass upcalls for HSM archive and remove

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels: None

      Description

      HSM restore performance is often degraded by too many archive requests in the CDT llog or CT pipeline. Offer upcalls for archive and remove, invoked on the MDT, that bypass the coordinator and allow better scheduling of archives and removes.

      From the commit message on https://review.whamcloud.com/32212:

      This change provides an HSM upcall facility to optionally bypass the
      HSM coordinator (CDT) for archive and remove requests. (Release
      already bypasses the CDT and restore bypass is not supported by this
      change.)
      
      Requires an updated MDT and a worker client. OSTs, compute nodes, and
      copytool nodes need not be updated.
      
      lctl set_param mdt.*.hsm.upcall_mask='ARCHIVE' # or 'ARCHIVE REMOVE', 'REMOVE', ''
      lctl set_param mdt.*.hsm.upcall_path=/.../lhsm_mdt_upcall # Full path.
      
      HSM requests whose action is set in the upcall_mask parameter will be
      diverted from the coordinator and handled by the executable specified
      by upcall_path. By default upcall_mask is empty which gives the normal
      HSM coordinator handling behavior.
      
      The upcall (to be supplied by the site) will be invoked by the MDT RPC
      handler and runs on the MDT as a root-privileged process with an empty
      environment. Invocation will be of the form:
      
        /.../lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...
      
      with one or more FIDs, each as a separate argument. The upcall_path
      parameter can be set to the path of an arbitrary (site-supplied)
      executable as long as it does the right thing. The RPC handler will
      block until the upcall completes, so for safety and liveness the upcall
      should not access Lustre. Instead, the upcall should put the request in
      an off-Lustre persistent queue or database and then exit. The actions
      could be submitted to a job scheduler, but care must be taken to ensure
      that this does not entail any Lustre operations. See comments in
      mdt_hsm_upcall().
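
      A minimal sketch of such an upcall, under stated assumptions: the
      queue is a spool directory on a local (non-Lustre) disk, each request
      becomes one file holding one argument per line, and the names
      HSM_QUEUE_DIR and enqueue_request are hypothetical. It only records
      the request and returns, so the RPC handler is not held up by any
      Lustre access.

```shell
#!/bin/sh
# Hypothetical upcall sketch: record the request off-Lustre and exit quickly.
# Expected invocation (from the change description):
#   lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...

# The upcall starts with an empty environment, so provide a PATH if none is set.
PATH=${PATH:-/usr/bin:/bin}
export PATH

# Assumed spool directory; it must live on a local, non-Lustre filesystem.
HSM_QUEUE_DIR=${HSM_QUEUE_DIR:-/var/spool/lhsm_queue}

enqueue_request() {
    # Five fixed arguments plus at least one FID are required.
    [ "$#" -ge 6 ] || return 2
    mkdir -p "$HSM_QUEUE_DIR/tmp" "$HSM_QUEUE_DIR/new" || return 1
    enq_tmp=$(mktemp "$HSM_QUEUE_DIR/tmp/req.XXXXXX") || return 1
    # One argument per line, so a worker can rebuild the argument vector.
    printf '%s\n' "$@" > "$enq_tmp" || return 1
    # A rename within one filesystem is atomic, so a reader of new/ never
    # sees a partially written request.
    mv "$enq_tmp" "$HSM_QUEUE_DIR/new/${enq_tmp##*/}"
}

# Act only when called with arguments, as the MDT would do.
if [ "$#" -gt 0 ]; then
    enqueue_request "$@"
    exit
fi
```

      The sketch does not fsync the spool file and assumes no argument
      contains a newline; a site that needs stronger durability or richer
      DATA values would want a real database or queue service instead.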
      
      A separate process (called a "worker" and also to be supplied by the
      site) should read from that persistent queue and perform the
      actions. The worker process does what a copytool does, but instead of
      listening on a KUC pipe for actions it reads from the queue. Like
      existing copytools it must interact with Lustre and with the
      archive. The main difference (on the Lustre side) is that it uses
      slightly modified ioctls to handle the upcalled requests. To make this
      easier I added a new command ('lfs hsm_upcall') that manages the
      Lustre half of an upcalled action and a sample script,
      lustre/utils/lhsm_worker_posix, that handles the archive side (assuming
      an lhsmtool_posix archive layout). The idea is that 'lfs hsm_upcall'
      knows about Lustre and lhsm_worker_posix knows about the
      archive. Running
      
        lfs hsm_upcall lhsm_worker_posix ARCHIVE FSNAME ARCHIVE_ID FLAGS DATA FID...
      
      will do the following for each FID:
        1. Open the Lustre file to be archived specified by FID.
        2. Send an RPC (which bypasses the CDT) to the MDT to say that ARCHIVE is starting.
        3. Invoke
      
             lhsm_worker_posix ACTION FSNAME ARCHIVE_ID FLAGS DATA FID
      
           with stdin opened to the file to be archived.
        4. Wait for lhsm_worker_posix and send an ARCHIVE completion RPC
           (with the exit status of lhsm_worker_posix) to the MDT.
        5. Close the file to be archived.
      
      Remove is handled similarly but without the open or close.
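
      To illustrate the archive-side contract described above (six
      arguments, the file to archive on stdin, exit status reported back to
      the MDT), here is a hypothetical handler sketch. It is not the
      lhsm_worker_posix sample script and does not use the lhsmtool_posix
      layout: ARCHIVE_ROOT, handle_action, and the flat one-file-per-FID
      layout are assumptions made only for the example.

```shell
#!/bin/sh
# Hypothetical archive-side handler, called once per FID as:
#   HANDLER ACTION FSNAME ARCHIVE_ID FLAGS DATA FID
# For ARCHIVE, stdin is expected to be the Lustre file to copy.

# Assumed archive location; flat layout, NOT the lhsmtool_posix layout.
ARCHIVE_ROOT=${ARCHIVE_ROOT:-/archive}

handle_action() {
    [ "$#" -eq 6 ] || return 2
    ha_action=$1
    ha_fid=$6
    # Treat the FID as an opaque name, but refuse empty names and slashes.
    case $ha_fid in
    ''|*/*) return 2 ;;
    esac
    ha_dest="$ARCHIVE_ROOT/$ha_fid"

    case $ha_action in
    ARCHIVE)
        mkdir -p "$ARCHIVE_ROOT" || return 1
        # Copy stdin to a temporary name, then rename so that a failed
        # copy never leaves a truncated archive object behind.
        cat > "$ha_dest.tmp" || { rm -f "$ha_dest.tmp"; return 1; }
        mv "$ha_dest.tmp" "$ha_dest"
        ;;
    REMOVE)
        # No file is opened for REMOVE; just drop the archived copy.
        rm -f "$ha_dest"
        ;;
    *)
        # Any other action is outside the scope of this sketch.
        return 2
        ;;
    esac
}
```

      The function's return status is what a caller would pass on as the
      completion status, so every failure path returns nonzero.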
      
      See comments in lustre/utils/lhsm_worker_posix and lfs_hsm_upcall().
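
      The worker side can be sketched as a loop that drains a queue and
      hands each request to 'lfs hsm_upcall'. This is a sketch under
      assumptions, not the patch's implementation: it assumes a spool
      directory whose new/ subdirectory holds one file per request with
      one argument per line, and the names HSM_QUEUE_DIR, HSM_RUN,
      process_request, and drain_queue are hypothetical. The command
      itself is taken from the description above and is overridable so the
      loop can be exercised without Lustre.

```shell
#!/bin/sh
# Hypothetical worker sketch: replay queued requests through 'lfs hsm_upcall'.

# Assumed spool directory on a local, non-Lustre filesystem.
HSM_QUEUE_DIR=${HSM_QUEUE_DIR:-/var/spool/lhsm_queue}
# Command from the change description; override it for a dry run.
HSM_RUN=${HSM_RUN:-lfs hsm_upcall lhsm_worker_posix}

process_request() {
    pr_file=$1
    # Rebuild the argument vector: one argument per line of the request file.
    set --
    while IFS= read -r pr_arg; do
        set -- "$@" "$pr_arg"
    done < "$pr_file"
    # ACTION FSNAME ARCHIVE_ID FLAGS DATA plus at least one FID.
    [ "$#" -ge 6 ] || return 2
    # HSM_RUN is deliberately unquoted so it splits into command and arguments.
    $HSM_RUN "$@"
}

drain_queue() {
    mkdir -p "$HSM_QUEUE_DIR/done" "$HSM_QUEUE_DIR/failed" || return 1
    for dq_req in "$HSM_QUEUE_DIR"/new/*; do
        # An unmatched glob stays literal; skip it and anything not a file.
        [ -f "$dq_req" ] || continue
        if process_request "$dq_req"; then
            mv "$dq_req" "$HSM_QUEUE_DIR/done/"
        else
            # Keep failed requests for inspection or a later retry.
            mv "$dq_req" "$HSM_QUEUE_DIR/failed/"
        fi
    done
}
```

      A site could call drain_queue from a timer or a long-running loop
      with a sleep between passes; how retries and concurrency are handled
      is left open here.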
      
      This may seem like a lot of moving parts but internally HSM has a lot
      of parts and this was the cleanest way to decompose it that would
      offer the flexibility needed.
      

              People

              • Assignee: Ben Evans (bevans)
              • Reporter: John Hammond (jhammond, inactive)