[LU-10968] add coordinator bypass upcalls for HSM archive and remove

Details

    • Improvement
    • Resolution: Unresolved
    • Minor

    Description

      HSM restore performance is often degraded by the presence of too many archive requests in the CDT llog or CT pipeline. Offer upcalls for archive and remove, invoked on the MDT, that bypass the coordinator and allow better scheduling of archives and removes.

      From the commit message on https://review.whamcloud.com/32212:

      This change provides an HSM upcall facility to optionally bypass the
      HSM coordinator (CDT) for archive and remove requests. (Release
      already bypasses the CDT and restore bypass is not supported by this
      change.)
      
      Requires updated MDT and a worker client. OSTs, compute nodes, and
      copytool nodes need not be updated.
      
      lctl set_param mdt.*.hsm.upcall_mask='ARCHIVE' # or 'ARCHIVE REMOVE', 'REMOVE', ''
      lctl set_param mdt.*.hsm.upcall_path=/.../lhsm_mdt_upcall # Full path.
      
      HSM requests whose action is set in the upcall_mask parameter will be
      diverted from the coordinator and handled by the executable specified
      by upcall_path. By default upcall_mask is empty which gives the normal
      HSM coordinator handling behavior.
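
The effect of upcall_mask can be illustrated with a small sketch. This is not the MDT code: the function name is_diverted and the whitespace-separated, exact-match parsing are assumptions made for illustration, based only on the example values shown above.

```python
def is_diverted(action: str, upcall_mask: str) -> bool:
    """Return True if `action` would be handed to the upcall, not the CDT.

    Assumption: the mask is a whitespace-separated list of action names.
    An empty mask diverts nothing, which matches the default behavior
    described in the text.
    """
    return action in upcall_mask.split()
```

With the mask set to 'ARCHIVE', is_diverted("ARCHIVE", "ARCHIVE") is true, while any action checked against the empty default mask stays with the coordinator.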
      
      The upcall (to be supplied by the site) will be invoked by the MDT
      RPC handler (it runs on the MDT as a root-privileged process with an
      empty environment). Invocation will be of the form:
      
        /.../lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...
      
      with one or more FIDs each as a separate argument. The upcall_path
      parameter can be set to the path of an arbitrary (site supplied)
      executable as long as it does the right thing. The RPC handler will
      block until the upcall completes, so for safety and liveness the
      upcall should not access Lustre. Instead the upcall should put the
      request in an off-Lustre persistent queue or database and then exit.
      The actions could be submitted to a job scheduler but care must be
      taken to ensure that this does not entail any Lustre operations. See
      comments in mdt_hsm_upcall().
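
A minimal sketch of such an upcall follows, assuming a spool directory on local (non-Lustre) storage serves as the persistent queue. The directory /var/spool/lhsm_upcall, the one-JSON-file-per-request format, and the function names are all hypothetical; only the argument order comes from the invocation form above. The sketch touches nothing but the local spool, so it never blocks on Lustre.

```python
#!/usr/bin/python3
# Absolute interpreter path: the upcall runs with an empty environment.
"""Hypothetical upcall: record the request in a local spool and exit."""
import json
import os
import sys
import tempfile
import time

# Assumption: a directory on local, non-Lustre storage on the MDS.
SPOOL_DIR = "/var/spool/lhsm_upcall"


def parse_request(argv):
    """Split ACTION FSNAME ARCHIVE_ID FLAGS DATA FID... into a dict."""
    if len(argv) < 6:
        raise ValueError("expected ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...")
    action, fsname, archive_id, flags, data = argv[:5]
    return {
        "action": action,
        "fsname": fsname,
        "archive_id": archive_id,
        "flags": flags,
        "data": data,
        "fids": list(argv[5:]),
    }


def enqueue(request, spool_dir=SPOOL_DIR):
    """Write the request to its own file atomically and return the path."""
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix=".tmp-")
    with os.fdopen(fd, "w") as tmp:
        json.dump(request, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())  # make the request durable before returning
    final_path = os.path.join(
        spool_dir, "%d-%d.json" % (time.time_ns(), os.getpid()))
    os.replace(tmp_path, final_path)  # readers never see a partial file
    return final_path


def main(argv, spool_dir=SPOOL_DIR):
    """Entry point: enqueue one request and return an exit status."""
    enqueue(parse_request(argv), spool_dir)
    return 0
```

Installed as the upcall_path executable, the script would end with sys.exit(main(sys.argv[1:])). How the MDT interprets the upcall's exit status is not stated here, so returning 0 on success is an assumption.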
      
      A separate process (called a "worker" and also to be supplied by the
      site) should read from that persistent queue and perform the
      actions. The worker process does what a copytool does but instead of
      listening on a KUC pipe for actions it reads from the queue. Like
      existing copytools it must interact with Lustre and with the
      archive. The main difference (on the Lustre side) is that it uses
      slightly modified ioctls to handle the upcalled requests. To make it
      easier I added a new command ('lfs hsm_upcall') that manages the
      Lustre half of an upcalled action and a sample script
      lustre/utils/lhsm_worker_posix that handles the archive side (assuming
      a lhsmtool_posix archive layout). The idea is that 'lfs hsm_upcall'
      knows about Lustre and lhsm_worker_posix knows about the
      archive. Running
      
        lfs hsm_upcall lhsm_worker_posix ARCHIVE FSNAME ARCHIVE_ID FLAGS DATA FID...
      
      will do the following for each FID:
        1. Open the Lustre file to be archived specified by FID.
        2. Send an RPC (which bypasses the CDT) to the MDT to say that ARCHIVE is starting.
        3. Invoke
      
             lhsm_worker_posix ACTION FSNAME ARCHIVE_ID FLAGS DATA FID
      
           with stdin opened to the file to be archived.
        4. Wait for lhsm_worker_posix and send an ARCHIVE completion RPC
           (with the exit status of lhsm_worker_posix) to the MDT.
        5. Close the file to be archived.
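
The archive side of step 3 can be sketched as follows. This is not lhsm_worker_posix itself. It only illustrates the contract described above: the file data arrives on stdin and the exit status reports success or failure. The function name archive_stream and the destination path argument are hypothetical, and the real lhsmtool_posix archive layout is not reproduced here.

```python
import os
import shutil
import sys
import tempfile


def archive_stream(dest_path, src=None):
    """Copy the data on `src` (default: stdin) to dest_path.

    Returns 0 on success and 1 on an I/O error, suitable as an exit status.
    """
    if src is None:
        src = sys.stdin.buffer
    dest_dir = os.path.dirname(dest_path) or "."
    try:
        os.makedirs(dest_dir, exist_ok=True)
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as tmp:
                shutil.copyfileobj(src, tmp)
                tmp.flush()
                os.fsync(tmp.fileno())
            # Publish the copy only once it is complete.
            os.replace(tmp_path, dest_path)
        except BaseException:
            os.unlink(tmp_path)  # do not leave a partial copy behind
            raise
    except OSError:
        return 1
    return 0
```

Writing to a temporary name and renaming keeps a failed or interrupted copy from being mistaken for a complete archive object.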
      
      Remove is handled similarly but without the open or close.
      
      See comments in lustre/utils/lhsm_worker_posix and lfs_hsm_upcall().
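
The worker side can be sketched in the same hedged spirit. The sketch assumes a hypothetical queue made of one JSON file per request in a local spool directory, with keys action, fsname, archive_id, flags, data and fids. The command line follows the 'lfs hsm_upcall' form shown above, which exists only with this change applied. Leaving a failed request in the spool for a later retry is an illustrative policy, not something the change prescribes.

```python
import json
import os
import subprocess


def build_command(request, worker="lhsm_worker_posix"):
    """Build the 'lfs hsm_upcall' argv for one queued request."""
    return ["lfs", "hsm_upcall", worker,
            request["action"], request["fsname"], str(request["archive_id"]),
            request["flags"], request["data"], *request["fids"]]


def drain(spool_dir, run=subprocess.call):
    """Process queued requests in file-name order; return how many succeeded.

    `run` takes an argv list and returns an exit status. It is a parameter
    so the loop can be exercised without a Lustre client.
    """
    done = 0
    names = sorted(n for n in os.listdir(spool_dir) if n.endswith(".json"))
    for name in names:
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            request = json.load(f)
        if run(build_command(request)) == 0:
            os.remove(path)  # dequeue only after the action succeeded
            done += 1
    return done
```

Unlike the upcall, the worker runs on a Lustre client and is free to block on Lustre, which is why the two halves are separate processes.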
      
      This may seem like a lot of moving parts but internally HSM has a lot
      of parts and this was the cleanest way to decompose it that would
      offer the flexibility needed.
      

          Activity

            beevans Ben Evans (Inactive) added a comment - yep, given some improvements, coordinatool looks like a really good solution.

            simmonsja James A Simmons added a comment - Why even bother with kernel space at all, then? If you want a pure userland solution, look at http://github.com/cea-hpc/coordinatool. This work is looking to improve what we are already using without creating a new interface.

            beevans Ben Evans (Inactive) added a comment - I think 99% of this can be skipped by LU-13384 and using purely non-kernel calls.  lfs hsm ... calls just all get routed to the external coordinator (of whatever form). The only issue is the calls that perform imperative restore on file access.  I believe those can be easily added using a smaller chunk of the infrastructure in this PR.

            simmonsja James A Simmons added a comment - Both HPE and Microsoft are interested in this work.

            spitzcor Cory Spitz added a comment - There are still two patches pending in Gerrit for this ticket. It is probably best not to abandon them. Granted it isn't a true dependency, but we've all been waiting for the netlink changes to finish refreshing and landing the HSM/data movement patches that would be impacted. It looks like https://review.whamcloud.com/#/c/34230 finally has all the necessary +1s and it is even in master-next now, so perhaps we can reopen and resume this work soon.

            jhammond John Hammond added a comment - Closing since this isn't being worked on.

            spitzcor Cory Spitz added a comment - beevans, this is assigned to your old persona.

            gerrit Gerrit Updater added a comment - Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/36492
            Subject: LU-10968 hsm: encapsulate copyaction_private
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: dfe56daacdb3cbf85b45eb2eccf98038a665c63c

            gerrit Gerrit Updater added a comment - Ben Evans (bevans@cray.com) uploaded a new patch: https://review.whamcloud.com/36235
            Subject: LU-10968 hsm: create external HSM queue interface
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ee8c7d8d925a3b21d29a5170524781ae9375618c

            simmonsja James A Simmons added a comment - To let you know, I'm going to push another LU-9680 update. I have been talking to Amir about its application for LNet UDSP as well as using this for lnet selftest, so I might move the netlink handling into liblnetconfig. I will rebase the LU-7659 patch on top of LU-9680 as well as push an early stats patch I developed which is not finished.

            People

              nangelinas Nikitas Angelinas
              jhammond John Hammond