<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:39:47 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10968] add coordinator bypass upcalls for HSM archive and remove</title>
                <link>https://jira.whamcloud.com/browse/LU-10968</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;HSM restore performance is often degraded by the presence of too many archive requests in the CDT llog or CT pipeline. Offer upcalls for archive and remove to be invoked on the MDT which allow bypassing of the coordinator and better scheduling of archives and removes.&lt;/p&gt;

&lt;p&gt;From the commit message on &lt;a href=&quot;https://review.whamcloud.com/32212&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32212&lt;/a&gt;:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;This change provides an HSM upcall facility to optionally bypass the
HSM coordinator (CDT) for archive and remove requests. (Release
already bypasses the CDT and restore bypass is not supported by this
change.)

Requires an updated MDT and a worker client. OSTs, compute nodes, and
copytool nodes need not be updated.

lctl set_param mdt.*.hsm.upcall_mask=&apos;ARCHIVE&apos; # or &apos;ARCHIVE RESTORE&apos;, &apos;RESTORE&apos;, &apos;&apos;
lctl set_param mdt.*.hsm.upcall_path=/.../lhsm_mdt_upcall # Full path.

HSM requests whose action is set in the upcall_mask parameter will be
diverted from the coordinator and handled by the executable specified
by upcall_path. By default upcall_mask is empty which gives the normal
HSM coordinator handling behavior.

The upcall (to be supplied by the site) will be invoked by the MDT RPC
handler (it runs on the MDT as a root-privileged process with an empty
environment). Invocation will be of the form:

  /.../lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...

with one or more FIDs each as a separate argument. The upcall_path
parameter can be set to the path of an arbitrary (site supplied)
executable as long as it DTRT. The RPC handler will block until the
upcall completes. So for safety/liveness the upcall should really not
access Lustre. Instead the upcall should put the request in an
off-Lustre persistent queue or database and then exit. The actions
could be submitted to a job scheduler but care must be taken to ensure
that this does not entail any Lustre operations. See comments in
mdt_hsm_upcall().

A separate process (called a &quot;worker&quot; and also to be supplied by the
site) should read from that persistent queue and perform the
actions. The worker process does what a copytool does but instead of
listening on a KUC pipe for actions it reads from the queue. Like
existing copytools it must interact with Lustre and with the
archive. The main difference (on the Lustre side) is that it uses
slightly modified ioctls to handle the upcalled requests. To make it
easier I added a new command (&apos;lfs hsm_upcall&apos;) that manages the
Lustre half of an upcalled action and a sample script
lustre/utils/lhsm_worker_posix that handles the archive side (assuming
a lhsmtool_posix archive layout). The idea is that &apos;lfs hsm_upcall&apos;
knows about Lustre and lhsm_worker_posix knows about the
archive. Running

  lfs hsm_upcall lhsm_worker_posix ARCHIVE FSNAME ARCHIVE_ID FLAGS DATA FID...

will do the following for each FID:
  1. Open the Lustre file to be archived specified by FID.
  2. Send an RPC (which bypasses the CDT) to the MDT to say that ARCHIVE is starting.
  3. Invoke

       lhsm_worker_posix ACTION FSNAME ARCHIVE_ID FLAGS DATA FID

     with stdin opened to the file to be archived.
  4. Wait for lhsm_worker_posix and send an ARCHIVE completion RPC
     (with the exit status of lhsm_worker_posix) to the MDT.
  5. Close the file to be archived.

Remove is handled similarly but without the open or close.

See comments in lustre/utils/lhsm_worker_posix and lfs_hsm_upcall().

This may seem like a lot of moving parts but internally HSM has a lot
of parts and this was the cleanest way to decompose it that would
offer the flexibility needed.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
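The description says the site-supplied upcall must not touch Lustre; it should only record the request in an off-Lustre persistent queue and exit, leaving the real work to a separate worker. As a rough illustration only (the function name, queue file, and record format here are hypothetical, not part of the patch), a minimal sketch of such an upcall in shell might look like:

```shell
#!/bin/sh
# Hypothetical sketch of a site-supplied HSM upcall. Per the commit
# message it must NOT access Lustre: it only appends the request to an
# off-Lustre queue and exits. Queue path and record format are made up.
QUEUE=${QUEUE:-/tmp/lhsm_queue}
: > "$QUEUE"    # start with an empty queue for this demo

lhsm_mdt_upcall() {
    # Invocation form from the commit message:
    #   lhsm_mdt_upcall ACTION FSNAME ARCHIVE_ID FLAGS DATA FID...
    action=$1 fsname=$2 archive_id=$3 flags=$4 data=$5
    shift 5
    case "$action" in
        ARCHIVE|REMOVE) ;;   # only archive and remove bypass the CDT here
        *) echo "unsupported action: $action" >&2; return 2 ;;
    esac
    # One queue record per FID; a worker process drains these later.
    for fid in "$@"; do
        printf '%s %s %s %s %s %s\n' \
            "$action" "$fsname" "$archive_id" "$flags" "$data" "$fid" >> "$QUEUE"
    done
}

# Example: queue an archive request covering two FIDs.
lhsm_mdt_upcall ARCHIVE lustre 1 0 - \
    '[0x200000401:0x1:0x0]' '[0x200000401:0x2:0x0]'
```

Because the upcall only appends and exits, the MDT RPC handler that blocks on it returns quickly regardless of archive latency; the worker (the description's lhsm_worker_posix analogue) then consumes the queue out of band.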
                <environment></environment>
        <key id="52043">LU-10968</key>
            <summary>add coordinator bypass upcalls for HSM archive and remove</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="4" iconUrl="https://jira.whamcloud.com/images/icons/statuses/reopened.png" description="This issue was once resolved, but the resolution was deemed incorrect. From here issues are either marked assigned or resolved.">Reopened</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="nangelinas">Nikitas Angelinas</assignee>
                                    <reporter username="jhammond">John Hammond</reporter>
                        <labels>
                    </labels>
                <created>Mon, 30 Apr 2018 14:56:33 +0000</created>
                <updated>Tue, 22 Mar 2022 19:49:07 +0000</updated>
                                                                                <due></due>
                            <votes>1</votes>
                                    <watches>21</watches>
                                                                            <comments>
                            <comment id="226973" author="gerrit" created="Mon, 30 Apr 2018 20:34:56 +0000"  >&lt;p&gt;John L. Hammond (john.hammond@intel.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32212&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32212&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10968&quot; title=&quot;add coordinator bypass upcalls for HSM archive and remove&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10968&quot;&gt;LU-10968&lt;/a&gt; hsm: add archive and remove upcall handling 1&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 5006c5d0e7181685ab3926691fe08ed74fc0eb63&lt;/p&gt;</comment>
                            <comment id="233126" author="nrutman" created="Thu, 6 Sep 2018 15:45:49 +0000"  >&lt;p&gt;A couple of thoughts:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Does an upcall have a significant overhead for each hsm request? What kind of request rates can be sustained?&lt;/li&gt;
	&lt;li&gt;Restore is needed - is your comment &quot;not supported by this change&quot; a reflection of a plan, or merely a limitation of the current patch?&#160;&#160;&lt;/li&gt;
&lt;/ol&gt;
</comment>
                            <comment id="234031" author="nrutman" created="Wed, 26 Sep 2018 18:11:19 +0000"  >&lt;p&gt;Discussion from LAD18 Dev Day.&lt;/p&gt;

&lt;p&gt;#1. upcall vs netlink. (Cray/@BenEvans has a netlink prototype.)&lt;/p&gt;

&lt;p&gt;Upcalls make it easy to implement standalone userspace consumers, but they require a relatively heavyweight process spawn for each request.&#160;&lt;/p&gt;

&lt;p&gt;#2. lfs hsm_upcall vs llapi_hsmaction_start|end&lt;/p&gt;

&lt;p&gt;@JohnHammond&apos;s lfs hsm_upcall takes as an argument a script to call back for each file. This again generates a process spawn/context switch, which @NathanRutman doesn&apos;t like. Instead, 2 llapi calls could be called directly by the copytool, one before (llapi_hsmaction_start) and one after (llapi_hsmaction_end) each file. These calls would do all the Lustre housekeeping bits: locks, layouts, ioctls. I think John agreed this is ok.&#160;&lt;/p&gt;

&lt;p&gt;#3. exporting restore vs retaining the coordinator&lt;/p&gt;

&lt;p&gt;John&apos;s patch as it stands only exports the archive and cancel, not restore. Restore retains its current form, using the Coordinator and Lustre-registered copytools. The reason for this is MDT failure handling: the MDT re-enqueues the layout lock for any incomplete restores, in order to prevent the clients from obtaining the layout lock when they reconnect. We discussed 3 options, all sadly uninformed by any actual knowledge of how this works right now.&lt;/p&gt;

&lt;p&gt;#3a. The MDS should not grant the layout lock to a client if the request comes with a &quot;restore intent&quot;, and should enqueue the lock itself instead. It is not clear that the client sends such an intent (does it?). The MDS can&apos;t block all layout access because we expect other things to break if we do this (such as?).&#160;&lt;/p&gt;

&lt;p&gt;#3b. Pin the client&apos;s RESTORE RPC for replay. When the MDS restarts, the client re-sends the RPC, which triggers the MDS to re-enqueue the layout lock. Problem is that the client is also replaying the layout lock request, and we&apos;re not sure which one is processed first, and also not sure this won&apos;t change in the future.&lt;/p&gt;

&lt;p&gt;#3c. The client should re-check that the layout is not released upon being granted the layout lock. If it is still released, then loop back to the send RESTORE RPC step. Nobody is quite sure how this check actually happens today, but theoretically if it happens on the client once, it should be possible to check again. Requires a bit of research as to how.&lt;/p&gt;

&lt;p&gt;Of these three options, it seems to @NathanRutman and @QuintenBouget that 3c is the most bulletproof.&#160;&lt;/p&gt;

&lt;p&gt;#4. Combining Ben&apos;s and John&apos;s efforts.&lt;/p&gt;

&lt;p&gt;In addition to #1 above, @BenEvans has a similar prototype patch of his own. Obviously we need to consolidate down to a single implementation. I suggest Ben attach his patch here as well, and everyone comment on John&apos;s and Ben&apos;s so we can figure out how to converge.&#160;&lt;/p&gt;

&lt;p&gt;#5. Just for fun&lt;/p&gt;

&lt;p&gt;Nathan brought up the idea that if archive was a real layout, it could be the second mirror, and we could possibly use the automatic reconstruction code being developed for EC layouts to populate the primary mirror. Not quite that easy, because the reconstruct happens on the client, which would then have to ask the MDS to restore. But John one-upped him by suggesting that the archive mirror could just be a single object on a specialized archive OST. This makes its HSM status irrelevant to client code; client just treats it as a one-stripe mirror. It could read from it directly. The archive OST would have upcalls to connect to copytools. (Hmm, maybe it is a regular ldiskfs OST, except that it has a filter upcall to locally restore any missing objects to a temporary disk file.)&lt;/p&gt;

</comment>
                            <comment id="234038" author="bevans" created="Wed, 26 Sep 2018 19:03:29 +0000"  >&lt;p&gt;3a doesn&apos;t work: since there may be multiple restore requests for the same file on multiple clients, the MDS needs to keep the lock for restore.&lt;/p&gt;

&lt;p&gt;3b looks workable, but any coordinator needs to be able to check for duplicate requests, and the MDS needs to take the layout lock only on the first restore request.&lt;/p&gt;

&lt;p&gt;3c: an implicit restore returns -ENODATA if the layout lock gets released and there is no data.&#160; You&apos;d have to retry the restore.&lt;/p&gt;</comment>
                            <comment id="234042" author="gerrit" created="Wed, 26 Sep 2018 20:39:40 +0000"  >&lt;p&gt;Ben Evans (bevans@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/33245&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/33245&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10968&quot; title=&quot;add coordinator bypass upcalls for HSM archive and remove&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10968&quot;&gt;LU-10968&lt;/a&gt; hsm: Create an external HSM coordinator&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: a6d3e5d7107e53f2e7da351bf1b91c5ea090b225&lt;/p&gt;</comment>
                            <comment id="234058" author="bougetq" created="Thu, 27 Sep 2018 11:00:32 +0000"  >&lt;p&gt;More on 3c:&lt;/p&gt;

&lt;p&gt;The code we care about is located in &lt;tt&gt;lustre/llite/vvp_io.c&lt;/tt&gt;::&lt;tt&gt;vvp_io_fini()&lt;/tt&gt;&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;        if (io-&amp;gt;ci_restore_needed) {
		/* file was detected release, we need to restore it
		 * before finishing the io
		 */
		rc = ll_layout_restore(inode, 0, OBD_OBJECT_EOF);
		/* if restore registration failed, no restart,
		 * we will return -ENODATA */
		/* The layout will change after restore, so we need to
		 * block on layout lock held by the MDT
		 * as MDT will not send new layout in lvb (see LU-3124)
		 * we have to explicitly fetch it, all this will be done
		 * by ll_layout_refresh().
		 * Even if ll_layout_restore() returns zero, it doesn&apos;t mean
		 * that restore has been successful. Therefore it sets
		 * ci_verify_layout so that it will check layout at the end
		 * of this function.
		 */
		if (rc) {
			io-&amp;gt;ci_restore_needed = 1;
			io-&amp;gt;ci_need_restart = 0;
			io-&amp;gt;ci_verify_layout = 0;
			io-&amp;gt;ci_result = rc;
			GOTO(out, rc);
		}

		io-&amp;gt;ci_restore_needed = 0;

		/* Even if ll_layout_restore() returns zero, it doesn&apos;t mean
		 * that restore has been successful. Therefore it should verify
		 * if there was layout change and restart I/O correspondingly.
		 */
		ll_layout_refresh(inode, &amp;amp;gen);
		io-&amp;gt;ci_need_restart = vio-&amp;gt;vui_layout_gen != gen;
		if (io-&amp;gt;ci_need_restart) {
			CDEBUG(D_VFSTRACE,
			       DFID&quot; layout changed from %d to %d.\n&quot;,
			       PFID(lu_object_fid(&amp;amp;obj-&amp;gt;co_lu)),
			       vio-&amp;gt;vui_layout_gen, gen);
			/* today successful restore is the only possible
			 * case */
			/* restore was done, clear restoring state */
			ll_file_clear_flag(ll_i2info(vvp_object_inode(obj)),
					   LLIF_FILE_RESTORING);
		}
		GOTO(out, 0);
	}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;tt&gt;ll_layout_restore()&lt;/tt&gt; sends the RESTORE RPC and &lt;tt&gt;ll_layout_refresh()&lt;/tt&gt; blocks waiting for the layout_lock to be released by the MDS.&lt;/p&gt;

&lt;p&gt;The comments in the code clearly suggest that the client has all the information it needs to detect that the RESTORE operation did not complete yet or that it failed (the generation number of the layout is not modified).&lt;/p&gt;

&lt;p&gt;IMO adding an &quot;&lt;tt&gt;else goto retry_restore;&lt;/tt&gt;&quot; to the last if block and a &quot;&lt;tt&gt;retry_restore&lt;/tt&gt;&quot; label above the call to &lt;tt&gt;ll_layout_restore()&lt;/tt&gt; should be enough to make the client resilient to MDS failovers without needing the MDS to store RESTORE requests in an llog.&lt;/p&gt;</comment>
                            <comment id="234195" author="nrutman" created="Tue, 2 Oct 2018 02:17:34 +0000"  >&lt;p&gt;My Gerrit/git skills are long gone, but I agree - something like this seems right. The only question is whether we should give up after, say, 10 tries, and then explicitly return ENODATA.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
diff --git a/lustre/llite/vvp_io.c b/lustre/llite/vvp_io.c
index 793ec00de1..39e3d0ac29 100644
--- a/lustre/llite/vvp_io.c
+++ b/lustre/llite/vvp_io.c
@@ -311,7 +311,7 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void vvp_io_fini(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct cl_io_slice *ios)
 	       vio-&amp;gt;vui_layout_gen, io-&amp;gt;ci_need_write_intent,
 	       io-&amp;gt;ci_restore_needed);

-	&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (io-&amp;gt;ci_restore_needed) {
+	&lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; (io-&amp;gt;ci_restore_needed) {
 		/* file was detected release, we need to restore it
 		 * before finishing the io
 		 */
@@ -329,15 +329,12 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void vvp_io_fini(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct cl_io_slice *ios)
 		 * of &lt;span class=&quot;code-keyword&quot;&gt;this&lt;/span&gt; function.
 		 */
 		&lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc) {
-			io-&amp;gt;ci_restore_needed = 1;
 			io-&amp;gt;ci_need_restart = 0;
 			io-&amp;gt;ci_verify_layout = 0;
 			io-&amp;gt;ci_result = rc;
 			GOTO(out, rc);
 		}

-		io-&amp;gt;ci_restore_needed = 0;
-
 		/* Even &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; ll_layout_restore() returns zero, it doesn&apos;t mean
 		 * that restore has been successful. Therefore it should verify
 		 * &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; there was layout change and restart I/O correspondingly.
@@ -351,11 +348,13 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; void vvp_io_fini(&lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct lu_env *env, &lt;span class=&quot;code-keyword&quot;&gt;const&lt;/span&gt; struct cl_io_slice *ios)
 			       vio-&amp;gt;vui_layout_gen, gen);
 			/* today successful restore is the only possible
 			 * &lt;span class=&quot;code-keyword&quot;&gt;case&lt;/span&gt; */
+			io-&amp;gt;ci_restore_needed = 0;
+
 			&lt;span class=&quot;code-comment&quot;&gt;/* restore was done, clear restoring state */&lt;/span&gt;
 			ll_file_clear_flag(ll_i2info(vvp_object_inode(obj)),
 					   LLIF_FILE_RESTORING);
+			GOTO(out, 0);
 		}
-		GOTO(out, 0);
 	}

 	/**
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="234197" author="bougetq" created="Tue, 2 Oct 2018 06:55:19 +0000"  >&lt;p&gt;&amp;gt; Only question is if we should give up after say 10 tries&lt;/p&gt;

&lt;p&gt;I would say it is up to the MDS to mark the file as &quot;lost&quot; and maybe the decision should come from the copytool.&lt;/p&gt;</comment>
                            <comment id="234246" author="nrutman" created="Tue, 2 Oct 2018 21:29:59 +0000"  >&lt;p&gt;One more point to consider:&lt;/p&gt;

&lt;p&gt;#6. Copytool connection. My assumption was that externalizing the coordinator implied that we were no longer going to use Lustre&apos;s copytool registration and request assigning functions, but @BenEvans points out that is not mandatory. His patch actually leaves the copytool registration functionality in place (for those who want to use it - not mandatory), and thereby allows the external coordinator to pass the copytool requests back through Lustre. This allows for customizing the coordinator (e.g. reprioritize requests), without affecting anything else. (He also is providing a switch to select from internal to external coordinator.)&lt;/p&gt;</comment>
                            <comment id="234288" author="jhammond" created="Wed, 3 Oct 2018 16:18:55 +0000"  >&lt;p&gt;Nathan,&lt;/p&gt;

&lt;p&gt;Re your one line change to &lt;tt&gt;vvp_io_fini()&lt;/tt&gt;, I think there still needs to be some added check in the loop to see if the file layout is not-released. That&apos;s an issue that we have now. If the layout lock is granted but the file is still released then the client proceeds and the IO returns &lt;tt&gt;-ENODATA&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Due to some questionable layering choices only the lov layer understands whether a layout is released or not. But only the llite layer can ask for it to be restored.&lt;/p&gt;</comment>
                            <comment id="234292" author="bevans" created="Wed, 3 Oct 2018 16:45:32 +0000"  >&lt;p&gt;I&apos;ve added the restore retry to my patch, also added a COORDINATOR variable to sanity-hsm, so we should be able to test external vs. internal coordinators with the flip of a switch.&#160; Setting mdt.*.hsm_control=external selects the external coordinator rather than the internal one.&lt;/p&gt;

&lt;p&gt;Archive, restore, release, etc. all work either through lfs commands or through the implicit-restore path.&lt;/p&gt;</comment>
                            <comment id="234295" author="nrutman" created="Wed, 3 Oct 2018 17:23:23 +0000"  >&lt;blockquote&gt;&lt;p&gt;...&#160;still needs to be some added check in the loop to see if the file layout is not-released. That&apos;s an issue that we have now. If the layout lock is granted but the file is still released then the client proceeds and the IO returns&#160;&lt;tt&gt;-ENODATA&lt;/tt&gt;.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;If the file is not-released but the MDS decides to release the layout lock, wouldn&apos;t ENODATA be the correct response? Say the archive file is &quot;lost&quot; - it remains released, and the MDS should probably return an error in ll_layout_restore. I think Quentin is right - let the MDS decide, and let the client return an error rather than retry. Checking for the &apos;released&apos; bit isn&apos;t necessarily the right thing for the client to do.&lt;/p&gt;</comment>
                            <comment id="241867" author="bevans" created="Wed, 13 Feb 2019 15:17:07 +0000"  >&lt;p&gt;I&apos;m actively working this, and will update my patch soon, so I&apos;m assigning it to myself.&lt;/p&gt;</comment>
                            <comment id="241870" author="simmonsja" created="Wed, 13 Feb 2019 15:35:36 +0000"  >&lt;p&gt;To let you know, I&apos;m going to push another &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9680&quot; title=&quot;Improve the user land to kernel space interface for lustre&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9680&quot;&gt;LU-9680&lt;/a&gt; update. I have been talking to Amir about its application for LNet UDSP as well as using this for lnet selftest, so I might move the netlink handling into liblnetconfig. I will rebase the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7659&quot; title=&quot;Replace KUC by more standard mechanisms&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7659&quot;&gt;LU-7659&lt;/a&gt; patch on top of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9680&quot; title=&quot;Improve the user land to kernel space interface for lustre&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9680&quot;&gt;LU-9680&lt;/a&gt; as well as push an early stats patch I developed which is not finished.&lt;/p&gt;</comment>
                            <comment id="255013" author="gerrit" created="Wed, 18 Sep 2019 21:27:04 +0000"  >&lt;p&gt;Ben Evans (bevans@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36235&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36235&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10968&quot; title=&quot;add coordinator bypass upcalls for HSM archive and remove&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10968&quot;&gt;LU-10968&lt;/a&gt; hsm: create external HSM queue interface&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: ee8c7d8d925a3b21d29a5170524781ae9375618c&lt;/p&gt;</comment>
                            <comment id="256671" author="gerrit" created="Fri, 18 Oct 2019 19:02:31 +0000"  >&lt;p&gt;Ben Evans (bevans@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/36492&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/36492&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10968&quot; title=&quot;add coordinator bypass upcalls for HSM archive and remove&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10968&quot;&gt;LU-10968&lt;/a&gt; hsm: encapsulate copyaction_private&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: dfe56daacdb3cbf85b45eb2eccf98038a665c63c&lt;/p&gt;</comment>
                            <comment id="263533" author="spitzcor" created="Wed, 19 Feb 2020 02:53:15 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=beevans&quot; class=&quot;user-hover&quot; rel=&quot;beevans&quot;&gt;beevans&lt;/a&gt;, this is assigned to your old persona.&lt;/p&gt;</comment>
                            <comment id="304028" author="jhammond" created="Wed, 9 Jun 2021 17:30:58 +0000"  >&lt;p&gt;Closing since this isn&apos;t being worked on.&lt;/p&gt;</comment>
                            <comment id="304066" author="spitzcor" created="Wed, 9 Jun 2021 23:36:04 +0000"  >&lt;p&gt;There are still two patches pending in Gerrit for this ticket.  It is probably best not to abandon them.  Granted it isn&apos;t a true dependency, but we&apos;ve all been waiting for the netlink changes to finish refreshing and landing the HSM/data movement patches that would be impacted.  It looks like &lt;a href=&quot;https://review.whamcloud.com/#/c/34230&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/#/c/34230&lt;/a&gt; finally has all the necessary +1s and it is even in master-next now, so perhaps we can reopen and resume this work soon.&lt;/p&gt;</comment>
                            <comment id="329875" author="simmonsja" created="Tue, 22 Mar 2022 15:25:33 +0000"  >&lt;p&gt;Both HPE and Microsoft are interested in this work.&lt;/p&gt;</comment>
                            <comment id="329888" author="beevans" created="Tue, 22 Mar 2022 17:23:41 +0000"  >&lt;p&gt;I think 99% of this can be skipped by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13384&quot; title=&quot;HSM copytool API for external coordinator&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13384&quot;&gt;LU-13384&lt;/a&gt; and using purely non-kernel calls.&#160; lfs hsm ... calls just all get routed to the external coordinator (of whatever form).&lt;/p&gt;

&lt;p&gt;The only issue is the calls that perform imperative restore on file access.&#160; I believe those can be easily added using a smaller chunk of the infrastructure in this PR.&lt;/p&gt;</comment>
                            <comment id="329890" author="simmonsja" created="Tue, 22 Mar 2022 17:42:53 +0000"  >&lt;p&gt;Why even bother with kernel space at all then? If you want a pure user land solution then look at &lt;a href=&quot;http://github.com/cea-hpc/coordinatool&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://github.com/cea-hpc/coordinatool&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This work is looking to improve what we are already using without creating a new interface.&lt;/p&gt;</comment>
                            <comment id="329906" author="beevans" created="Tue, 22 Mar 2022 19:49:07 +0000"  >&lt;p&gt;yep, given some improvements, coordinatool looks like a really good solution.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="28048">LU-6081</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="58485">LU-13384</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="37803">LU-8324</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="46759">LU-9680</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="34089">LU-7659</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzwj3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>