<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:40:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4215] Some expected improvements for OUT</title>
                <link>https://jira.whamcloud.com/browse/LU-4215</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;1. The OUT RPC service threads on the MDT and OST use different reply portals, which confuses the OUT RPC user.&lt;/p&gt;

&lt;p&gt;On MDT-side, it is:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                .psc_buf                = {
                        .bc_nbufs               = MDS_NBUFS,
                        .bc_buf_size            = OUT_BUFSIZE,
                        .bc_req_max_size        = OUT_MAXREQSIZE,
                        .bc_rep_max_size        = OUT_MAXREPSIZE,
                        .bc_req_portal          = OUT_PORTAL,
                        .bc_rep_portal          = MDC_REPLY_PORTAL,
                },
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;On OST-side, it is:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;                .psc_buf                = {
                        .bc_nbufs               = OST_NBUFS,
                        .bc_buf_size            = OUT_BUFSIZE,
                        .bc_req_max_size        = OUT_MAXREQSIZE,
                        .bc_rep_max_size        = OUT_MAXREPSIZE,
                        .bc_req_portal          = OUT_PORTAL,
                        .bc_rep_portal          = OSC_REPLY_PORTAL,
                },
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;When both the MDT and OST run on the same physical server node (especially in VM test environments), an OUT RPC that the OSP sends to the OST via OUT_PORTAL may unexpectedly be handled by an MDT-side OUT RPC service thread and replied to via MDC_REPLY_PORTAL instead of OSC_REPLY_PORTAL, on which the OSP is waiting for the reply. The OSP-side OUT RPC then times out and is resent again and again.&lt;/p&gt;
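As a hedged sketch, the mismatch can be modeled with toy portal numbers (all constants and helpers below are illustrative stand-ins, not the real Lustre definitions):

```c
/* Toy model of the portal mismatch: both OUT services listen on the same
 * request portal, so either one may dequeue the RPC, but each replies on
 * its own reply portal. Values are illustrative, not Lustre's. */
enum out_service { OUT_ON_MDT, OUT_ON_OST };
enum { MDC_REPLY_PORTAL_TOY = 10, OSC_REPLY_PORTAL_TOY = 11 };

/* Reply portal actually used by whichever service handled the request. */
static int reply_portal(enum out_service handler)
{
        return handler == OUT_ON_MDT ? MDC_REPLY_PORTAL_TOY
                                     : OSC_REPLY_PORTAL_TOY;
}

/* The OSP sees the reply only when the service that handled the request
 * is the one it targeted; otherwise the reply lands on a portal nobody
 * is watching and the sender times out and resends. */
static int osp_gets_reply(enum out_service target, enum out_service handler)
{
        return reply_portal(handler) == reply_portal(target);
}
```

When both targets run on one node, the handler may differ from the target, osp_gets_reply() returns 0, and the timeout-and-resend loop described above follows.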

&lt;p&gt;The same problem can also occur when the OSP wants to talk with the MDT via OUT_PORTAL.&lt;/p&gt;

&lt;p&gt;Because DNE I already uses the OUT RPC for talking among MDTs, we cannot change the MDT-side OUT RPC reply portal without breaking compatibility with older versions. So we would have to change the OST-side OUT RPC reply portal to &quot;MDC_REPLY_PORTAL&quot;, but it is strange for the OST side to use an MDT-side reply portal.&lt;/p&gt;

&lt;p&gt;2. The OUT RPC version is fixed at &quot;LUSTRE_MDS_VERSION&quot; regardless of whether the RPC goes to an MDT or an OST, which also confuses others. We could re-define &quot;tgt_out_handlers&quot;, but that may break the policy of the Unified Target.&lt;/p&gt;
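A hedged sketch of one way to address this, with placeholder version values (the TOY_* constants and the selection helper are our invention, not Lustre code):

```c
/* Illustrative version constants; the real ones live in Lustre headers. */
enum { TOY_LUSTRE_MDS_VERSION = 2, TOY_LUSTRE_OST_VERSION = 3 };

enum out_target { OUT_TARGET_MDT, OUT_TARGET_OST };

/* Pick the RPC version from the target type instead of hard-coding the
 * MDS version the way out_prep_update_req() does today. */
static unsigned out_rpc_version(enum out_target t)
{
        return t == OUT_TARGET_OST ? TOY_LUSTRE_OST_VERSION
                                   : TOY_LUSTRE_MDS_VERSION;
}
```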


&lt;p&gt;3. Packing multiple idempotent sub-requests into a single OUT RPC. In general, the OUT RPC should not assume that the sub-requests are related to each other, so even if one sub-request fails to execute, the others should not be ignored. The current implementation does not behave this way: handling stops at the first failure. If the remaining sub-requests are unrelated to the failed one, that behavior is unexpected. Unfortunately, it is not easy to judge whether one sub-request is related to the others within the current OUT request format, especially while remaining compatible with DNE I.&lt;/p&gt;
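A minimal sketch of the continue-past-failure semantics this item asks for, with sub-request outcomes simulated as integers (the helper below is our illustration, not the Lustre implementation):

```c
/* Run all n idempotent sub-requests: record a per-sub-request result in
 * rc[] and keep going past failures instead of aborting the whole batch.
 * Outcomes are simulated: 0 on success, negative errno on failure. */
static unsigned run_batch(const int *outcome, int *rc, unsigned n)
{
        unsigned failed = 0;

        for (unsigned i = 0; i != n; i++) {
                rc[i] = outcome[i];
                if (rc[i] != 0)
                        failed++;       /* note the failure, keep going */
        }
        return failed;
}
```

The caller can then inspect rc[] to see exactly which sub-requests failed, rather than losing the results of every sub-request after the first error.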

&lt;p&gt;4. Iteration via OUT. I found a client-side iteration framework in osp_md_object.c, but there seems to be no server-side handler. Is there any plan to support that?&lt;/p&gt;</description>
                <environment></environment>
        <key id="21877">LU-4215</key>
            <summary>Some expected improvements for OUT</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="bzzz">Alex Zhuravlev</assignee>
                                    <reporter username="yong.fan">nasf</reporter>
                        <labels>
                            <label>LMR</label>
                    </labels>
                <created>Wed, 6 Nov 2013 02:59:55 +0000</created>
                <updated>Mon, 24 Jan 2022 20:44:57 +0000</updated>
                                            <version>Lustre 2.6.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="70823" author="adilger" created="Wed, 6 Nov 2013 08:00:23 +0000"  >&lt;p&gt;For #3 there is the idea of &quot;batchid&quot; in the OUT request structure:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;struct update {
        __u32           u_type;
        __u32           u_batchid;
        struct lu_fid   u_fid;
        __u32           u_lens[UPDATE_BUF_COUNT];
        __u32           u_bufs[0];
};
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This allows the batched request to combine multiple updates into a single transaction by using the same &quot;u_batchid&quot;, while updates with different &quot;u_batchid&quot; values may be put into separate transactions.&lt;/p&gt;
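The grouping rule can be sketched as follows (the field name follows the struct quoted above; the counting helper is a hypothetical illustration, not Lustre code):

```c
/* Minimal stand-in for the quoted update structure. */
struct toy_update { unsigned u_type; unsigned u_batchid; };

/* Updates that share a u_batchid with their predecessor join its
 * transaction; each new u_batchid opens a new transaction. Returns how
 * many transactions the batched request needs. */
static unsigned count_transactions(const struct toy_update *u, unsigned n)
{
        unsigned txns = 0;

        for (unsigned i = 0; i != n; i++)
                if (i == 0 || u[i].u_batchid != u[i - 1].u_batchid)
                        txns++;
        return txns;
}
```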

&lt;p&gt;I think that Di has some patch to change the OUT protocol a bit, though it doesn&apos;t really change the above semantic of using u_batchid to decide which updates belong in the same transaction.  I&apos;m not sure which patch of his this is, but it is intended to allow passing the master transno as the u_batchid.  This would make the OUT protocol incompatible with older servers, but since it is currently only used between the MDTs this shouldn&apos;t be a big problem (they would need to be updated at the same time anyway).&lt;/p&gt;</comment>
                            <comment id="70824" author="bzzz" created="Wed, 6 Nov 2013 08:06:03 +0000"  >&lt;p&gt;I also would like to make minor changes to the protocol:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;a flag saying whether to interrupt handling upon an error or to continue (useful for batched destroys)&lt;/li&gt;
	&lt;li&gt;a more compact attr_set format to allow many uid/gid changes in a single RPC (&amp;gt;bytes per object at the moment with struct obdo)&lt;/li&gt;
&lt;/ul&gt;
</comment>
                            <comment id="78618" author="adilger" created="Thu, 6 Mar 2014 18:49:42 +0000"  >&lt;p&gt;Di, Alex, can you please comment whether this bug can be closed?  I think many of the improvements discussed here for the update RPC format were landed to master via &lt;a href=&quot;http://review.whamcloud.com/7128&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7128&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3539&quot; title=&quot;Change update RPC format&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3539&quot;&gt;&lt;del&gt;LU-3539&lt;/del&gt;&lt;/a&gt; protocol: Change UPDATE_OBJ RPC format&quot;.  The OUT portal was changed via &lt;a href=&quot;http://review.whamcloud.com/8390&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/8390&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3467&quot; title=&quot;Unified request handler on OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3467&quot;&gt;&lt;del&gt;LU-3467&lt;/del&gt;&lt;/a&gt; target: use osc_reply_portal for OUT services&quot;.&lt;/p&gt;

&lt;p&gt;Are there more changes that are still needed (which would be best to do in 2.6 while the protocol can be easily changed) or can it be closed?&lt;/p&gt;</comment>
                            <comment id="78630" author="di.wang" created="Thu, 6 Mar 2014 20:11:51 +0000"  >&lt;p&gt;Andreas, nasf, only #1 and #4 are resolved right now: #1 has landed to master, and #4 will be resolved in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4690&quot; title=&quot;sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4690&quot;&gt;&lt;del&gt;LU-4690&lt;/del&gt;&lt;/a&gt; (&lt;a href=&quot;http://review.whamcloud.com/#/c/9511/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/9511/&lt;/a&gt;).&lt;br/&gt;
I am not so sure about #3. For DNE there is no strong requirement to tell which updates failed on the client side, because right now the RPC is supposed to be executed in a single transaction: if one update fails, the other updates inside the RPC are supposed to be undone. And for read-only updates there is only one update per RPC. Probably LFSCK has more requirements here?&lt;/p&gt;

&lt;p&gt;#2 can definitely be fixed in 2.6, IMHO.&lt;/p&gt;</comment>
                            <comment id="78660" author="yong.fan" created="Fri, 7 Mar 2014 02:37:40 +0000"  >&lt;p&gt;For requirement 3, if we want to support packing attr_get/xattr_get for multiple OST-objects into a single OUT RPC, then we need to continue the OUT RPC handling even if some sub-requests fail. For example, the MDT (via OSP) wants to attr_get/xattr_get on both OST-object1 and OST-object2 via a single OUT RPC, and it does not know whether the two targets exist or not; so on the OST side, it should not skip OST-object2 even if OST-object1 does not exist.&lt;/p&gt;

&lt;p&gt;Currently, because we do not support that well yet, the MDT (via OSP) only packs the sub-requests that belong to the same target object into one OUT RPC. Once we improve it, we can consider making the OUT RPC more efficient.&lt;/p&gt;</comment>
                            <comment id="78690" author="bzzz" created="Fri, 7 Mar 2014 11:19:05 +0000"  >&lt;p&gt;Yes, the ability to continue processing in case of error is important for batched DESTROYs, for example. As for efficiency, I think we should not use obdo - it&apos;s huge; instead we should probably be able to get/set just a subset of attributes, like dt_attr_set() allows.&lt;/p&gt;</comment>
                            <comment id="82515" author="adilger" created="Fri, 25 Apr 2014 17:54:05 +0000"  >&lt;p&gt;Alex, is there a chance for you to work on patches for 2.6 for the #2 and #3 items?  Di already has far too many 2.6 blocker bugs to work on this, so if we want these changes then you are the best candidate to do the work.&lt;/p&gt;</comment>
                            <comment id="82582" author="bzzz" created="Mon, 28 Apr 2014 05:46:55 +0000"  >&lt;p&gt;Andreas, yes. &lt;/p&gt;</comment>
                            <comment id="82727" author="bzzz" created="Tue, 29 Apr 2014 11:23:31 +0000"  >&lt;p&gt;Di, could you clarify on #2 a bit please?&lt;/p&gt;</comment>
                            <comment id="83321" author="di.wang" created="Tue, 6 May 2014 16:49:00 +0000"  >&lt;p&gt;Hmm, I think #2 means we always pack the OUT RPC with LUSTRE_MDS_VERSION (see out_prep_update_req), no matter whether the OUT RPC will be sent to an MDS or an OST. Right now DNE only sends OUT RPCs to another MDS, but for LFSCK I assume some OUT RPCs need to be sent to an OST, so I think this is the one that needs to be fixed. I guess the request is from the LFSCK project; Fan Yong can probably confirm.&lt;/p&gt;</comment>
                            <comment id="83374" author="yong.fan" created="Wed, 7 May 2014 00:25:42 +0000"  >&lt;p&gt;Currently, LFSCK uses the OUT RPC to talk with the OST via the OSP, and it shares the interface out_prep_update_req() with the RPCs to/from the MDT. That function always uses LUSTRE_MDS_VERSION regardless of whether the RPC is for an OST or an MDT, which is confusing.&lt;/p&gt;</comment>
                            <comment id="85280" author="adilger" created="Fri, 30 May 2014 17:46:16 +0000"  >&lt;p&gt;It seems #3 is the only item still outstanding. Is the code to handle batched requests working?&lt;/p&gt;</comment>
                            <comment id="85336" author="yong.fan" created="Fri, 30 May 2014 23:33:30 +0000"  >&lt;p&gt;The code for batched requests has worked since DNE 1. The trouble is that the handling of the batched requests within a single OUT RPC stops when it hits a failure in one of the sub-requests, and the remaining sub-requests are ignored even though they are not related to the failed one (that is #3).&lt;/p&gt;</comment>
                            <comment id="95831" author="adilger" created="Tue, 7 Oct 2014 17:29:41 +0000"  >&lt;p&gt;Di, Nasf, what is the status on fixing this last issue?  What is the proposed solution?  Should the server mark all later batchids as failed, or should it try to execute them? What if they are dependent on each other?  Is there a flag that could be set on the batch that indicates if it should be executed even if the previous batch failed?&lt;/p&gt;</comment>
                            <comment id="95847" author="di.wang" created="Tue, 7 Oct 2014 18:03:50 +0000"  >&lt;p&gt;I just checked the current master code, where this seems not resolved yet; I am not sure about Nasf&apos;s patches. For DNE it always fails immediately, which is good enough even for DNE2. For LFSCK, is this only for read-only updates like getattr? Hmm, there is padding in the OSP update request:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;/* Hold object_updates sending to the remote OUT in single RPC */
struct object_update_request {
        __u32                   ourq_magic;
        __u16                   ourq_count;     /* number of ourq_updates[] */
        __u16                   ourq_padding;
        struct object_update    ourq_updates[0];
};
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
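A hedged sketch of how that padding could carry such a flag (the field names echo the struct above; the flag value and helper are our own illustration, not Lustre source):

```c
/* Repurpose ourq_padding as a flags word so the client can ask the
 * server to continue past a failed sub-request. Illustrative only. */
#define OURQ_FLAG_CONTINUE_ON_ERROR 0x1

struct toy_update_request {
        unsigned short ourq_count;      /* number of updates */
        unsigned short ourq_flags;      /* was ourq_padding */
};

/* Server-side decision after one sub-request returns rc: stop on error
 * unless the continue flag is set (useful for batched destroys). With a
 * single defined flag, an equality test stands in for a bitmask check. */
static int should_continue(const struct toy_update_request *req, int rc)
{
        return rc == 0 || req->ourq_flags == OURQ_FLAG_CONTINUE_ON_ERROR;
}
```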

&lt;p&gt;We can add the flag there.&lt;/p&gt;</comment>
                            <comment id="95854" author="bzzz" created="Tue, 7 Oct 2014 18:22:51 +0000"  >&lt;p&gt;the ability to proceed is important for batched destroys.&lt;/p&gt;</comment>
                            <comment id="95914" author="yong.fan" created="Wed, 8 Oct 2014 09:26:05 +0000"  >&lt;p&gt;Because the original master did not support executing other batchids after a former one failed, the OSP (for LFSCK) only aggregates the sub-requests that operate on the same object into the same OUT RPC. So even without resolving the batchid issue, LFSCK still works, although it may be inefficient.&lt;/p&gt;</comment>
                            <comment id="103218" author="adilger" created="Mon, 12 Jan 2015 18:45:23 +0000"  >&lt;p&gt;This bug has been dropped from 2.7.0 because there hasn&apos;t been any progress on it in several months.  Is this going to cause major protocol incompatibility if this is fixed in 2.8.0?  If yes, is anyone able to fix the problems in the current code in the next week or so?&lt;/p&gt;</comment>
                            <comment id="103326" author="yong.fan" created="Tue, 13 Jan 2015 14:36:00 +0000"  >&lt;p&gt;The remaining issue is #3, which is a performance improvement; it is essential for neither LFSCK nor DNE. I am not sure whether Alex or Di has made any patches for it (I have NOT yet, because of other LFSCK tickets). From the LFSCK point of view, it changes nothing about the OUT protocol. Even if someone changes the OUT protocol for #3 in the future, there will be no LFSCK-specific trouble.&lt;/p&gt;</comment>
                            <comment id="128891" author="bzzz" created="Wed, 30 Sep 2015 14:10:58 +0000"  >&lt;p&gt;This improvement is needed to shrink the records going to the ZIL. The patch mentioned in the bug shrinks the average record on the MDT from 1541 to 407 bytes.&lt;/p&gt;</comment>
                            <comment id="128892" author="bzzz" created="Wed, 30 Sep 2015 14:12:02 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/15336/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/15336/&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10120">
                    <name>Blocker</name>
                                            <outwardlinks description="is blocking">
                                        <issuelink>
            <issuekey id="21123">LU-4009</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is blocked by">
                                        <issuelink>
            <issuekey id="32726">LU-7318</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32729">LU-7319</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="19409">LU-3467</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="19635">LU-3539</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="23395">LU-4690</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="55661">LU-12310</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="33135">LU-7427</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzw84f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11467</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>