<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:17:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-15393] object allocation when OST is lost</title>
                <link>https://jira.whamcloud.com/browse/LU-15393</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Currently ltd_qos.lq_rw_sem is taken on the following LOD paths:&lt;/p&gt;

&lt;p&gt;lod_qos_statfs_update() write - does not protect anything; I hope it will be gone with &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-14277&quot; title=&quot;any create blocked due any OST fail&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-14277&quot;&gt;&lt;del&gt;LU-14277&lt;/del&gt;&lt;/a&gt;&lt;br/&gt;
lod_qos_calc_rr() write - refills the pool array if LQ_DIRTY was set, rare&lt;br/&gt;
lod_ost_alloc_rr() read - whole path for object reservation&lt;br/&gt;
lod_mdt_alloc_rr() read - the same&lt;br/&gt;
lod_ost_alloc_qos() write - whole path for OST weight calculation and object allocation&lt;br/&gt;
lod_mdt_alloc_qos() write - the same&lt;br/&gt;
lu_qos_add_tgt() write - adds a new target, marks LQ_DIRTY, rare&lt;br/&gt;
lu_qos_del_tgt() write - deletes a target, marks LQ_DIRTY, rare&lt;/p&gt;

&lt;p&gt;Call graph for these functions:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;lod_qos_prep_create() {
        lod_qos_statfs_update()
        rc = lod_ost_alloc_qos()
        if (rc == -EAGAIN)
                rc = lod_ost_alloc_rr() {
                        lod_qos_calc_rr()
                        lod_check_and_reserve_ost() {
                                lod_qos_declare_object_on()
                        }
                }
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lod_qos_declare_object_on() can block on object creation when an OST is lost, failing over, or similar. As a result, ltd_qos.lq_rw_sem is held
for read by lod_ost_alloc_rr() for the whole failover time. Other creation threads then get stuck in
lod_ost_alloc_qos() on down_write(). No matter how many OSTs Lustre could use, all creation threads hang in this case.&lt;/p&gt;

&lt;p&gt;I&apos;m suggesting a patch that unblocks lod_ost_alloc_qos() threads with EAGAIN, which sends them to lod_ost_alloc_rr(), where the semaphore is shared for read. Creation threads can then pick healthy OSTs and allocate objects.&lt;/p&gt;</description>
                <environment></environment>
        <key id="67722">LU-15393</key>
            <summary>object allocation when OST is lost</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="aboyko">Alexander Boyko</assignee>
                                    <reporter username="aboyko">Alexander Boyko</reporter>
                        <labels>
                            <label>patch</label>
                    </labels>
                <created>Wed, 22 Dec 2021 15:00:10 +0000</created>
                <updated>Sat, 20 May 2023 11:54:11 +0000</updated>
                            <resolved>Sat, 11 Jun 2022 15:34:56 +0000</resolved>
                                    <version>Lustre 2.15.0</version>
                                    <fixVersion>Lustre 2.16.0</fixVersion>
                    <fixVersion>Lustre 2.15.3</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="321364" author="gerrit" created="Wed, 22 Dec 2021 15:02:35 +0000"  >&lt;p&gt;&quot;Alexander Boyko &amp;lt;alexander.boyko@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/45921&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45921&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: use killable semaphore for creation path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: f787a6ba0432f91096d92a656e5265416df11692&lt;/p&gt;</comment>
                            <comment id="321365" author="bzzz" created="Wed, 22 Dec 2021 15:09:01 +0000"  >&lt;p&gt;Normally we prefer the RR policy; QoS is used when space usage is not well balanced among OSTs.&lt;/p&gt;</comment>
                            <comment id="321554" author="aboyko" created="Mon, 27 Dec 2021 06:52:56 +0000"  >&lt;p&gt;Well, QoS lost accuracy when Lustre started to support more than one MDT. MDTs can create objects in parallel and don&apos;t know about each other. In fact, every call of lod_qos_statfs_update() sets the LQ_DIRTY flag under any load on the OSTs (strict check):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;                avail = OST_TGT(lod,idx)-&amp;gt;ltd_statfs.os_bavail;
                if (lod_statfs_and_check(env, lod, idx,
                                         &amp;amp;OST_TGT(lod, idx)-&amp;gt;ltd_statfs, 0))
                        continue;
                if (OST_TGT(lod,idx)-&amp;gt;ltd_statfs.os_bavail != avail)
                        /* recalculate weights */
                        set_bit(LQ_DIRTY, &amp;amp;lod-&amp;gt;lod_qos.lq_flags);

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;With new OST speeds of over 60GB/s, one minute of writes equals 3.6TB of data. The default maxage is 5 seconds, which corresponds to 300GB. So QoS gives no benefit for balancing under load; it only slows object allocation and burns CPU. I see a benefit only with slow periodic IO.&lt;/p&gt;</comment>
                            <comment id="322551" author="bzzz" created="Thu, 13 Jan 2022 11:46:36 +0000"  >&lt;p&gt;so just disable QoS on a specific setup and that&apos;s it?&lt;br/&gt;
or tune the threshold ..&lt;/p&gt;</comment>
                            <comment id="324562" author="gerrit" created="Mon, 31 Jan 2022 14:11:26 +0000"  >&lt;p&gt;&quot;Alexander Boyko &amp;lt;alexander.boyko@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/46388&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46388&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: skip qos for qos_threshold_rr=100&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8a62f1e7aca81152248ca20caed3d893681dd3cf&lt;/p&gt;</comment>
                            <comment id="337401" author="gerrit" created="Sat, 11 Jun 2022 05:32:57 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/45921/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/45921/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: use killable semaphore for creation path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f46782b4c7dcaacd0046ebad3e3d84c2bb0367d4&lt;/p&gt;</comment>
                            <comment id="337412" author="gerrit" created="Sat, 11 Jun 2022 05:34:11 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/46388/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/46388/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: skip qos for qos_threshold_rr=100&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 2f23140d5c1396fd0b247bd7f9c249f6e24096b7&lt;/p&gt;</comment>
                            <comment id="337498" author="pjones" created="Sat, 11 Jun 2022 15:34:56 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="338504" author="gerrit" created="Thu, 23 Jun 2022 13:55:06 +0000"  >&lt;p&gt;&quot;Alexander Boyko &amp;lt;alexander.boyko@hpe.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/47715&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47715&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; tests: check QoS hang with OST failover&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7d87322d92352865cc86438cba517d98aad0c789&lt;/p&gt;</comment>
                            <comment id="345306" author="gerrit" created="Thu, 1 Sep 2022 05:54:09 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/47715/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/47715/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; tests: check QoS hang with OST failover&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 52057d85eaef8c7b5262f0718629fabff919ff1d&lt;/p&gt;</comment>
                            <comment id="346793" author="adilger" created="Thu, 15 Sep 2022 15:07:34 +0000"  >&lt;p&gt;The recovery-small test_152 failed once:&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/2ac04215-a77d-4436-8b38-65a379dd5855&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/2ac04215-a77d-4436-8b38-65a379dd5855&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not sure if this is a problem yet. &lt;/p&gt;</comment>
                            <comment id="352500" author="gerrit" created="Thu, 10 Nov 2022 08:43:57 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49095&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49095&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: use killable semaphore for creation path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: eaf1700f3d57ae88b48099611219ea6f3d2de75f&lt;/p&gt;</comment>
                            <comment id="352502" author="gerrit" created="Thu, 10 Nov 2022 08:47:23 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49096&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49096&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: skip qos for qos_threshold_rr=100&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 17b646aac70cf702d1358e65bf8ce22f16f41dfd&lt;/p&gt;</comment>
                            <comment id="352504" author="gerrit" created="Thu, 10 Nov 2022 08:51:18 +0000"  >&lt;p&gt;&quot;Jian Yu &amp;lt;yujian@whamcloud.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49097&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49097&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; tests: check QoS hang with OST failover&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 1a44918703ab0f75c3ee7ab45bf9d6db7c1a6674&lt;/p&gt;</comment>
                            <comment id="365213" author="gerrit" created="Wed, 8 Mar 2023 06:43:31 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49095/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49095/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: use killable semaphore for creation path&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 18c098261104fef9350e932d124d78296b0cc135&lt;/p&gt;</comment>
                            <comment id="365214" author="gerrit" created="Wed, 8 Mar 2023 06:43:38 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49096/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49096/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; lod: skip qos for qos_threshold_rr=100&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 0b1aa418ac26d879d4794db1aab360a2230c891d&lt;/p&gt;</comment>
                            <comment id="365215" author="gerrit" created="Wed, 8 Mar 2023 06:43:50 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/49097/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/49097/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-15393&quot; title=&quot;object allocation when OST is lost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-15393&quot;&gt;&lt;del&gt;LU-15393&lt;/del&gt;&lt;/a&gt; tests: check QoS hang with OST failover&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 3692450355585c1a3a8502ce0f96a36650941f96&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="57632">LU-13073</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="62153">LU-14277</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i02d5z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>