<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:30:51 UTC 2024
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9964] &gt; 1 group lock on same file (group lock lifecycle/cbpending problem)</title>
                <link>https://jira.whamcloud.com/browse/LU-9964</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Sometimes, when many threads write to one file under group locks, one of several assertions is hit.  Note that all of this assumes the lock requests are cooperating &amp;amp; using the same GID.&lt;/p&gt;

&lt;p&gt;From osc_cache_writeback_range:&lt;br/&gt;
LASSERT(hp == 0 &amp;amp;&amp;amp; discard == 0);&lt;br/&gt;
EASSERT(!ext-&amp;gt;oe_hp, ext);&lt;/p&gt;

&lt;p&gt;And from osc_extent_merge:&lt;br/&gt;
LASSERT(cur-&amp;gt;oe_dlmlock == victim-&amp;gt;oe_dlmlock);&lt;/p&gt;

&lt;p&gt;Investigation of dumps shows that in all of these cases, multiple group locks are granted on the same resource at the same time, and one of these locks has cbpending set.  This is broadly similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6368&quot; title=&quot;ASSERTION( cur-&amp;gt;oe_dlmlock == victim-&amp;gt;oe_dlmlock ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6368&quot;&gt;&lt;del&gt;LU-6368&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6679&quot; title=&quot;ASSERTION( !ext-&amp;gt;oe_hp ) failed with group lock&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6679&quot;&gt;&lt;del&gt;LU-6679&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I believe there are actually two problems here, one in the request phase and one in the destruction phase.&lt;/p&gt;

&lt;p&gt;It is possible for two threads (on the same client) to request a group lock from the server at the same time.  If this happens, both group locks will be granted because they are compatible with one another, which leaves two group locks granted on the same file at the same time.  When one of them is eventually released, this can cause the crashes noted above, because two locks cover the same dirty pages.&lt;/p&gt;
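
&lt;p&gt;A minimal sketch of this request-phase race, written as a single-threaded Python model rather than real client code. The Server and Client classes and the lookup, send and race functions are hypothetical illustrations, not osc or ldlm APIs. The two threads are interleaved by hand so that both look up the lock before either request reaches the server. Passing wait_for_pending=True models the first part of the proposed fix: a thread that finds a compatible request already in flight waits for it instead of sending its own.&lt;/p&gt;

```python
# Hypothetical single-threaded model of the LU-9964 request-phase race.
# Server, Client, lookup, send and race are illustrative names only.


class Server:
    """Grants every group lock request: same-GID group locks are compatible."""

    def __init__(self):
        self.granted = []  # (client name, gid) for every granted group lock

    def enqueue_group(self, client, gid):
        # No conflict check: the server cannot tell a duplicate request
        # from the same client apart from a cooperating one.
        self.granted.append((client, gid))
        return len(self.granted) - 1  # index used as a lock handle


class Client:
    def __init__(self, name, server, wait_for_pending):
        self.name = name
        self.server = server
        self.wait_for_pending = wait_for_pending  # models fix part 1
        self.cached = {}   # gid -> handle of a group lock already granted
        self.pending = {}  # gid -> True while a request is in flight

    def lookup(self, gid):
        """Local match step; returns "match", "wait" or "miss"."""
        if gid in self.cached:
            return "match"
        if self.wait_for_pending and gid in self.pending:
            return "wait"  # a compatible request is already in flight
        self.pending[gid] = True
        return "miss"

    def send(self, gid):
        """Enqueue step: the request actually reaches the server."""
        handle = self.server.enqueue_group(self.name, gid)
        self.cached[gid] = handle
        self.pending.pop(gid, None)
        return handle


def race(wait_for_pending):
    """Interleave two threads by hand; return how many locks were granted."""
    server = Server()
    client = Client("c1", server, wait_for_pending)
    gid = 42
    a = client.lookup(gid)  # thread A: no lock cached yet
    b = client.lookup(gid)  # thread B: looks up before A's request returns
    if a == "miss":
        client.send(gid)
    if b == "wait":
        b = client.lookup(gid)  # after waiting, B matches A's granted lock
    if b == "miss":
        client.send(gid)
    return len(server.granted)
```

&lt;p&gt;In this model race(False) leaves two group locks granted to the same client, while race(True) leaves one. Dirty pages, lock cancellation and real concurrency are deliberately left out.&lt;/p&gt;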

&lt;p&gt;Additionally, almost exactly the problem described in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6368&quot; title=&quot;ASSERTION( cur-&amp;gt;oe_dlmlock == victim-&amp;gt;oe_dlmlock ) failed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6368&quot;&gt;&lt;del&gt;LU-6368&lt;/del&gt;&lt;/a&gt; is still present.  When a group lock gets cbpending set, future group lock requests will fail to match it, which can result in the server granting a group lock which conflicts with an existing request.  Although cbpending is no longer set in order to destroy a group lock, it is still &lt;b&gt;eventually&lt;/b&gt; set while a group lock is being destroyed (ldlm_cli_cancel_local sets it).&lt;/p&gt;

&lt;p&gt;From this point on, new requests on the client no longer match this lock.  They can therefore be sent to the server as new group lock requests, again creating overlapping locks and causing the same crashes.&lt;/p&gt;
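
&lt;p&gt;The destruction-phase problem can be sketched the same way. In this hypothetical Python model (Lock, match_group and overlapping_after_cancel are illustrative names, not Lustre functions), the old matching logic skips a lock with cbpending set, so a new request is granted while the old lock still exists. With wait_on_cbpending=True the matcher reports the dying lock instead, and the caller waits for it to be destroyed before requesting a new one. This corresponds to the second part of the proposed fix.&lt;/p&gt;

```python
# Hypothetical model of group lock matching while a lock is being destroyed.
# Lock, match_group and overlapping_after_cancel are illustrative names only.


class Lock:
    def __init__(self, gid):
        self.gid = gid
        self.cbpending = False  # eventually set while the lock is cancelled
        self.destroyed = False


def match_group(locks, gid, wait_on_cbpending):
    """Return ("match", lock), ("wait", lock) or ("miss", None)."""
    for lock in locks:
        if lock.destroyed or lock.gid != gid:
            continue
        if not lock.cbpending:
            return ("match", lock)
        if wait_on_cbpending:
            # Fix part 2: a dying group lock may still cover dirty pages,
            # so report it and let the caller wait instead of skipping it.
            return ("wait", lock)
        # Old behavior: a cbpending lock silently fails to match.
    return ("miss", None)


def overlapping_after_cancel(wait_on_cbpending):
    """Count live same-GID group locks after a new request is handled
    while an older group lock is being cancelled."""
    old = Lock(gid=7)
    locks = [old]
    old.cbpending = True  # cancellation under way (cf. ldlm_cli_cancel_local)
    status, lock = match_group(locks, 7, wait_on_cbpending)
    if status == "wait":
        lock.destroyed = True  # models "wait until fully destroyed"
        status, lock = match_group(locks, 7, wait_on_cbpending)
    if status == "miss":
        locks.append(Lock(gid=7))  # server grants a brand new group lock
    return len([lk for lk in locks if not lk.destroyed])
```

&lt;p&gt;overlapping_after_cancel(False) returns 2 because the old and new locks coexist, and overlapping_after_cancel(True) returns 1. The model represents waiting by marking the lock destroyed directly. A real implementation would block until cancellation completes.&lt;/p&gt;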

&lt;p&gt;The solution comes in two parts:&lt;br/&gt;
1. Wait (in osc_lock_enqueue_wait) for compatible group lock requests to be granted before attempting the ldlm phase of the lock request&lt;br/&gt;
2. Change the matching logic in ldlm_lock_match and lock_matches so that if we find a group lock being destroyed, we wait until it is fully destroyed before making a new lock request.&lt;/p&gt;</description>
                <environment></environment>
        <key id="48242">LU-9964</key>
            <summary>&gt; 1 group lock on same file (group lock lifecycle/cbpending problem)</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="paf">Patrick Farrell</assignee>
                                    <reporter username="paf">Patrick Farrell</reporter>
                        <labels>
                    </labels>
                <created>Fri, 8 Sep 2017 20:37:04 +0000</created>
                <updated>Fri, 3 Mar 2023 17:14:51 +0000</updated>
                            <resolved>Mon, 25 Nov 2019 18:09:13 +0000</resolved>
                                                    <fixVersion>Lustre 2.13.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>3</watches>
                    <comments>
                            <comment id="207946" author="paf" created="Fri, 8 Sep 2017 20:41:53 +0000"  >&lt;p&gt;The attached files together form a test for the &quot;two group locks granted on same resource&quot; case.  They will NOT crash (because they do not write to the file); they simply exit and dump debug logs when the case is detected.&lt;/p&gt;

&lt;p&gt;Compile the .c file (in a directory by itself) to a binary named a.out&lt;br/&gt;
Run test-9964.sh&lt;/p&gt;

&lt;p&gt;Without my patch, I hit the problem in &amp;lt; 10 minutes on a 4-CPU VM, and in &amp;lt; 1 minute on a real system with 32 CPUs.&lt;/p&gt;</comment>
                            <comment id="207947" author="paf" created="Fri, 8 Sep 2017 20:57:31 +0000"  >&lt;p&gt;Note that these problems exist for PW locks as well, but they are solved on the server, because the server will not grant a new PW lock until all conflicting locks have been cancelled.  The problem for group locks is that they are compatible with one another, so the server does not serialize them.  We simply must not grant two group locks on the same resource to the same client.&lt;/p&gt;
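
&lt;p&gt;A heavily simplified Python model of this server-side difference. It assumes only the PW and GROUP modes exist and ignores extents, blocking callbacks and every other lock mode. LockReq, compatible and grantable_now are hypothetical names, not the ldlm implementation. The model shows that the compatibility check never looks at which client holds a lock. A second PW request therefore has to wait, while a second same-GID group request from the same client is granted at once.&lt;/p&gt;

```python
# Hypothetical, heavily simplified compatibility check: only PW and GROUP
# modes are modeled; extents, callbacks and other modes are ignored.
from typing import NamedTuple


class LockReq(NamedTuple):
    client: str  # owning client (export); never consulted below
    mode: str    # "PW" or "GROUP"
    gid: int     # group ID; only meaningful for GROUP locks


def compatible(req: LockReq, held: LockReq) -> bool:
    # Of the two modeled modes, only GROUP vs GROUP with the same GID is
    # compatible; PW conflicts with PW and with GROUP.
    return req.mode == "GROUP" and held.mode == "GROUP" and req.gid == held.gid


def grantable_now(req: LockReq, granted: list) -> bool:
    # Grant at once only if compatible with every granted lock; otherwise the
    # server waits for the conflicting locks to be cancelled first.
    return all(compatible(req, held) for held in granted)
```

&lt;p&gt;For example, grantable_now(LockReq(&quot;c1&quot;, &quot;GROUP&quot;, 5), [LockReq(&quot;c1&quot;, &quot;GROUP&quot;, 5)]) returns True even though both locks belong to client c1.&lt;/p&gt;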

&lt;p&gt;We could achieve this by checking the exports before granting new group locks, but since locks are not sorted by export, this would require walking the lists of granted and waiting locks on the server.  If many clients request group locks, that cost would be unacceptable.&lt;/p&gt;</comment>
                            <comment id="207948" author="gerrit" created="Fri, 8 Sep 2017 21:02:02 +0000"  >&lt;p&gt;&lt;del&gt;Patrick Farrell (paf@cray.com) uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/28916&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/28916&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9964&quot; title=&quot;&amp;gt; 1 group lock on same file (group lock lifecycle/cbpending problem)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9964&quot;&gt;&lt;del&gt;LU-9964&lt;/del&gt;&lt;/a&gt; ldlm: Prevent multiple group locks&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: master&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: 20bdf0a8071f2b4ed038bae76ac89797bb78c137&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="253014" author="gerrit" created="Wed, 14 Aug 2019 09:24:27 +0000"  >&lt;p&gt;Alexandr Boyko (c17825@cray.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/35791&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35791&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9964&quot; title=&quot;&amp;gt; 1 group lock on same file (group lock lifecycle/cbpending problem)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9964&quot;&gt;&lt;del&gt;LU-9964&lt;/del&gt;&lt;/a&gt; llite: prevent mulitple group locks&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 44bedeca8dce4db31b3f84480936e3b5a2a4ecc4&lt;/p&gt;</comment>
                            <comment id="254325" author="gerrit" created="Sat, 7 Sep 2019 02:07:47 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/35791/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/35791/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9964&quot; title=&quot;&amp;gt; 1 group lock on same file (group lock lifecycle/cbpending problem)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9964&quot;&gt;&lt;del&gt;LU-9964&lt;/del&gt;&lt;/a&gt; llite: prevent mulitple group locks&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: aba68250a67a10104c534bd726f67b31a7f35692&lt;/p&gt;</comment>
                            <comment id="258786" author="jgmitter" created="Mon, 25 Nov 2019 18:09:13 +0000"  >&lt;p&gt;Patch landed to 2.13.0&lt;/p&gt;</comment>
                            <comment id="290465" author="gerrit" created="Wed, 27 Jan 2021 18:32:57 +0000"  >&lt;p&gt;&lt;del&gt;Olaf Faaland-LLNL (faaland1@llnl.gov) uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/41332&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/41332&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9964&quot; title=&quot;&amp;gt; 1 group lock on same file (group lock lifecycle/cbpending problem)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9964&quot;&gt;&lt;del&gt;LU-9964&lt;/del&gt;&lt;/a&gt; llite: prevent mulitple group locks&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: b2_12&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: 5c84a2b80dbf7b993dbc33e32c539623c926e100&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="364849" author="gerrit" created="Fri, 3 Mar 2023 17:14:51 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50198&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50198&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9964&quot; title=&quot;&amp;gt; 1 group lock on same file (group lock lifecycle/cbpending problem)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9964&quot;&gt;&lt;del&gt;LU-9964&lt;/del&gt;&lt;/a&gt; llite: prevent mulitple group locks&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 97945b29aabc8104f945e7769420789c2d40a70f&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="71489">LU-16046</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28253" name="LU-9964.c" size="1339" author="paf" created="Fri, 8 Sep 2017 20:39:56 +0000"/>
                            <attachment id="28252" name="test-9964.sh" size="736" author="paf" created="Fri, 8 Sep 2017 20:39:56 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzjvr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>