<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:46:28 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-4857] (qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22</title>
                <link>https://jira.whamcloud.com/browse/LU-4857</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I&apos;m seeing the following messages from a 2.4.2 OSS every few minutes:&lt;/p&gt;

&lt;p&gt;2014-04-02T16:32:48+11:00 lemming17 kernel: LustreError: 17701:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST011f qtype:grp id:6644 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0&lt;br/&gt;
2014-04-02T16:32:48+11:00 lemming17 kernel: LustreError: 17701:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST011f qtype:grp id:6644 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0&lt;br/&gt;
2014-04-02T16:31:50+11:00 lemming27 kernel: LustreError: 21284:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST0115 qtype:grp id:6644 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0&lt;br/&gt;
2014-04-02T16:31:50+11:00 lemming27 kernel: LustreError: 21284:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:short-OST0115 qtype:grp id:6644 enforced:1 granted:1048576 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0&lt;/p&gt;

&lt;p&gt;Surprisingly, the errors are spewed from only two OSSes out of the lot, and only for specific OSTs.&lt;/p&gt;

&lt;p&gt;At the same time, the MDS is throwing the following:&lt;/p&gt;

&lt;p&gt;2014-04-02T16:32:36+11:00 gerbil5 kernel: LustreError: 17470:0:(qmt_handler.c:431:qmt_dqacq0()) $$$ Release too much! uuid:short-MDT0000-lwp-OST011f_UUID release:1048576 granted:0, total:354880716 qmt:short-QMT0000 pool:0-dt id:6644 enforced:1 hard:2516582400 soft:1258291200 granted:354880716 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0&lt;br/&gt;
2014-04-02T16:32:36+11:00 gerbil5 kernel: LustreError: 17470:0:(qmt_handler.c:431:qmt_dqacq0()) $$$ Release too much! uuid:short-MDT0000-lwp-OST011f_UUID release:1048576 granted:0, total:354880716 qmt:short-QMT0000 pool:0-dt id:6644 enforced:1 hard:2516582400 soft:1258291200 granted:354880716 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0&lt;br/&gt;
2014-04-02T16:32:39+11:00 gerbil5 kernel: LustreError: 4733:0:(qmt_handler.c:431:qmt_dqacq0()) $$$ Release too much! uuid:short-MDT0000-lwp-OST0115_UUID release:1048576 granted:0, total:354880716 qmt:short-QMT0000 pool:0-dt id:6644 enforced:1 hard:2516582400 soft:1258291200 granted:354880716 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0&lt;br/&gt;
2014-04-02T16:32:39+11:00 gerbil5 kernel: LustreError: 4733:0:(qmt_handler.c:431:qmt_dqacq0()) $$$ Release too much! uuid:short-MDT0000-lwp-OST0115_UUID release:1048576 granted:0, total:354880716 qmt:short-QMT0000 pool:0-dt id:6644 enforced:1 hard:2516582400 soft:1258291200 granted:354880716 time:0 qunit:1048576 edquot:0 may_rel:0 revoke:0&lt;/p&gt;

&lt;p&gt;These errors appeared after all the servers were freshly rebooted as part of a maintenance cycle (some other LBUGs were fixed).&lt;/p&gt;

&lt;p&gt;Any pointers on what could be causing this?&lt;/p&gt;</description>
                <environment>CentOS 6.4 / Kernel 2.6.32-358.18.1.el6_lustre.x86_64</environment>
        <key id="24045">LU-4857</key>
            <summary>(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="javed">javed shaikh</reporter>
                        <labels>
                    </labels>
                <created>Thu, 3 Apr 2014 06:51:14 +0000</created>
                <updated>Thu, 23 Jun 2016 08:16:02 +0000</updated>
                            <resolved>Thu, 23 Jun 2016 08:16:02 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="80922" author="niu" created="Thu, 3 Apr 2014 07:37:56 +0000"  >&lt;p&gt;This looks like the quota slave index files on master and slave are not synchronized somehow.&lt;/p&gt;

&lt;p&gt;1. Did the error messages show up non-stop, or did they disappear after a while?&lt;br/&gt;
2. Check the quota_slave/info proc file for the OSTs with errors. (cat /proc/fs/lustre/osd-ldiskfs/$OSTNAME/quota_slave/info)&lt;br/&gt;
3. Try to trigger reintegration to see if it solves the problem. (echo 1 &amp;gt; /proc/fs/lustre/osd-ldiskfs/$OSTNAME/quota_slave/force_reint)&lt;/p&gt;

&lt;p&gt;If forced reintegration can&apos;t resolve the problem, could you collect a Lustre log with D_QUOTA and D_TRACE enabled? (Execute the following on the OSS to collect the log.)&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;lctl clear&lt;/li&gt;
	&lt;li&gt;lctl set_param debug=+trace; lctl set_param debug=+quota&lt;/li&gt;
	&lt;li&gt;lctl debug_daemon start $TMPFILE 500&lt;/li&gt;
	&lt;li&gt;lctl mark &quot;============= start reint =================&quot;&lt;/li&gt;
	&lt;li&gt;echo 1 &amp;gt; /proc/fs/lustre/osd-ldiskfs/$OSTNAME/quota_slave/force_reint&lt;/li&gt;
	&lt;li&gt;wait for reintegration to complete (when the &quot;user uptodate&quot; and &quot;group uptodate&quot; lines in /proc/fs/lustre/osd-ldiskfs/$OSTNAME/quota_slave/info change to &quot;glb&amp;#91;1&amp;#93;,slv&amp;#91;1&amp;#93;,reint&amp;#91;0&amp;#93;&quot;)&lt;/li&gt;
	&lt;li&gt;lctl debug_daemon stop&lt;/li&gt;
	&lt;li&gt;lctl debug_file $TMPFILE $LOGFILE&lt;/li&gt;
	&lt;li&gt;attach the log file to the ticket.&lt;/li&gt;
&lt;/ul&gt;
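The collection steps above can be sketched as a small shell script. This is a hedged sketch, not part of the original comment: OSTNAME, TMPFILE and LOGFILE are placeholders you must set for your system, the /proc paths follow the osd-ldiskfs layout quoted in this ticket, and by default it only prints the command sequence (set RUN=1 to execute it on a live OSS):

```shell
# Sketch of the debug-collection steps from the list above.
# OSTNAME, TMPFILE and LOGFILE are placeholders; adjust for your system.
OSTNAME=${OSTNAME:-short-OST0115}
TMPFILE=${TMPFILE:-/tmp/lustre-debug.bin}
LOGFILE=${LOGFILE:-/tmp/lustre-debug.txt}

run() {
    # Dry-run by default so the sequence can be reviewed first;
    # set RUN=1 to actually execute the commands on a live OSS.
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

run lctl clear
run lctl set_param debug=+trace
run lctl set_param debug=+quota
run lctl debug_daemon start "$TMPFILE" 500
run lctl mark "============= start reint ================="
run sh -c "echo 1 > /proc/fs/lustre/osd-ldiskfs/$OSTNAME/quota_slave/force_reint"
# ...wait here until quota_slave/info shows glb[1],slv[1],reint[0]...
run lctl debug_daemon stop
run lctl debug_file "$TMPFILE" "$LOGFILE"
```

Leaving RUN unset makes the script print the commands instead of running them, which is convenient for review before a maintenance window.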
</comment>
                            <comment id="81012" author="javed" created="Fri, 4 Apr 2014 01:07:43 +0000"  >&lt;p&gt;1. The error messages rolled every 10 minutes.&lt;br/&gt;
3. Triggered reintegration, and it seems to have stopped both the OSS and MDS errors.&lt;br/&gt;
2. This is after reintegration:&lt;/p&gt;
   &lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;   $ cat /proc/fs/lustre/osd-ldiskfs/&lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt;-OST0115/quota_slave/info
   target name:    &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt;-OST0115
   pool ID:        0
   type:           dt
   quota enabled:  g
   conn to master: setup
   space acct:     ug
   user uptodate:  glb[0],slv[0],reint[0]
   group uptodate: glb[1],slv[1],reint[0]
   &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;p&gt;i&apos;ll observe for a day and then get back. thanks.&lt;/p&gt;</comment>
                            <comment id="81105" author="javed" created="Mon, 7 Apr 2014 00:54:40 +0000"  >&lt;p&gt;Update:&lt;br/&gt;
it did stop on the above OSS/OSTs, but started on other OSS/OSTs in the last two days. Forced reintegration stops it.&lt;br/&gt;
So I guess I&apos;ll need to keep forcing it as cases crop up.&lt;/p&gt;</comment>
                            <comment id="81106" author="javed" created="Mon, 7 Apr 2014 07:20:15 +0000"  >&lt;p&gt;This ticket can be closed. Thanks.&lt;/p&gt;</comment>
                            <comment id="81146" author="niu" created="Tue, 8 Apr 2014 01:16:21 +0000"  >&lt;p&gt;Hi Javed, are there any error messages on the MDT or OST before this starts? I&apos;d like to see why the reintegration failed.&lt;/p&gt;</comment>
                            <comment id="81150" author="javed" created="Tue, 8 Apr 2014 02:40:37 +0000"  >&lt;p&gt;Just checked one of the affected OSTs: no error messages, only messages related to client evictions and orphan object deletions.&lt;/p&gt;</comment>
                            <comment id="81510" author="javed" created="Mon, 14 Apr 2014 04:30:21 +0000"  >&lt;p&gt;just got this from another OSS:&lt;/p&gt;

&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;2014-04-10T17:56:09+10:00 lemming2 kernel: Lustre: &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt;-OST008e: haven&lt;span class=&quot;code-quote&quot;&gt;&apos;t heard from client 4ab84a25-da95-49af-a512-e406eb4af84f (at 10.9.3.2@o2ib3) in 227 seconds. I think it&apos;&lt;/span&gt;s dead, and I am evicting it. exp ffff8820371ff400, cur 1397116569 expire 1397116419 last 1397116342
2014-04-10T17:56:09+10:00 lemming2 kernel: Lustre: Skipped 20 previous similar messages

2014-04-11T09:20:49+10:00 lemming2 kernel: LustreError: 25971:0:(ldlm_resource.c:1165:ldlm_resource_get()) &lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt;-OST008e: lvbo_init failed &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; resource 0xb65a43:0x0: rc = -2

2014-04-11T10:08:10+10:00 lemming2 kernel: LustreError: 25593:0:(qsd_handler.c:344:qsd_req_completion()) $$$ DQACQ failed with -22, flags:0x4 qsd:&lt;span class=&quot;code-object&quot;&gt;short&lt;/span&gt;-OST008e qtype:grp id:4093 enforced:1 granted:65536 pending:0 waiting:0 req:1 usage:0 qunit:0 qtune:0 edquot:0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Though there&apos;s one more OSS where the DQACQ error occurred on an OST with no prior errors on that OST.&lt;/p&gt;</comment>
                            <comment id="156640" author="niu" created="Thu, 23 Jun 2016 08:16:02 +0000"  >&lt;p&gt;dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6629&quot; title=&quot;sanity-benchmark test_bonnie: DQACQ failed with -22&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6629&quot;&gt;&lt;del&gt;LU-6629&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwj67:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>13395</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>