<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:16:38 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1438] quota_chk_acq_common() still haven&apos;t managed to acquire quota</title>
                <link>https://jira.whamcloud.com/browse/LU-1438</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are seeing a quota-related problem. The quota feature is enabled on the filesystem, and the customer changed the group ownership of many large files, then ran the lfs quotacheck command.&lt;br/&gt;
After that, even though the group had not exceeded its quota limit, they got &quot;disk quota exceeded&quot; messages.&lt;/p&gt;

&lt;p&gt;On the OSS/MDS side, the following messages have been showing up since the group was changed.&lt;br/&gt;
(quota_interface.c:473:quota_chk_acq_common()) still haven&apos;t managed to&lt;br/&gt;
acquire quota space from the quota master after 20 retries (err=0, rc=0)&lt;/p&gt;

&lt;p&gt;It seems to be close to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-428&quot; title=&quot;Lustre: 16290:0:(quota_interface.c:460:quota_chk_acq_common()) still haven&amp;#39;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-428&quot;&gt;&lt;del&gt;LU-428&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</description>
                <environment>lustre-1.8.7-wc1, RHEL5.7 for servers, RHEL6.2 for clients</environment>
        <key id="14552">LU-1438</key>
            <summary>quota_chk_acq_common() still haven&apos;t managed to acquire quota</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="ihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Thu, 24 May 2012 05:54:56 +0000</created>
                <updated>Fri, 22 Feb 2013 11:14:38 +0000</updated>
                            <resolved>Sat, 18 Aug 2012 10:34:38 +0000</resolved>
                                    <version>Lustre 1.8.7</version>
                                    <fixVersion>Lustre 2.3.0</fixVersion>
                    <fixVersion>Lustre 2.1.4</fixVersion>
                    <fixVersion>Lustre 1.8.9</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="39324" author="pjones" created="Thu, 24 May 2012 09:14:47 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please comment on this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="39376" author="niu" created="Fri, 25 May 2012 02:13:09 +0000"  >&lt;p&gt;Yes, it looks similar to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-428&quot; title=&quot;Lustre: 16290:0:(quota_interface.c:460:quota_chk_acq_common()) still haven&amp;#39;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-428&quot;&gt;&lt;del&gt;LU-428&lt;/del&gt;&lt;/a&gt;, but unfortunately we couldn&apos;t reproduce it in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-428&quot; title=&quot;Lustre: 16290:0:(quota_interface.c:460:quota_chk_acq_common()) still haven&amp;#39;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-428&quot;&gt;&lt;del&gt;LU-428&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ihara, if it&apos;s reproducible, could you try to collect the debug log on both MDS &amp;amp; OST with D_QUOTA enabled? Thanks.&lt;/p&gt;</comment>
                            <comment id="39390" author="niu" created="Fri, 25 May 2012 06:04:16 +0000"  >&lt;p&gt;Did the customer run &apos;lfs setquota&apos; to clear the limit for some user/group when they saw this problem? Is it easy to reproduce? I found a race condition in the code which could lead to such a situation, but I&apos;m not sure if it&apos;s the real root cause. Anyway, I&apos;ll post a patch to fix that race.&lt;/p&gt;</comment>
                            <comment id="39391" author="ihara" created="Fri, 25 May 2012 06:31:39 +0000"  >&lt;p&gt;Niu,&lt;br/&gt;
Let me ask the customer whether they still see the same issue even after clearing the quota with &quot;lfs setquota&quot;.&lt;br/&gt;
Should we test this first without debug?&lt;/p&gt;
</comment>
                            <comment id="39399" author="niu" created="Fri, 25 May 2012 08:31:36 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;The race I mentioned is hard to trigger, so we&apos;d better get the debug log if it can be reproduced.&lt;/p&gt;</comment>
                            <comment id="39456" author="niu" created="Mon, 28 May 2012 08:15:45 +0000"  >&lt;p&gt;The patch to fix race in quota_chk_acq_common(): &lt;a href=&quot;http://review.whamcloud.com/#change,2927&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,2927&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="39736" author="ihara" created="Thu, 31 May 2012 12:30:30 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;I uploaded the debug information collected with the debug flags turned on. &lt;br/&gt;
please check on /uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/20120529.tar.bz2 on the whamcloud&apos;s ftp site.&lt;/p&gt;

&lt;p&gt;Thanks&lt;br/&gt;
Ihara&lt;/p&gt;</comment>
                            <comment id="39783" author="niu" created="Thu, 31 May 2012 23:43:05 +0000"  >&lt;p&gt;Thank you, Ihara. There are lots of &quot;still haven&apos;t managed to acquire quota..&quot; entries in the messages; however, the debug log seems to have been truncated, and I can&apos;t find any of them in it.&lt;/p&gt;

&lt;p&gt;Could you try the patch above to see if it resolves the customer&apos;s problem?&lt;/p&gt;</comment>
                            <comment id="39784" author="ihara" created="Fri, 1 Jun 2012 00:11:42 +0000"  >&lt;p&gt;Niu,&lt;br/&gt;
OK, let me try the patch to see if the problem is fixed. &lt;br/&gt;
Does it need to be applied to both servers and clients?&lt;/p&gt;</comment>
                            <comment id="39786" author="niu" created="Fri, 1 Jun 2012 00:18:30 +0000"  >&lt;p&gt;This is a server patch, you can apply it on server only.&lt;/p&gt;</comment>
                            <comment id="39788" author="pjones" created="Fri, 1 Jun 2012 01:08:15 +0000"  >&lt;p&gt;Is this fix needed on master too?&lt;/p&gt;</comment>
                            <comment id="39789" author="niu" created="Fri, 1 Jun 2012 01:39:04 +0000"  >&lt;p&gt;Hi, Peter, yes, I&apos;ll cook a patch for master.&lt;/p&gt;</comment>
                            <comment id="39791" author="niu" created="Fri, 1 Jun 2012 02:30:54 +0000"  >&lt;p&gt;patch for master: &lt;a href=&quot;http://review.whamcloud.com/#change,2996&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,2996&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="39796" author="ihara" created="Fri, 1 Jun 2012 04:33:21 +0000"  >&lt;p&gt;Niu,&lt;br/&gt;
Thanks, we are applying patches.&lt;br/&gt;
What exactly is the problem? In what situation do we enter the infinite loop?&lt;/p&gt;</comment>
                            <comment id="39797" author="niu" created="Fri, 1 Jun 2012 05:12:53 +0000"  >&lt;p&gt;This is a race in the quota code, which could be triggered when a user executes &apos;lfs setquota&apos; to clear the limit for some user/group on an active system (quota enabled, with ongoing I/O).&lt;/p&gt;</comment>
                            <comment id="39857" author="ihara" created="Fri, 1 Jun 2012 22:00:43 +0000"  >&lt;p&gt;The customer did run &quot;lfs setquota&quot;, but not to clear a limit. They changed the quota limit for some users/groups (the original limit was exceeded, so they raised the quota size). Since then, we have been seeing these messages.&lt;br/&gt;
So we didn&apos;t clear the quota. But when the quota size is changed, is it cleared once internally and then set to the new size?&lt;/p&gt;</comment>
                            <comment id="39881" author="niu" created="Sun, 3 Jun 2012 21:48:07 +0000"  >&lt;p&gt;Changing the limit size will not cause clear -&amp;gt; set operations, so it may not be caused by this race. Anyway, let&apos;s see the result first.&lt;/p&gt;</comment>
                            <comment id="39892" author="ihara" created="Mon, 4 Jun 2012 02:38:35 +0000"  >&lt;p&gt;So what exactly do you mean by using &quot;lfs setquota&quot; to clear? Is that setting all quota limits to 0 for the user/group?&lt;/p&gt;</comment>
                            <comment id="39993" author="ihara" created="Tue, 5 Jun 2012 02:31:32 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;We applied the patches to only one of the OSSs, but it didn&apos;t help. The messages are still showing up.&lt;/p&gt;

&lt;p&gt;Please investigate further and let me know if you need more information.  &lt;/p&gt;</comment>
                            <comment id="39994" author="ihara" created="Tue, 5 Jun 2012 02:36:01 +0000"  >&lt;p&gt;I uploaded /var/log/messages from the patched OSS to the ftp site: /uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/messages.nos14.tgz&lt;br/&gt;
Please have a look at it.&lt;/p&gt;</comment>
                            <comment id="39997" author="niu" created="Tue, 5 Jun 2012 03:10:23 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;This time the warning message is a little bit different:&lt;/p&gt;

&lt;p&gt;Jun  5 13:27:30 nos141i kernel: Lustre: 28834:0:(quota_interface.c:473:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 51 retries (err=0, rc=-5)&lt;/p&gt;

&lt;p&gt;The return value is -5 now, which means some error happened while the OST was acquiring quota from the MDS. In the previous messages, the rc was always 0.&lt;/p&gt;
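
&lt;p&gt;A minimal sketch of how such a warning line can be picked apart, added here for illustration only. The helper name parse_quota_warning and the regular expression are assumptions, not part of Lustre. It relies on the usual kernel convention that a negative return code is a negated errno, so rc=-5 would correspond to EIO; the leading 28834:0 pair is read as the pid and the &apos;extern pid&apos;, as explained later in this ticket.&lt;/p&gt;

```python
import errno
import re

# Sample line copied from the OSS syslog quoted in this ticket.
LINE = (
    "Jun  5 13:27:30 nos141i kernel: Lustre: 28834:0:"
    "(quota_interface.c:473:quota_chk_acq_common()) still haven't managed to "
    "acquire quota space from the quota master after 51 retries (err=0, rc=-5)"
)

# Hypothetical pattern for this one message format; other Lustre
# messages are laid out differently and will simply not match.
PATTERN = re.compile(
    r"Lustre: (\d+):(\d+):\(([\w.]+):(\d+):(\w+)\(\)\) .* after (\d+) retries "
    r"\(err=(-?\d+), rc=(-?\d+)\)"
)


def parse_quota_warning(line):
    """Return the fields of a quota_chk_acq_common() warning, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    pid, extern_pid, source, lineno, func, retries, err, rc = match.groups()
    rc = int(rc)
    # Assumption: rc is a negated errno, so -5 maps to errno 5 (EIO).
    rc_name = "OK" if rc == 0 else errno.errorcode.get(-rc, "unknown")
    return {
        "pid": int(pid),
        "extern_pid": int(extern_pid),
        "source": source,
        "line": int(lineno),
        "function": func,
        "retries": int(retries),
        "err": int(err),
        "rc": rc,
        "rc_name": rc_name,
    }


info = parse_quota_warning(LINE)
print(info["pid"], info["retries"], info["rc"], info["rc_name"])
```

&lt;p&gt;Run over a whole /var/log/messages file, a helper like this would make it easy to separate the rc=0 warnings from the rc=-5 ones.&lt;/p&gt;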

&lt;p&gt;Can this OSS work properly even without quota enabled?&lt;/p&gt;</comment>
                            <comment id="40002" author="ihara" created="Tue, 5 Jun 2012 04:31:01 +0000"  >&lt;p&gt;Niu, &lt;/p&gt;

&lt;p&gt;Yes, quota has been turned off on all servers, but these messages are still showing up on the OSS.&lt;/p&gt;

&lt;p&gt;By the way, we applied the patch to only one OSS, not to the MDS yet.&lt;/p&gt;</comment>
                            <comment id="40066" author="ihara" created="Tue, 5 Jun 2012 22:44:24 +0000"  >&lt;p&gt;Niu, &lt;/p&gt;

&lt;p&gt;The following error message keeps showing up on the MDS:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;LustreError: 9831:0:(ldlm_lib.c:2123:target_handle_dqacq_callback()) dqacq failed! (rc:-5)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;posted MDS&apos;s /var/log/messages as /uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/nmd05.tgz&lt;/p&gt;</comment>
                            <comment id="40067" author="niu" created="Tue, 5 Jun 2012 22:51:53 +0000"  >&lt;p&gt;hmm, I&apos;m afraid that the quota isn&apos;t turned off on the OSS, could you check the /proc/fs/lustre/obdfilter/$OSTNAME/quota_type to see if it&apos;s really turned off?&lt;/p&gt;

&lt;p&gt;Applying the patch to one OSS is fine.&lt;/p&gt;</comment>
                            <comment id="40068" author="ihara" created="Tue, 5 Jun 2012 23:26:42 +0000"  >&lt;p&gt;Something strange here: the client reports that quotas are not enabled, and there is no &quot;u&quot; or &quot;g&quot; on the MDS or OSSs, but quota_type is set to 1 on the OSSs while the MDS&apos;s quota_type is 2.&lt;/p&gt;

&lt;p&gt;&amp;#35; lfs quota -u -t /nshare1&lt;br/&gt;
user quotas are not enabled.&lt;br/&gt;
&amp;#35; lfs quota -g -t /nshare1&lt;br/&gt;
group quotas are not enabled.&lt;/p&gt;

&lt;p&gt;on OSSs&lt;br/&gt;
&amp;#35; lctl get_param obdfilter.*.quota_type&lt;br/&gt;
obdfilter.nshare1-OST0015.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST0016.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST0017.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST0018.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST0019.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST001a.quota_type=1&lt;br/&gt;
obdfilter.nshare1-OST001b.quota_type=1&lt;/p&gt;

&lt;p&gt;on MDS&lt;br/&gt;
&amp;#35; cat /proc/fs/lustre/lquota/nshare3-MDT0000/quota_type &lt;br/&gt;
2&lt;/p&gt;</comment>
                            <comment id="40121" author="ihara" created="Wed, 6 Jun 2012 10:17:16 +0000"  >&lt;p&gt;Niu, &lt;/p&gt;

&lt;p&gt;Any advice? We still can&apos;t stop the error messages on the MDS, and the customer is nervous about this.&lt;/p&gt;

&lt;p&gt;Ihara&lt;/p&gt;</comment>
                            <comment id="40169" author="niu" created="Wed, 6 Jun 2012 22:34:10 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;The &apos;quota_type&apos; on both the MDS &amp;amp; OSSs looks good; it seems no user or group quota is enabled. I don&apos;t know where the dqacq requests come from. I think we have to enable D_QUOTA on both the MDS &amp;amp; OSSs and collect some debug logs for analysis. Thanks.&lt;/p&gt;</comment>
                            <comment id="40182" author="ihara" created="Thu, 7 Jun 2012 08:15:03 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;we collected debug logs on MDS and two OSSs and filed them on /uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/2012-06-07/&lt;br/&gt;
We kept collecting the debug log until the following error message appeared on the OSS.&lt;br/&gt;
kernel: Lustre: 23619:0:(quota_interface.c:473:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 78 retries (err=67, rc=-5)&lt;/p&gt;

&lt;p&gt;Please investigate further.&lt;/p&gt;

&lt;p&gt;Thank you very much!&lt;/p&gt;</comment>
                            <comment id="40247" author="niu" created="Fri, 8 Jun 2012 00:45:22 +0000"  >&lt;p&gt;Thank you, Ihara. It seems to be a race between turning off quota and acquiring quota. I will post a patch soon.&lt;/p&gt;</comment>
                            <comment id="40249" author="niu" created="Fri, 8 Jun 2012 01:04:44 +0000"  >&lt;p&gt;patch for b18: &lt;a href=&quot;http://review.whamcloud.com/3060&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3060&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40251" author="ihara" created="Fri, 8 Jun 2012 01:21:51 +0000"  >&lt;p&gt;Thank you very much for this patch and we will try this patch as well.&lt;br/&gt;
So the problem is that the quota slave keeps acquiring quota indefinitely even though quota is turned off on the master, because a check for whether quota is enabled is missing. Is that correct?&lt;br/&gt;
And we should apply this patch on both the MDS and OSSs, right?&lt;/p&gt;</comment>
                            <comment id="40252" author="niu" created="Fri, 8 Jun 2012 01:31:40 +0000"  >&lt;p&gt;yes, exactly. The change is only for slave, but it should be applied on both MDS &amp;amp; OSS, since MDS has a slave running on it as well.&lt;/p&gt;</comment>
                            <comment id="40255" author="niu" created="Fri, 8 Jun 2012 04:37:58 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;After looking more closely at the log, I realized that your case is not caused by the race. It is probably caused by some OST being offline when you disabled quota (or the clear quota command to some OST failed for some reason), so the local fs quota isn&apos;t disabled on some OST.&lt;/p&gt;

&lt;p&gt;To resolve the problem, you can rerun &apos;lfs quotaoff&apos; to see if it helps. Thanks.&lt;/p&gt;</comment>
                            <comment id="40256" author="ihara" created="Fri, 8 Jun 2012 04:59:42 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;Thanks for the update. The original problem was that some groups got quota exceeded errors even though these groups had not reached their quota limit. What do you think of this problem?&lt;/p&gt;

&lt;p&gt;The following messages showed up after quota was disabled, but the primary problem was that the quota limit didn&apos;t work well.&lt;br/&gt;
kernel: Lustre: 23619:0:(quota_interface.c:473:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 78 retries (err=67, rc=-5)&lt;/p&gt;</comment>
                            <comment id="40257" author="niu" created="Fri, 8 Jun 2012 06:00:20 +0000"  >&lt;blockquote&gt;
&lt;p&gt;The original problem was that some groups got quota exceeded errors even though these groups had not reached their quota limit. What do you think of this problem?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The original problem is that the slave can&apos;t acquire quota from the master for that group id, so it exceeded quota. My first patch might help with this. You can also ask the user to reset the quota limit for the problematic group id by &apos;lfs setquota 0&apos; -&amp;gt; &apos;lfs setquota xxx&apos; to see if it helps.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The following messages showed up after quota was disabled, but the primary problem was that the quota limit didn&apos;t work well.&lt;br/&gt;
kernel: Lustre: 23619:0:(quota_interface.c:473:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 78 retries (err=67, rc=-5)&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The second problem might be caused by inconsistent on/off information between the slave and the master. You could ask the user to rerun &apos;lfs quotaoff&apos; to see if it works.&lt;/p&gt;

&lt;p&gt;Anyway, the quota design (in master &amp;amp; b18) requires all slaves to be online while setting quota or turning quota on/off. If setquota or enabling/disabling quota fails on some slave (OST), there will be trouble, and you have to rerun the command to make sure the new setting is synchronized on each slave (OST). This design drawback has been addressed in the new quota design (in orion).&lt;/p&gt;</comment>
                            <comment id="40456" author="ihara" created="Tue, 12 Jun 2012 22:39:12 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;It seems that the patches didn&apos;t help fix the problem. &lt;br/&gt;
We stopped all OSSs and MDSs and restarted Lustre with the patched RPMs.&lt;br/&gt;
The applied patches are below.&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,3060&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3060&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/#change,2927&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,2927&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, a user got &quot;quota exceeded&quot; messages even though he had not reached his quota limit.&lt;br/&gt;
At that time, we got the following messages on the OSS.&lt;/p&gt;

&lt;p&gt;Jun 12 18:19:36 nos131i kernel: Lustre: 29689:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)&lt;br/&gt;
Jun 12 18:19:36 nos131i kernel: Lustre: 29689:0:(quota_interface.c:481:quota_chk_acq_common()) Skipped 10 previous similar messages&lt;/p&gt;

&lt;p&gt;Not at the same time, but we are also seeing a lot of the same messages on other OSSs.&lt;/p&gt;

&lt;p&gt;The recent log files are uploaded to /uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/20120613.&lt;/p&gt;

&lt;p&gt;Please investigate further. thank you very much again.&lt;/p&gt;</comment>
                            <comment id="40458" author="niu" created="Tue, 12 Jun 2012 22:58:05 +0000"  >&lt;p&gt;patch for master: &lt;a href=&quot;http://review.whamcloud.com/#change,3097&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#change,3097&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="40460" author="niu" created="Tue, 12 Jun 2012 22:59:36 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;Did you try resetting the quota with &apos;lfs setquota&apos; for the user?&lt;/p&gt;</comment>
                            <comment id="40461" author="ihara" created="Tue, 12 Jun 2012 23:21:19 +0000"  >&lt;p&gt;You mean setting everything to zero (e.g. &quot;lfs setquota -u &amp;lt;user&amp;gt; -b 0 -B 0 -i 0 -I 0 &amp;lt;fsname&amp;gt;&quot;) to clear, then setting the real quota size again?&lt;/p&gt;</comment>
                            <comment id="40468" author="niu" created="Wed, 13 Jun 2012 02:25:37 +0000"  >&lt;p&gt;Yes. For a user who has a quota limit, reset the quota by &quot;lfs setquota -b 0 -B 0 -i 0 -I 0 $fsname&quot; -&amp;gt; &quot;lfs setquota -b limit ...&quot;. For a user who has no quota limit but got the error message (still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)) on the OSS, re-clear the quota limit by &quot;lfs setquota -b limit ...&quot; -&amp;gt; &quot;lfs setquota -b 0 ...&quot;.&lt;/p&gt;

&lt;p&gt;There isn&apos;t much information in the dmesg. Could you explain the current status of the system? Is user or group quota enabled? What&apos;s the output of &apos;lfs quota -v -u $user $fs&apos; for the user who has no quota limit but got the error message on the OSS? Thanks.&lt;/p&gt;</comment>
                            <comment id="40509" author="ihara" created="Wed, 13 Jun 2012 10:41:03 +0000"  >&lt;p&gt;Niu,&lt;/p&gt;

&lt;p&gt;The customer ran &quot;lfs setquota&quot; to clear all quota sizes for some users.&lt;br/&gt;
After that, as far as we can see, the messages &quot;(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)&quot; have not appeared again so far.&lt;/p&gt;

&lt;p&gt;As you mentioned, there was an inconsistency between master and slave: the user had not exceeded the global limit, but some local quotas had exceeded their limits. Once the local quota on all MDS/OSSs was cleared by &quot;lfs setquota -b 0 -B 0...&quot; and the quota was set again, quota worked normally. Is this the root cause and the fix for the current situation that you described?&lt;/p&gt;

&lt;p&gt;So, to change the quota size, is &quot;1. clear quota size, 2. set quota size&quot; a better and safer approach than just changing the quota size by &quot;lfs setquota -b X&quot;?&lt;/p&gt;

&lt;p&gt;Anyway, we very much appreciate your detailed analysis and explanation.&lt;/p&gt;</comment>
                            <comment id="40540" author="niu" created="Wed, 13 Jun 2012 22:18:24 +0000"  >&lt;p&gt;To change the quota size, &quot;clear -&amp;gt; set new size&quot; isn&apos;t necessarily safer than &apos;set new size&apos; directly.&lt;/p&gt;

&lt;p&gt;Whenever &apos;lfs setquota&apos; changes a quota limit from zero to non-zero (or from non-zero to zero), the quota master (MDS) notifies all slaves (OSTs) to change their local fs quota limit. However, if some slave is offline at that time (or the notification to some slave fails for some reason), an inconsistency will arise between the master and the slaves which didn&apos;t receive the quota change notification.&lt;/p&gt;

&lt;p&gt;Changing the quota from an old limit to a new limit (both non-zero values) will not trigger the quota change notification to slaves, so it will not cause the inconsistency. Of course, it can&apos;t fix the inconsistency either.&lt;/p&gt;
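
&lt;p&gt;The notification rule described above can be sketched as follows. This is an illustrative model only: the function names are hypothetical, and the real master/slave protocol in Lustre is not shown. It assumes, per the explanation above, that a notification is sent only when a limit moves between zero and non-zero.&lt;/p&gt;

```python
def notifies_slaves(old_limit, new_limit):
    """Return True when a limit change crosses between zero and non-zero.

    Hypothetical model of the rule described in this comment, not Lustre code.
    """
    return (old_limit == 0) != (new_limit == 0)


def count_notifications(limits):
    """Count notifications for a sequence of successive limit values."""
    return sum(notifies_slaves(a, b) for a, b in zip(limits, limits[1:]))


# Direct change between two non-zero limits: no notification is sent.
print(count_notifications([100, 200]))

# Clear, then set a new size: both steps cross zero, so two are sent.
print(count_notifications([100, 0, 200]))
```

&lt;p&gt;Under this model a direct change between two non-zero limits sends nothing to the slaves, while clear -&amp;gt; set sends two notifications, which is why only the latter can resynchronize a slave that missed an earlier change.&lt;/p&gt;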

&lt;p&gt;In the new quota design, any slave that joins later can sync the quota setting with the master automatically, so users needn&apos;t worry about offline slaves anymore.&lt;/p&gt;</comment>
                            <comment id="41097" author="ihara" created="Mon, 25 Jun 2012 11:54:37 +0000"  >&lt;p&gt;Hello Niu,&lt;/p&gt;

&lt;p&gt;The problem is not fixed yet, even after resetting the group quota. Last week we reset the quota to zero for all groups, but setting the quota failed for a couple of groups. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;nos13: Jun 20 18:02:19 nos131i kernel: Lustre: 18627:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
nos13: Jun 20 21:40:47 nos131i kernel: Lustre: 29660:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
nos14: Jun 18 13:10:06 nos141i kernel: LustreError: 18554:0:(quota_ctl.c:260:filter_quota_ctl()) fail to create lqs during setquota operation for gid 10856
nos14: Jun 18 13:10:06 nos141i kernel: LustreError: 18575:0:(quota_ctl.c:260:filter_quota_ctl()) fail to create lqs during setquota operation for gid 10857
nos14: Jun 20 12:38:41 nos141i kernel: Lustre: 28944:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Then we set the quota again for these groups, but today we got the same messages on the OSS.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun 25 03:03:59 nos141i kernel: Lustre: 28760:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I don&apos;t know why it is still failing to acquire quota space from the master. Any ideas for a workaround to avoid this issue? Or can we add debugging to understand what is happening?&lt;/p&gt;</comment>
                            <comment id="41116" author="niu" created="Mon, 25 Jun 2012 23:59:02 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;If it&apos;s easy to reproduce, we can collect the debug log on the OSS with D_TRACE &amp;amp; D_QUOTA enabled (echo +trace &amp;gt; /proc/sys/lnet/debug, echo +quota &amp;gt; /proc/sys/lnet/debug); then we can see where the quota acquisition returns zero. Thanks.&lt;/p&gt;</comment>
                            <comment id="41118" author="ihara" created="Tue, 26 Jun 2012 01:54:45 +0000"  >&lt;p&gt;Niu, I meant that you could make a debug patch to show more detailed information on the production system, e.g. who exceeded the quota limit when this message is printed. We certainly didn&apos;t do any operations (set/clear quota) when the following messages showed up. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun 25 03:03:59 nos141i kernel: Lustre: 28760:0:(quota_interface.c:481:quota_chk_acq_common()) still haven&apos;t managed to acquire quota space from the quota master after 10 retries (err=0, rc=0)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;We don&apos;t know how to reproduce this problem at the moment, but let me ask whether we can enable the debug flags. BTW, what is &quot;28760:0&quot; in the above messages?&lt;/p&gt;</comment>
                            <comment id="41122" author="niu" created="Tue, 26 Jun 2012 03:18:43 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;The debug patch would be similar to enabling D_TRACE &amp;amp; D_QUOTA for the debug log. If the customer can&apos;t afford the D_TRACE debug log, we can enable only D_QUOTA first to collect some debug logs.&lt;/p&gt;

&lt;p&gt;The 28760 is the pid, and the 0 is the &apos;extern pid&apos; (it looks like it&apos;s always 0 for now, so you can just ignore it).&lt;/p&gt;</comment>
                            <comment id="41265" author="ihara" created="Thu, 28 Jun 2012 12:40:33 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;uploaded debug files on uploads/&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt;/debugfile.20120628.gz&lt;br/&gt;
However, it&apos;s not easy to reproduce these messages, though they still show up in the system log file irregularly.&lt;br/&gt;
The Lustre debug file doesn&apos;t contain the messages. Perhaps the maximum debug size (100MB) was exceeded quickly?&lt;br/&gt;
Any ideas on how to keep track of the debug information in this situation? &lt;/p&gt;</comment>
                            <comment id="41291" author="niu" created="Thu, 28 Jun 2012 23:56:40 +0000"  >&lt;p&gt;Hi, Ihara&lt;/p&gt;

&lt;p&gt;Could you apply this debug patch? Then we&apos;ll see a lot more debug information in the syslog along with the &quot;still can&apos;t acquire...&quot; messages. Thanks.&lt;/p&gt;</comment>
                            <comment id="41294" author="ihara" created="Fri, 29 Jun 2012 00:34:20 +0000"  >&lt;p&gt;Hi Niu,&lt;/p&gt;

&lt;p&gt;Thanks for this debug patch. I will ask customer if we can apply patch.&lt;/p&gt;</comment>
                            <comment id="41447" author="ihara" created="Wed, 4 Jul 2012 02:49:53 +0000"  >&lt;p&gt;Might this be related to the 32-bit quota setting limitation?&lt;/p&gt;</comment>
                            <comment id="41449" author="niu" created="Wed, 4 Jul 2012 03:04:57 +0000"  >&lt;p&gt;Probably. Is there anything useful in the debug log?&lt;/p&gt;</comment>
                            <comment id="41452" author="ihara" created="Wed, 4 Jul 2012 03:33:46 +0000"  >&lt;p&gt;We haven&apos;t applied the patches yet, but will apply them during the next scheduled maintenance window.&lt;/p&gt;</comment>
                            <comment id="43465" author="ihara" created="Sat, 18 Aug 2012 09:14:04 +0000"  >&lt;p&gt;This might be caused by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1584&quot; title=&quot;error set quota fs limit&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1584&quot;&gt;&lt;del&gt;LU-1584&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1720&quot; title=&quot;Quota doesn&amp;#39;t work over 4TB on single OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1720&quot;&gt;&lt;del&gt;LU-1720&lt;/del&gt;&lt;/a&gt;. Please close this ticket. &lt;br/&gt;
I will open a new ticket if we see the same issue after applying the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1438&quot; title=&quot;quota_chk_acq_common() still haven&amp;#39;t managed to acquire quota&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1438&quot;&gt;&lt;del&gt;LU-1438&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1720&quot; title=&quot;Quota doesn&amp;#39;t work over 4TB on single OST&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1720&quot;&gt;&lt;del&gt;LU-1720&lt;/del&gt;&lt;/a&gt; patches.&lt;/p&gt;</comment>
                            <comment id="43470" author="pjones" created="Sat, 18 Aug 2012 10:34:38 +0000"  >&lt;p&gt;ok thanks Ihara&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="11447" name="20120524.tgz" size="1572654" author="ihara" created="Thu, 24 May 2012 05:54:56 +0000"/>
                            <attachment id="11665" name="LU-1438-debug.patch" size="5011" author="niu" created="Thu, 28 Jun 2012 23:56:40 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv6hb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4584</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>