<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:06:04 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-340] system hang when running sanity-quota on RHEL5-x86_64-OFED</title>
                <link>https://jira.whamcloud.com/browse/LU-340</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;system hang when running sanity-quota on RHEL5-x86_64-ofa build. Please see the attachment for all the logs.&lt;/p&gt;</description>
                <environment>lustre-master/RHEL5-x86_64/#120/ofa build</environment>
        <key id="10891">LU-340</key>
            <summary>system hang when running sanity-quota on RHEL5-x86_64-OFED</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                <statusCategory id="3" key="done" colorName="success"/>
                <resolution id="3">Duplicate</resolution>
                <assignee username="niu">Niu Yawei</assignee>
                <reporter username="sarah">Sarah Liu</reporter>
                <labels>
                </labels>
                <created>Tue, 17 May 2011 22:19:56 +0000</created>
                <updated>Mon, 1 Apr 2013 03:07:12 +0000</updated>
                <resolved>Mon, 1 Apr 2013 03:07:12 +0000</resolved>
                <version>Lustre 2.1.0</version>
                <version>Lustre 2.1.1</version>
                <due></due>
                <votes>0</votes>
                <watches>6</watches>
                <comments>
                            <comment id="14486" author="pjones" created="Wed, 18 May 2011 05:33:52 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Please look into this quotas issue when you get a chance&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="14645" author="niu" created="Thu, 19 May 2011 00:38:50 +0000"  >&lt;p&gt;From the log we can see that all pdflush threads on the client were waiting on the page lock, while the dd thread was holding the page lock to do synchronous I/O. Because of something wrong with group quota, the synchronous I/O couldn&apos;t finish in time, which caused the pdflush threads to stall.&lt;/p&gt;

&lt;p&gt;What confuses me is that there were lots of &quot;dqacq/dqrel failed! (rc:-5)&quot; errors while setting group quota, yet setting user quota succeeded and the user quota limit tests also passed. It looks like there are only two cases where dqacq_handler() returns -EIO: one is OBD_FAIL_OBD_DQACQ and the other is an ll_sb_has_quota_active() check failure.&lt;/p&gt;

&lt;p&gt;Hi, Sarah&lt;/p&gt;

&lt;p&gt;Is it repeatable? What&apos;s the /proc/fs/lustre/fail_loc on mds? Thanks.&lt;/p&gt;</comment>
                            <comment id="14682" author="sarah" created="Thu, 19 May 2011 13:27:26 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Is it repeatable? What&apos;s the /proc/fs/lustre/fail_loc on mds? Thanks.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;yes, it can be reproduced. &lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@fat-intel-1 ~&amp;#93;&lt;/span&gt;# more /proc/sys/lustre/fail_loc &lt;br/&gt;
0&lt;/p&gt;</comment>
                            <comment id="14710" author="niu" created="Thu, 19 May 2011 23:59:18 +0000"  >&lt;p&gt;Is the D_QUOTA enabled? can we get the debug log on MDS?&lt;/p&gt;</comment>
                            <comment id="14711" author="sarah" created="Fri, 20 May 2011 00:20:07 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Is the D_QUOTA enabled?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;No, it isn&apos;t. I can give you the debug log tomorrow; please tell me which debug mask to use.&lt;/p&gt;</comment>
                            <comment id="14712" author="niu" created="Fri, 20 May 2011 01:01:17 +0000"  >&lt;p&gt;I think the default + D_QUOTA will be fine, thank you, Sarah.&lt;/p&gt;</comment>
                            <comment id="14888" author="niu" created="Sun, 22 May 2011 20:05:32 +0000"  >&lt;p&gt;Thank you, Sarah. I think the debug log confirms that dqacq_handler failed because group quota was not enabled or fail_loc was set.&lt;/p&gt;

&lt;p&gt;Could you try the following commands on client-5 to see what happens? (Run quotacheck, then set the group quota):&lt;br/&gt;
lfs quotacheck -ug lustre_dir&lt;br/&gt;
lfs setquota -g group_name -b 0 -B 0 -i 0 -I 0 lustre_dir&lt;/p&gt;</comment>
                            <comment id="14952" author="sarah" created="Tue, 24 May 2011 13:15:59 +0000"  >&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-15 ~&amp;#93;&lt;/span&gt;# lfs quotacheck -ug /mnt/lustre/&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-15 ~&amp;#93;&lt;/span&gt;# lfs setquota -g quota_usr -b 0 -B 0 -i 0 -I 0 /mnt/lustre/&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;root@client-15 ~&amp;#93;&lt;/span&gt;# mount&lt;br/&gt;
/dev/sda1 on / type ext3 (rw)&lt;br/&gt;
proc on /proc type proc (rw)&lt;br/&gt;
sysfs on /sys type sysfs (rw)&lt;br/&gt;
devpts on /dev/pts type devpts (rw,gid=5,mode=620)&lt;br/&gt;
tmpfs on /dev/shm type tmpfs (rw)&lt;br/&gt;
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)&lt;br/&gt;
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)&lt;br/&gt;
192.168.4.128@o2ib:/lustre on /mnt/lustre type lustre (rw,flock)&lt;/p&gt;</comment>
                            <comment id="15161" author="niu" created="Thu, 26 May 2011 23:56:36 +0000"  >&lt;p&gt;When I logged on to the system, I found that &quot;lfs quotaon -ug&quot; could not turn on the local fs group quota on the MDS, though the command executed successfully and there were no abnormal messages in the debug log.&lt;/p&gt;

&lt;p&gt;The local fs group quota can be enabled by an &quot;lfs quotaon -g&quot;; after it was executed, the system returned to normal status, and the group quota could be enabled/disabled by &quot;lfs quotaon/off -ug&quot; again.&lt;/p&gt;

&lt;p&gt;This bug appears only on the OFA build servers, so I suspect it is OFA-build related; I will continue the investigation when I have time and spare nodes.&lt;/p&gt;</comment>
                            <comment id="19675" author="yujian" created="Mon, 29 Aug 2011 02:54:50 +0000"  >&lt;p&gt;Lustre Clients:&lt;br/&gt;
Tag: 1.8.6-wc1&lt;br/&gt;
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18_238.12.1.el5.x86_64)&lt;br/&gt;
Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el5,ib_stack=ofa/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/lustre-b1_8/100/arch=x86_64,build_type=client,distro=el5,ib_stack=ofa/&lt;/a&gt;&lt;br/&gt;
Network: IB (OFED 1.5.3.1)&lt;/p&gt;

&lt;p&gt;Lustre Servers:&lt;br/&gt;
Tag: v2_1_0_0_RC1&lt;br/&gt;
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-238.19.1.el5_lustre.g65156ed.x86_64)&lt;br/&gt;
Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/lustre-master/273/arch=x86_64,build_type=server,distro=el5,ib_stack=ofa/&lt;/a&gt;&lt;br/&gt;
Network: IB (OFED 1.5.3.1)&lt;/p&gt;

&lt;p&gt;sanity-quota test 1 hung: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/842c0928-cfc6-11e0-8d02-52540025f9af&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/842c0928-cfc6-11e0-8d02-52540025f9af&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Dmesg on MDS (fat-amd-1-ib) showed:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: == test 1: Block hard limit (normal use and out of quota) === == 01:51:35
Lustre: DEBUG MARKER: User quota (limit: 95511 kbytes)
Lustre: DEBUG MARKER: Write ...
Lustre: DEBUG MARKER: Done
Lustre: DEBUG MARKER: Write out of block quota ...
Lustre: DEBUG MARKER: --------------------------------------
Lustre: DEBUG MARKER: Group quota (limit: 95511 kbytes)
LustreError: 8250:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 8251:0:(ldlm_lib.c:2341:target_handle_dqacq_callback()) dqacq/dqrel failed! (rc:-5)
LustreError: 6520:0:(quota_context.c:708:dqacq_completion()) acquire qunit got error! (rc:-5)
LustreError: 6520:0:(quota_master.c:1263:mds_init_slave_blimits()) error mds adjust local block quota! (rc:-5)
LustreError: 6520:0:(quota_master.c:1442:mds_set_dqblk()) init slave blimits failed! (rc:-5)
&amp;lt;~snip~&amp;gt;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="19763" author="yujian" created="Tue, 30 Aug 2011 04:10:48 +0000"  >&lt;p&gt;Lustre Branch: master&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://newbuild.whamcloud.com/job/lustre-master/273/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://newbuild.whamcloud.com/job/lustre-master/273/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL5/x86_64&lt;br/&gt;
Network: IB (OFED 1.5.3.1)&lt;/p&gt;

&lt;p&gt;The same failure occurred while running sanity-quota test: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/4115f084-d2de-11e0-8d02-52540025f9af&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/4115f084-d2de-11e0-8d02-52540025f9af&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="28864" author="yujian" created="Thu, 16 Feb 2012 07:02:19 +0000"  >&lt;p&gt;Lustre Tag: v2_1_1_0_RC2&lt;br/&gt;
Lustre Build: &lt;a href=&quot;http://build.whamcloud.com/job/lustre-b2_1/41/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://build.whamcloud.com/job/lustre-b2_1/41/&lt;/a&gt;&lt;br/&gt;
Distro/Arch: RHEL5/x86_64 (kernel version: 2.6.18-274.12.1.el5)&lt;br/&gt;
Network: IB (OFED 1.5.4)&lt;/p&gt;

&lt;p&gt;The same issue occurred: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/f95cf180-584c-11e1-9df1-5254004bbbd3&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/f95cf180-584c-11e1-9df1-5254004bbbd3&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="55156" author="niu" created="Mon, 1 Apr 2013 03:07:12 +0000"  >&lt;p&gt;Fixed in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1782&quot; title=&quot;Ignore sb_has_quota_active() in OFED&amp;#39;s header&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1782&quot;&gt;&lt;del&gt;LU-1782&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                    <issuelinktype id="10011">
                        <name>Related</name>
                        <outwardlinks description="is related to ">
                            <issuelink>
                                <issuekey id="15577">LU-1782</issuekey>
                            </issuelink>
                        </outwardlinks>
                    </issuelinktype>
                </issuelinks>
                <attachments>
                            <attachment id="10216" name="client-18-syslog-trace.log" size="2444538" author="sarah" created="Tue, 17 May 2011 22:19:56 +0000"/>
                            <attachment id="10215" name="client-5-syslog-trace.log" size="2754812" author="sarah" created="Tue, 17 May 2011 22:19:56 +0000"/>
                            <attachment id="10218" name="mds-debug.log" size="2249872" author="sarah" created="Fri, 20 May 2011 23:46:48 +0000"/>
                            <attachment id="10217" name="mds-ost.tar.gz" size="762522" author="sarah" created="Tue, 17 May 2011 22:19:56 +0000"/>
                    </attachments>
                <subtasks>
                </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvf5j:</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6100</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>