<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:00:57 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-13400] sanity test_300d: createmany 10 under striped dir fails with Permission denied</title>
                <link>https://jira.whamcloud.com/browse/LU-13400</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for Olaf Faaland &amp;lt;faaland1@llnl.gov&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/150c0a5d-19ca-4d0e-bbd4-5d98ae175f9a&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/150c0a5d-19ca-4d0e-bbd4-5d98ae175f9a&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;test_300d failed with the following error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;create 10 files failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== sanity test 300d: check default stripe under striped directory ==================================== 12:17:43 (1585595863)
open(/mnt/diane/d300d.sanity/striped_dir/f1) error: Permission denied
total: 1 open/close in 0.18 seconds: 5.62 ops/second
 sanity test_300d: @@@@@@ FAIL: create 10 files failed
  Trace dump:
  = /home/olaf/lustre/lustre/tests/test-framework.sh:6138:error()
  = /home/olaf/lustre/lustre/tests/sanity.sh:19623:test_300d()
  = /home/olaf/lustre/lustre/tests/test-framework.sh:6441:run_one()
  = /home/olaf/lustre/lustre/tests/test-framework.sh:6490:run_one_logged()
  = /home/olaf/lustre/lustre/tests/test-framework.sh:6315:run_test()
  = /home/olaf/lustre/lustre/tests/sanity.sh:19639:main()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I can locally reproduce reliably with:&lt;br/&gt;
 FSTYPE=zfs (zfs-0.7.11 or zfs-0.8.3)&lt;br/&gt;
 MDSCOUNT=2&lt;br/&gt;
 lustre 2.12.4, 2.13.0, and master&lt;br/&gt;
 Run 20 times: sudo lustre/tests/auster sanity --only 300&lt;/p&gt;

&lt;p&gt;I&apos;ve seen the issue in about 15% of runs on average with zfs-0.8, and less often with zfs-0.7.11.&lt;/p&gt;

&lt;p&gt;I&apos;ve reproduced the issue with lustre 2.12.4, 2.13.0, master, and master-next.&lt;br/&gt;
 Earlier lustre versions reliably LBUG before getting this far in test_300, in this configuration, so I don&apos;t know if the bug existed in earlier versions.&lt;/p&gt;

&lt;p&gt;No warnings or errors are emitted in dmesg on any node while the test is running.&lt;/p&gt;

&lt;p&gt;VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV&lt;br/&gt;
 sanity test_300d - create 10 files failed&lt;/p&gt;</description>
                <environment></environment>
        <key id="58555">LU-13400</key>
            <summary>sanity test_300d: createmany 10 under striped dir fails with Permission denied</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                            <label>llnl</label>
                    </labels>
                <created>Mon, 30 Mar 2020 19:38:56 +0000</created>
                <updated>Mon, 28 Sep 2020 15:13:32 +0000</updated>
                            <resolved>Sun, 27 Sep 2020 04:04:34 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="266366" author="ofaaland" created="Mon, 30 Mar 2020 20:56:04 +0000"  >&lt;p&gt;I&apos;ve replaced the createmany with a loop that does mkdirs, and find that the mkdirs fail for only one of the two MDTs:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.1&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.1 failed
mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.3&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.3 failed
mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.5&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.5 failed
mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.7&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.7 failed
mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.9&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.9 failed
mkdir: cannot create directory &apos;/mnt/diane/d300d.sanity/striped_dir/f.10&apos;: Permission denied
mkdir /mnt/diane/d300d.sanity/striped_dir/f.10 failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I took a look and the subdirs that were created successfully were on MDT0, so the ones that failed were on MDT1.&lt;/p&gt;</comment>
                            <comment id="266372" author="ofaaland" created="Tue, 31 Mar 2020 01:25:01 +0000"  >&lt;p&gt;I&apos;m attempting to compare debug logs for a subdirectory created successfully (f.2) and one which failed (f.1).&lt;/p&gt;

&lt;p&gt;In both cases, I see lmv_create() correctly identifies the MDT, a metadata modify RPC slot is assigned, an XID is assigned, and the RPC is sent successfully. I believe all that is on the client, but if I&apos;m wrong please let me know.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; lmv_create()) CREATE name &apos;&amp;lt;dirname&amp;gt;&apos; [&amp;lt;FID&amp;gt;] on [&amp;lt;SHARDFID&amp;gt;] -&amp;gt; mds #&amp;lt;MDT_INDEX&amp;gt;
obd_get_mod_rpc_slot()) &amp;lt;MDT_EXPORT&amp;gt;: modify RPC slot &amp;lt;SLOT_INDEX&amp;gt; is alloca
ptlrpc_reassign_next_xid()) @@@ reassign xid  req@&amp;lt;reqid&amp;gt; x&amp;lt;xid&amp;gt;
ptlrpc_send_new_req()) Sending RPC req@&amp;lt;reqid&amp;gt; and pname:cluuid:pid:xid:nid:
ptlrpc_send_new_req()) Process leaving (rc=0 : 0 : 0)&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the successful mkdir, I then see the request is enqueued (which as I understand it occurs in the MDT):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;ldlm_cli_enqueue_local()) ### client-side local enqueue handler, new lock created ns: mdt-diane-MDT0000_UUID lock: ffff972604887180/0x56189f42e7cba373 lrc: 3/0,1 mode: CW/CW      res: &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;But for the failed mkdir I do not see that.&lt;/p&gt;

&lt;p&gt;And for the failed mkdir, I see that ptlrpc_check_status() reports -EACCES (rc = -13), as one might expect:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;(client.c:1343:ptlrpc_check_status()) @@@ check status: rc = -13  req@ffff9728e9f40480 x1662629166884800/t0(0) o36-&amp;gt;diane-MDT0001-mdc-ffff9725fdff3800@10.0.2.15@tcp:12/10 lens 568/448 e 0 to 0 d    l 1585607163 ref 2 fl Rpc:RQU/0/0 rc 0/-13 job:&apos;mkdir.0&apos;&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;But I&apos;m having trouble figuring out which key functions to look at on the server side, on MDT1, that cause the request to fail.&lt;/p&gt;</comment>
                            <comment id="266373" author="ofaaland" created="Tue, 31 Mar 2020 01:26:48 +0000"  >&lt;p&gt;I added the topllnl label.  This is not something we are seeing in production (we are currently just testing DNE2) but I would like to track it down or at least help narrow the scope.  This test seems to fail fairly frequently.&lt;/p&gt;</comment>
                            <comment id="266459" author="ofaaland" created="Tue, 31 Mar 2020 17:07:31 +0000"  >&lt;p&gt;Also, in the debug logs for the successful mkdir I see messages from mdt_handler.c, mdt_reint.c, and mdd_permission.c (e.g. mdd_create() and mdt_object_new()).&lt;/p&gt;

&lt;p&gt;&lt;del&gt;But in the failed mkdir I see none of those.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;I made some minor tweaks to the test (e.g. added lctl mark after each mkdir) and in the new logs I do see mdt_object_new(), mdd_create(), and __mdd_permission_internal() calls, and that last returns -13 as one would expect.&lt;/p&gt;

&lt;p&gt;I&apos;m not sure whether I overlooked those in the original log, or the failure mode was different.&lt;/p&gt;</comment>
                            <comment id="266462" author="pjones" created="Tue, 31 Mar 2020 17:46:21 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise here?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="266493" author="ofaaland" created="Wed, 1 Apr 2020 02:10:03 +0000"  >&lt;p&gt;The -13 is coming from here:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;         /*
         * Nobody gets write access to an immutable file.
         */
        if (mask &amp;amp; MAY_WRITE &amp;amp;&amp;amp; la-&amp;gt;la_flags &amp;amp; LUSTRE_IMMUTABLE_FL)
                RETURN(-EACCES);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;in __mdd_permission_internal()&lt;/p&gt;</comment>
                            <comment id="266494" author="laisiyao" created="Wed, 1 Apr 2020 02:31:00 +0000"  >&lt;p&gt;Thanks Olaf, this is quite helpful, I&apos;m reviewing the code now.&lt;/p&gt;</comment>
                            <comment id="266495" author="laisiyao" created="Wed, 1 Apr 2020 03:42:26 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10753&quot; title=&quot;sanity test 300c fails with &amp;#39;create 5k files failed&amp;#39; &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10753&quot;&gt;&lt;del&gt;LU-10753&lt;/del&gt;&lt;/a&gt; reported a similar failure, and it looks like this fails on ZFS systems only. I don&apos;t see LFSCK-related messages in the logs. I suspect it&apos;s because ZFS_IMMUTABLE is set on the parent directory during the test, do you have any clue?&lt;/p&gt;</comment>
                            <comment id="266500" author="ofaaland" created="Wed, 1 Apr 2020 05:00:59 +0000"  >&lt;blockquote&gt;&lt;p&gt;I suspect it&apos;s because ZFS_IMMUTABLE is set on the parent directory during the test, do you have any clue?&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;No, not really.&lt;/p&gt;

&lt;p&gt;I have started looking into whether one of the other tests was not cleaning up properly, but I think bad cleanup does not explain it. In my testing, the subdirs on one MDT succeed, and the subdirs on the other one fail. So apparently one shard has the bit set, and the other does not.&lt;/p&gt;</comment>
                            <comment id="266501" author="ofaaland" created="Wed, 1 Apr 2020 05:12:22 +0000"  >&lt;p&gt;I did notice one other thing. I altered sanity.sh to replace &quot;run_test 300d ...&quot; with this:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;for xx in $(seq 1 30); do
  run_test 300d &quot;check default stripe under striped directory&quot;
done&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I found that on each run of auster --only 300, 300d either failed on every iteration (all xx in 1,2,3,...,30) or did not fail at all.&lt;/p&gt;

&lt;p&gt;I believe that means that the problem is occurring in an earlier subtest.  I&apos;ll see what more I can learn.&lt;/p&gt;</comment>
                            <comment id="268145" author="arshad512" created="Tue, 21 Apr 2020 15:28:55 +0000"  >&lt;p&gt;Seen on master. Run: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/50b3635a-8810-4f18-b3fe-da0bf0eac4fe&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/50b3635a-8810-4f18-b3fe-da0bf0eac4fe&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="270397" author="adilger" created="Fri, 15 May 2020 23:24:31 +0000"  >&lt;p&gt;+1 on master: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/75fa672b-9224-4efc-8223-fa6663f6b220&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/75fa672b-9224-4efc-8223-fa6663f6b220&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Side note for Olaf - if submitting patches for debugging/fixing intermittent issues, it is possible to use:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Test-Parameters: testlist=sanity env=ONLY=300d,ONLY_REPEAT=N
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;to have the specific test run N times in a loop. The test is considered a failure if any of the iterations fails.  You can specify a range of subtests like &quot;&lt;tt&gt;ONLY=290-300&lt;/tt&gt;&quot; but &quot;&lt;tt&gt;ONLY_REPEAT=N&lt;/tt&gt;&quot; will run each subtest N times in a row before moving to the next subtest.&lt;/p&gt;

&lt;p&gt;It looks like auster also has a command-line option &quot;&lt;tt&gt;-i N&lt;/tt&gt;&quot; to rerun a subtest N times without having to edit the test script.&lt;/p&gt;</comment>
                            <comment id="280786" author="laisiyao" created="Sun, 27 Sep 2020 04:04:34 +0000"  >&lt;p&gt;This is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-10753&quot; title=&quot;sanity test 300c fails with &amp;#39;create 5k files failed&amp;#39; &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-10753&quot;&gt;&lt;del&gt;LU-10753&lt;/del&gt;&lt;/a&gt;, and it&apos;s fixed there.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="51080">LU-10753</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00wmf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>