<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:12:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7842] ACL&apos;s applied over NFS are not consistent when looping file operations </title>
                <link>https://jira.whamcloud.com/browse/LU-7842</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;We are experiencing an issue on one of our three production Lustre file systems. We do not see it on a Lustre client directly, but we do see it when the file system is exported over NFS. We cannot reproduce it on any other system, but it is causing production problems on our oldest Lustre instance.&lt;/p&gt;

&lt;p&gt;Running the attached script over NFS, we hit the following error after a few iterations:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;joe59240@vws250 joe59240&amp;#93;&lt;/span&gt;$ /dls/tmp/joe59240/stresstest&lt;br/&gt;
....mkdir: cannot create directory `5&apos;: Permission denied&lt;/p&gt;

&lt;p&gt;Each &quot;.&quot; is one iteration of the loop, as you will see in the script.&lt;/p&gt;

&lt;p&gt;The error persists for perhaps up to five seconds before files can be written to the folder again and the script continues.&lt;/p&gt;

&lt;p&gt;So far the script has never run to completion on this file system, whereas on our other Lustre file systems we can run it to completion hundreds of times.&lt;br/&gt;
The file system has many different NFS exporters, each exporting a different folder in the root of the file system, as is common practice on all other systems at Diamond. We can reproduce this on all exporters attached to this particular file system.&lt;/p&gt;

&lt;p&gt;After several weeks of looking at the issue, we think it is not caused by the exporter, since it occurs on all servers that export this Lustre file system, but rather by the interaction between Lustre and NFS.&lt;/p&gt;

&lt;p&gt;We have added a few sleeps to the script to try to identify whether there is a buffering issue, where we modify or delete a file before it has been flushed to disk, but this has not improved the symptoms.&lt;/p&gt;

&lt;p&gt;Could you advise on further debugging?&lt;/p&gt;
</description>
                <environment></environment>
        <key id="35135">LU-7842</key>
            <summary>ACL&apos;s applied over NFS are not consistent when looping file operations </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="laisiyao">Lai Siyao</assignee>
                                    <reporter username="davebond-diamond">Dave Bond</reporter>
                        <labels>
                    </labels>
                <created>Thu, 3 Mar 2016 11:13:36 +0000</created>
                <updated>Thu, 14 Jun 2018 21:41:20 +0000</updated>
                            <resolved>Mon, 24 Jul 2017 14:31:39 +0000</resolved>
                                    <version>Lustre 2.7.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>7</watches>
                                                                            <comments>
                            <comment id="144584" author="pjones" created="Fri, 4 Mar 2016 02:06:02 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Could you please advise?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="144843" author="laisiyao" created="Tue, 8 Mar 2016 03:42:42 +0000"  >&lt;p&gt;I couldn&apos;t reproduce this locally, but I suspect it is a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6528&quot; title=&quot;permission denied over NFS (intermittent&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6528&quot;&gt;&lt;del&gt;LU-6528&lt;/del&gt;&lt;/a&gt;, which is fixed by &lt;a href=&quot;http://review.whamcloud.com/14978&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14978&lt;/a&gt; and &lt;a href=&quot;http://review.whamcloud.com/17815&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17815&lt;/a&gt;. Could you test with the latest master build (on the Lustre client only) to verify whether it fixes the problem? If so, I can backport these two patches to 2.7.&lt;/p&gt;</comment>
                            <comment id="144878" author="davebond-diamond" created="Tue, 8 Mar 2016 15:46:24 +0000"  >&lt;p&gt;After testing this, I can no longer reproduce the issue on a client where I previously could.&lt;br/&gt;
Could this please be pushed into 2.7?&lt;/p&gt;</comment>
                            <comment id="144940" author="ferner" created="Tue, 8 Mar 2016 21:23:51 +0000"  >&lt;p&gt;I was the one who originally reported &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6528&quot; title=&quot;permission denied over NFS (intermittent&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6528&quot;&gt;&lt;del&gt;LU-6528&lt;/del&gt;&lt;/a&gt;. We already include (a version of) &lt;a href=&quot;http://review.whamcloud.com/14978&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/14978&lt;/a&gt;, since it was referenced in that bug. However, it seems we completely missed the later updates and &lt;a href=&quot;http://review.whamcloud.com/17815&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/17815&lt;/a&gt;, most likely because there was no reference to them on any ticket we were monitoring.&lt;/p&gt;

&lt;p&gt;So yes, we would very much appreciate versions of all relevant patches that we can apply to our 2.7-based clients. As we have a maintenance period approaching at the end of this week, I would also very much appreciate it if we could have these patches before Friday.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Frederik&lt;/p&gt;</comment>
                            <comment id="144989" author="pjones" created="Wed, 9 Mar 2016 11:39:04 +0000"  >&lt;p&gt;Dave/Frederik&lt;/p&gt;

&lt;p&gt;The relevant patches have been ported and are currently going through testing and review.&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="145822" author="davebond-diamond" created="Wed, 16 Mar 2016 17:25:13 +0000"  >&lt;p&gt;Hi All,&lt;/p&gt;

&lt;p&gt;Could we possibly have an update on the progress of the patch testing? We would like to get the latest 2.7 build including this to test in our production environment as soon as possible.&lt;/p&gt;</comment>
                            <comment id="145974" author="yujian" created="Thu, 17 Mar 2016 16:04:00 +0000"  >&lt;p&gt;Hi Dave,&lt;br/&gt;
The back-ported patch for &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7630&quot; title=&quot;permission denied over NFS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7630&quot;&gt;&lt;del&gt;LU-7630&lt;/del&gt;&lt;/a&gt; in &lt;a href=&quot;http://review.whamcloud.com/18828&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/18828&lt;/a&gt; is now ready to land.&lt;/p&gt;</comment>
                            <comment id="154458" author="davebond-diamond" created="Thu, 2 Jun 2016 16:46:42 +0000"  >&lt;p&gt;Hello, &lt;/p&gt;

&lt;p&gt;It appears that the issue has improved a lot but has not gone away. We are running the latest client version:&lt;/p&gt;

&lt;p&gt;lustre: 2.7.2&lt;br/&gt;
kernel: patchless_client&lt;br/&gt;
build:  v2_7_1_DLS_20160330-gf4709ff-CHANGED-2.6.32-573.22.1.el6.x86_64&lt;/p&gt;

&lt;p&gt;From an NFS client mounting this area&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;joe59240@vws250 mx-scratch&amp;#93;&lt;/span&gt;$ ~/dls-science-user-area/benchmarking/stresstest&lt;br/&gt;
......touch: cannot touch `5/somefile&apos;: Permission denied&lt;br/&gt;
&lt;span class=&quot;error&quot;&gt;&amp;#91;joe59240@vws250 mx-scratch&amp;#93;&lt;/span&gt;$ &lt;/p&gt;


&lt;p&gt;Would you have expected this to include the fix?  &lt;/p&gt;</comment>
                            <comment id="154707" author="davebond-diamond" created="Mon, 6 Jun 2016 09:08:03 +0000"  >&lt;p&gt;We are approaching the end of our maintenance period.  Would it be possible to get an update on this?&lt;/p&gt;</comment>
                            <comment id="154725" author="laisiyao" created="Mon, 6 Jun 2016 14:26:03 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6528&quot; title=&quot;permission denied over NFS (intermittent&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6528&quot;&gt;&lt;del&gt;LU-6528&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7630&quot; title=&quot;permission denied over NFS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7630&quot;&gt;&lt;del&gt;LU-7630&lt;/del&gt;&lt;/a&gt; are the known permission-denied issues; your test failure looks like a new one. We need more information to triage it. Could you collect Lustre debug logs on the NFS server and the MDS? Could you also collect NFS client and server logs (enabled by `echo 2047 &amp;gt; /proc/sys/sunrpc/nfs_debug` on the NFS client and `echo 2047 &amp;gt; /proc/sys/sunrpc/nfsd_debug` on the NFS server)?&lt;/p&gt;</comment>
                            <comment id="154884" author="davebond-diamond" created="Tue, 7 Jun 2016 10:34:38 +0000"  >&lt;p&gt;I have just attached the logs you requested.  Let me know if there are any more details I can give you.  &lt;/p&gt;</comment>
                            <comment id="155065" author="laisiyao" created="Wed, 8 Jun 2016 13:56:46 +0000"  >&lt;p&gt;I do see error code -13 (EACCES) in the NFS logs, but I&apos;m afraid the Lustre debug log was not dumped in time and the related entries were discarded (Lustre keeps only a limited amount of debug log in memory).&lt;/p&gt;

&lt;p&gt;Could you modify your test script to check the result of each command and dump the logs as soon as an error occurs?&lt;/p&gt;</comment>
                            <comment id="155660" author="davebond-diamond" created="Tue, 14 Jun 2016 14:17:21 +0000"  >&lt;p&gt;I have just uploaded a new dump file.&lt;/p&gt;

&lt;p&gt;This was collected by running&lt;/p&gt;

&lt;p&gt;sudo lctl debug_daemon start /tmp/lustre-dump-14-05-16&lt;br/&gt;
and stopping it after the error had appeared on the NFS client.&lt;/p&gt;</comment>
                            <comment id="156312" author="davebond-diamond" created="Tue, 21 Jun 2016 09:31:46 +0000"  >&lt;p&gt;It has been six days since the last update, when I uploaded the latest logs. Is there any chance of an update, even just to say you are still looking? We have shared this ticket number with the developers who are affected by this, and I would like to give them an update.&lt;/p&gt;</comment>
                            <comment id="156459" author="laisiyao" created="Wed, 22 Jun 2016 07:13:12 +0000"  >&lt;p&gt;I can&apos;t find any clue in the debug logs, so the -13 might be generated by NFS code (though it may still be caused by Lustre, for example if wrong attributes are fetched from the MDS to the NFS server).&lt;/p&gt;

&lt;p&gt;I&apos;ll see whether I can make a patch to add some debug messages.&lt;/p&gt;</comment>
                            <comment id="156485" author="gerrit" created="Wed, 22 Jun 2016 14:06:05 +0000"  >&lt;p&gt;Lai Siyao (lai.siyao@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/20920&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/20920&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7842&quot; title=&quot;ACL&amp;#39;s applied over NFS are not consistent when looping file operations &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7842&quot;&gt;&lt;del&gt;LU-7842&lt;/del&gt;&lt;/a&gt; nfs: don&apos;t drop cap for getattr too&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 8f76d6583ebb377200f14c4d547dca73d6581d1b&lt;/p&gt;</comment>
                            <comment id="156486" author="laisiyao" created="Wed, 22 Jun 2016 14:09:07 +0000"  >&lt;p&gt;Hi Dave, I just pushed a patch that changes MDS code only. Could you apply it and test again?&lt;/p&gt;</comment>
                            <comment id="156736" author="ferner" created="Thu, 23 Jun 2016 18:26:41 +0000"  >&lt;p&gt;Lai,&lt;/p&gt;

&lt;p&gt;I&apos;ve looked at the patch. It has received a &apos;-1&apos; from Maloo, but I can&apos;t work out whether this failure is also seen elsewhere. I think it might be, but I would like to double-check before applying this patch for a test. Since we are unfortunately only seeing this on a production file system, and MDS changes require a full file system outage, we will need to schedule this, and I currently can&apos;t promise when it will happen. Hopefully early next week, if everything else looks good.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Frederik&lt;/p&gt;</comment>
                            <comment id="156830" author="laisiyao" created="Fri, 24 Jun 2016 02:00:42 +0000"  >&lt;p&gt;The autotest failure looks to be caused by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8305&quot; title=&quot;sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8305&quot;&gt;&lt;del&gt;LU-8305&lt;/del&gt;&lt;/a&gt;. I&apos;ll watch the progress of that ticket.&lt;/p&gt;</comment>
                            <comment id="156848" author="pjones" created="Fri, 24 Jun 2016 13:36:05 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;While &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-8305&quot; title=&quot;sanity-sec test_27: @@@@@@ FAIL: fileset not cleared on nodemap c0&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-8305&quot;&gt;&lt;del&gt;LU-8305&lt;/del&gt;&lt;/a&gt; may prevent this change completing testing on master, it should have no relevance to Diamond running on the 2.7 FE branch so could you please port the patch there for them to try?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="156963" author="laisiyao" created="Mon, 27 Jun 2016 02:45:03 +0000"  >&lt;p&gt;Okay, I&apos;ll do it now.&lt;/p&gt;</comment>
                            <comment id="157081" author="laisiyao" created="Tue, 28 Jun 2016 01:58:29 +0000"  >&lt;p&gt;The patch for b2_7_fe is at: &lt;a href=&quot;http://review.whamcloud.com/#/c/20992/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/20992/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="157601" author="pjones" created="Mon, 4 Jul 2016 12:40:00 +0000"  >&lt;p&gt;Dave/Frederik&lt;/p&gt;

&lt;p&gt;Have you applied the supplied diagnostic patch?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="157650" author="ferner" created="Tue, 5 Jul 2016 10:40:19 +0000"  >&lt;p&gt;Peter,&lt;/p&gt;

&lt;p&gt;The patch arrived a bit late for last week&apos;s maintenance window, so we had to wait until this week.&lt;/p&gt;

&lt;p&gt;We applied the patch on the MDS this morning and so far have not been able to reproduce the issue. However, if I remember correctly, this has sometimes been the case immediately after rebooting the NFS server, and we did have to reboot the NFS server because it hit an LBUG after finishing recovery. We&apos;re looking into this, and if we can&apos;t find anything in Jira, we&apos;ll open another ticket for it.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Frederik&lt;/p&gt;</comment>
                            <comment id="188854" author="pjones" created="Sat, 18 Mar 2017 13:11:53 +0000"  >&lt;p&gt;Frederik&lt;/p&gt;

&lt;p&gt;Any news?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="203340" author="pjones" created="Mon, 24 Jul 2017 14:31:39 +0000"  >&lt;p&gt;OK, so either this is no longer happening or you are no longer concerned about it. Either way, I&apos;ll close out the ticket &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="33932">LU-7630</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="21861" name="lustre-dump.tar.gz" size="9782900" author="davebond-diamond" created="Tue, 14 Jun 2016 14:16:26 +0000"/>
                            <attachment id="21795" name="lustre-logs.tar.gz" size="14239469" author="davebond-diamond" created="Tue, 7 Jun 2016 10:34:07 +0000"/>
                            <attachment id="20646" name="stresstest" size="227" author="davebond-diamond" created="Thu, 3 Mar 2016 11:13:36 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 7 Jun 2016 11:13:36 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzy3j3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 3 Mar 2016 11:13:36 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>