<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:40:43 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-11074] Invalid argument reading file caps</title>
                <link>https://jira.whamcloud.com/browse/LU-11074</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;2.10.4 client seems to have introduced a regression from 2.10.3.&lt;/p&gt;

&lt;p&gt;we now see this message from clients&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Jun  7 06:33:32 john73 kernel: Invalid argument reading file caps for /home/fstars/dwf_prepipe/dwf_prepipe_processccd.py
Jun  7 10:55:40 bryan8 kernel: Invalid argument reading file caps for /bin/date
Jun  7 11:05:29 john75 kernel: Invalid argument reading file caps for /usr/bin/basename
Jun  7 11:51:29 john97 kernel: Invalid argument reading file caps for /usr/bin/id
Jun  7 11:51:29 john97 kernel: Invalid argument reading file caps for /apps/lmod/lmod/lmod/libexec/addto
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;the upshot of which is that those files then can&apos;t be exec&apos;d by the kernel.&lt;/p&gt;

&lt;p&gt;all our servers are now centos 7.4 and 2.10.4 + LU10988 lfsck patch, zfs 0.7.9.&lt;br/&gt;
we have 4 lustre filesystems in the cluster and this &apos;fail caps&apos; issue happens on them all. more on the root filesystem because there are more exe&apos;s there.&lt;/p&gt;

&lt;p&gt;for some files it seems to happen on all clients and be persistent eg. all the 2.10.4 client nodes see this&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john72 ~]# g++
-bash: /usr/bin/g++: Invalid argument
[root@john72 ~]# dmesg | tail -1
[616489.562465] Invalid argument reading file caps for /usr/bin/g++
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;and for other files it&apos;s transient. eg. the exe&apos;s on the nodes listed above all work again now&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@john97 ~]# /usr/bin/id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;g++ is interesting because it&apos;s hard-linked 4 times (to c++, ...), which might be part of why it persists? copying each of c++, g++, etc. to a separate (non-hardlinked) file is a workaround and lets it be exec&apos;d again, but that doesn&apos;t explain all the other files that sometimes work and sometimes don&apos;t.&lt;/p&gt;
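
&lt;p&gt;The copy workaround described above could be sketched roughly as follows, assuming Python 3 on Linux. The helper name unshare_hardlink is hypothetical and not part of any Lustre tooling; it only replaces a hard-linked name with a private copy and does not address the underlying bug.&lt;/p&gt;

```python
import os
import shutil
import tempfile


def unshare_hardlink(path):
    """Replace `path` with a private copy so it no longer shares an inode.

    Hypothetical helper mirroring the manual workaround of copying a
    hard-linked executable to a separate file. Ownership is not preserved.
    """
    if os.stat(path).st_nlink == 1:
        return False  # already a standalone file, nothing to do
    directory = os.path.dirname(os.path.abspath(path))
    # Create the copy in the same directory so the final rename stays
    # on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # copies data plus mode bits and timestamps
        os.replace(tmp, path)    # swap the copy in under the old name
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

&lt;p&gt;Calling unshare_hardlink on one name leaves the other links pointing at the original inode, so each affected name would need the same treatment.&lt;/p&gt;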

&lt;p&gt;apart from things like g++, the problem is rare, less than once per client per day.&lt;/p&gt;

&lt;p&gt;as a workaround (so we can get all clients onto the more secure centos7.5) we&apos;d like to run 2.10.3 on centos7.5 for a while, but it doesn&apos;t seem to work (looks to mount, but then ls says &apos;not a directory&apos;). I don&apos;t suppose there&apos;s a patch or two that&apos;ll let 2.10.3 be functional on centos7.5? thanks.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</description>
                <environment>centos 7.5, x86_64, OPA, zfs 0.7.9</environment>
        <key id="52524">LU-11074</key>
            <summary>Invalid argument reading file caps</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="jhammond">John Hammond</assignee>
                                    <reporter username="scadmin">SC Admin</reporter>
                        <labels>
                    </labels>
                <created>Thu, 7 Jun 2018 09:34:38 +0000</created>
                <updated>Fri, 3 Aug 2018 20:37:53 +0000</updated>
                            <resolved>Wed, 18 Jul 2018 12:49:49 +0000</resolved>
                                    <version>Lustre 2.10.4</version>
                                    <fixVersion>Lustre 2.12.0</fixVersion>
                    <fixVersion>Lustre 2.10.5</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="229299" author="jhammond" created="Thu, 7 Jun 2018 17:23:05 +0000"  >&lt;p&gt;Hi Robin,&lt;/p&gt;

&lt;p&gt;Are you using any Linux Security Modules? Could you enable full debugging, clear the debug log, reproduce this, dump the log and attach? (You may need to increase the debug_mb parameter to get a full capture.)&lt;/p&gt;</comment>
                            <comment id="229301" author="scadmin" created="Thu, 7 Jun 2018 18:37:59 +0000"  >&lt;p&gt;Hey John,&lt;/p&gt;

&lt;p&gt;no, not using any LSM.&lt;/p&gt;

&lt;p&gt;I&apos;ll gather the debug for eg. g++ when a node clears of jobs. otherwise there&apos;ll be lots of noise.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229354" author="jhammond" created="Fri, 8 Jun 2018 17:09:06 +0000"  >&lt;p&gt;Which 7.5 kernel are you using?&lt;/p&gt;</comment>
                            <comment id="229355" author="pjones" created="Fri, 8 Jun 2018 17:09:25 +0000"  >&lt;p&gt;Robin&lt;/p&gt;

&lt;p&gt;Any idea how long it will take to get the debug logs?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="229361" author="scadmin" created="Fri, 8 Jun 2018 18:49:36 +0000"  >&lt;p&gt;we&apos;re using 862.3.2 kernel, the latest AFAIK.&lt;/p&gt;

&lt;p&gt;I&apos;m being hesitant about debug logs &apos;cos I&apos;m not 100% convinced it&apos;s a lustre bug. we definitely don&apos;t see this issue with rhel7.4 + 2.10.3, but the complication is that we use overlayfs over our root lustre filesystem.&lt;/p&gt;

&lt;p&gt;overlayfs changed a lot between 7.4 and 7.5 and I&apos;ve re-patched it etc, but it might still be an overlayfs bug, or an overlayfs interaction with lustre that&apos;s now different vs. a pure lustre bug.&lt;/p&gt;

&lt;p&gt;the thing that indicates it&apos;s maybe a real lustre issue is that we see the &apos;file caps&apos; problem on all filesystems - /home, /apps, /fred(dagg) - and not just on /images (which is the only one with overlayfs over it).&lt;/p&gt;

&lt;p&gt;AFAIK the only thing these 4 filesystems share is the root inode, which is on overlayfs. it seems really unlikely that the node is healthy for all accesses via the root inode/dentry, and at the same time sees &apos;file caps&apos; fail on one of the pure lustre filesystems, but I wanted to try a few things first. eg. patch the rhel 7.5 kernel with a bunch of stable capabilities namespace backports that rhel seem to have omitted... unfortunately that didn&apos;t fix it.&lt;/p&gt;

&lt;p&gt;the g++ &apos;file caps&apos; bug (the one that&apos;s trivial to reproduce) doesn&apos;t happen if I go directly to lustre, so there&apos;s definitely something wrong with overlayfs. I was sure I&apos;d tried this before making this bug report, but I guess not.&lt;/p&gt;

&lt;p&gt;however, g++ failing via overlayfs and working via lustre doesn&apos;t explain the much rarer fails direct to lustre on the other 3 filesystems (+/- that shared root inode). but I can&apos;t reproduce those at will - they are rare. so I don&apos;t see how I can get you a debug trace for those.&lt;/p&gt;

&lt;p&gt;I can&apos;t figure out from &apos;git log v2_10_3..v2_10_4&apos; on b2_10 which patch(es) make the lustre client work with rhel7.5&apos;s kernel. if there is one or two that you can point me at then that would help.&lt;br/&gt;
this is because if 2.10.3 is busted with rhel7.5 too, then that means it&apos;s a rhel7.5 kernel issue and nothing to do with lustre.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229419" author="adilger" created="Mon, 11 Jun 2018 17:18:52 +0000"  >&lt;p&gt;If you can&apos;t find which patch is the source of the problem, I&apos;d suggest using &lt;tt&gt;git bisect&lt;/tt&gt; with your &quot;good&quot; reproducer (possibly run multiple times to ensure you don&apos;t get a false pass) to isolate the issue to a single patch. That will allow us to identify which patch introduced the problem and possibly see how it is interacting badly with overlayfs.&lt;/p&gt;</comment>
                            <comment id="229420" author="pjones" created="Mon, 11 Jun 2018 17:21:02 +0000"  >&lt;p&gt;Lai&lt;/p&gt;

&lt;p&gt;Can you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="229422" author="pjones" created="Mon, 11 Jun 2018 17:27:43 +0000"  >&lt;p&gt;Sorry - Lai, I intended that comment for another ticket&lt;/p&gt;</comment>
                            <comment id="229424" author="scadmin" created="Mon, 11 Jun 2018 17:31:11 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;thanks for the activity on the bug, it is much appreciated. but unless you have a solid suspicion of what&apos;s wrong, then please don&apos;t work on this for now. &lt;/p&gt;

&lt;p&gt;I built 2.10.4 for centos7.4 on the weekend and have been rebooting clients into it since.&lt;/p&gt;

&lt;p&gt;hopefully I can work out from that if &apos;file caps&apos; is a lustre 2.10.4 issue or a rhel7.5 kernel + overlayfs issue.&lt;/p&gt;

&lt;p&gt;sorry, I should have thought of doing that before...&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229755" author="scadmin" created="Wed, 27 Jun 2018 12:23:37 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I&apos;ve finally had some time to look into this again. seems there&apos;s a regression with Lustre on the rhel/centos 7.5 kernel.&lt;/p&gt;

&lt;p&gt;the rhel/centos 7.4 kernel is fine, but the 7.5 kernel breaks Lustre when getting file capabilities from files with lots of hard links.&lt;/p&gt;

&lt;p&gt;a reproducer is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# echo blah &amp;gt; a
# getcap a
# for f in {b..f}; do ln a $f; done
# getcap a
Failed to get capabilities of file `a&apos; (Invalid argument)
# cat /sys/fs/lustre/version 
2.10.4
# uname -a
Linux john5 3.10.0-862.3.3.el7.x86_64 #1 SMP Fri Jun 15 04:15:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
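
&lt;p&gt;The reproducer above can also be sketched as a script that adds hard links one at a time and reads the security.capability xattr directly, assuming Python 3 on Linux. The function names are hypothetical. Whether an affected client returns EINVAL or a zero-length value from getxattr() is an assumption, so the probe flags both.&lt;/p&gt;

```python
import errno
import os


def probe_file_caps(path):
    """Classify what getxattr() reports for security.capability on `path`."""
    try:
        value = os.getxattr(path, "security.capability")
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return "absent"       # expected for a file without capabilities
        if exc.errno == errno.ENOTSUP:
            return "unsupported"  # filesystem has no xattr support
        if exc.errno == errno.EINVAL:
            return "invalid"      # same errno that getcap printed above
        raise
    # An empty value is too short to hold a capability header.
    return "present" if value else "empty"


def first_bad_link_count(directory, max_links=200):
    """Return the link count at which the probe first looks wrong, else None."""
    target = os.path.join(directory, "a")
    with open(target, "w") as handle:
        handle.write("blah\n")
    for count in range(1, max_links + 1):
        if count > 1:
            # Each extra name raises the link count of the same inode by one.
            os.link(target, os.path.join(directory, "link%d" % count))
        if probe_file_caps(target) in ("invalid", "empty"):
            return count
    return None
```

&lt;p&gt;Running first_bad_link_count on an empty directory of the filesystem under test would return None on a healthy client, and otherwise the number of links at which the probe first reported a suspect result.&lt;/p&gt;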

&lt;p&gt;our &apos;real world&apos; example is a g++ exe on Lustre with 4 hard links which always fails &apos;getcap&apos;, but the above reproducer (on a different Lustre fs with more MDTs) required more than 4 hard links to see the same problem.&lt;/p&gt;

&lt;p&gt;I went out to &amp;gt;200 hard links with the same example as above with Lustre 2.10.4 and centos 7.4 kernel, and it was fine.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229772" author="scadmin" created="Thu, 28 Jun 2018 06:03:12 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;in case it wasn&apos;t clear, there&apos;s no overlayfs involved in the above reproducer at all - only Lustre. the node was booted into a server ramdisk image to do the testing.&lt;/p&gt;

&lt;p&gt;the reproducer is super-simple, but please let me know if you want me to gather debug logs from eg. 7.4 kernel + 2.10.4 and 7.5 kernel + 2.10.4 anyway. not hard for me to do.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229795" author="jhammond" created="Thu, 28 Jun 2018 18:51:03 +0000"  >&lt;p&gt;Hi Robin,&lt;/p&gt;

&lt;p&gt;OK, thank you for your reproducer. It&apos;s reproducing the issue for me as well. There appear to be a few bugs here. I have a fix for one of them at &lt;a href=&quot;https://review.whamcloud.com/32739&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32739&lt;/a&gt;. I believe this change will give you a workaround for the file caps issue. I am testing it locally now as well as looking at fixes for the other bugs.&lt;/p&gt;</comment>
                            <comment id="229814" author="scadmin" created="Fri, 29 Jun 2018 13:59:42 +0000"  >&lt;p&gt;Hi John,&lt;/p&gt;

&lt;p&gt;yeah, that seems to work for g++ with 862.3.3 kernel. thanks. nicely done &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;I&apos;ll roll it out onto a few nodes and keep an eye on them and see if it&apos;s also fixed the sporadic &apos;file caps&apos; failures we were seeing.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229827" author="scadmin" created="Sat, 30 Jun 2018 09:10:01 +0000"  >&lt;p&gt;Hi John,&lt;/p&gt;

&lt;p&gt;after booting a few nodes into this, I&apos;m still seeing the occasional &apos;file caps&apos; failure so yeah, you&apos;re right - there&apos;s more bugs in this area somewhere.&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="229855" author="jhammond" created="Mon, 2 Jul 2018 13:12:42 +0000"  >&lt;p&gt;Yes, I believe that &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11107&quot; title=&quot;getxattr() returns 0 length values for nonexistent xattrs (with xattr_cache=0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11107&quot;&gt;&lt;del&gt;LU-11107&lt;/del&gt;&lt;/a&gt; is the real issue. &lt;a href=&quot;https://review.whamcloud.com/32739&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32739&lt;/a&gt; should just reduce your chances of hitting it.&lt;/p&gt;</comment>
                            <comment id="230443" author="gerrit" created="Wed, 18 Jul 2018 06:01:38 +0000"  >&lt;p&gt;Oleg Drokin (green@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32739/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32739/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11074&quot; title=&quot;Invalid argument reading file caps&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11074&quot;&gt;&lt;del&gt;LU-11074&lt;/del&gt;&lt;/a&gt; mdc: set correct body eadatasize for getxattr()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: dea1cde92014545d97406bf8adba20840abdb1a9&lt;/p&gt;</comment>
                            <comment id="230474" author="pjones" created="Wed, 18 Jul 2018 12:49:49 +0000"  >&lt;p&gt;Landed for 2.12&lt;/p&gt;</comment>
                            <comment id="231072" author="scadmin" created="Mon, 30 Jul 2018 16:07:57 +0000"  >&lt;p&gt;just to follow up, this and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11107&quot; title=&quot;getxattr() returns 0 length values for nonexistent xattrs (with xattr_cache=0)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11107&quot;&gt;&lt;del&gt;LU-11107&lt;/del&gt;&lt;/a&gt; have fixed the issue for us.&lt;br/&gt;
thanks!&lt;/p&gt;

&lt;p&gt;cheers,&lt;br/&gt;
robin&lt;/p&gt;</comment>
                            <comment id="231075" author="gerrit" created="Mon, 30 Jul 2018 16:22:53 +0000"  >&lt;p&gt;Minh Diep (mdiep@whamcloud.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/32901&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32901&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11074&quot; title=&quot;Invalid argument reading file caps&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11074&quot;&gt;&lt;del&gt;LU-11074&lt;/del&gt;&lt;/a&gt; mdc: set correct body eadatasize for getxattr()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: c8bf7d0fb95618a06a493228707cd1e830da78f8&lt;/p&gt;</comment>
                            <comment id="231425" author="gerrit" created="Fri, 3 Aug 2018 20:07:38 +0000"  >&lt;p&gt;John L. Hammond (jhammond@whamcloud.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/32901/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/32901/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-11074&quot; title=&quot;Invalid argument reading file caps&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-11074&quot;&gt;&lt;del&gt;LU-11074&lt;/del&gt;&lt;/a&gt; mdc: set correct body eadatasize for getxattr()&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_10&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f99f9345e46b5b19a8dca2aae4d348c99d8e2481&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="52612">LU-11107</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="52647">LU-11123</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzy5z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>