<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:30:55 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3094] Sanity test_132: @@@@@@ FAIL: some glimpse RPC is expected and silent errors during mount</title>
                <link>https://jira.whamcloud.com/browse/LU-3094</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Lustre errors are reported in Sanity test_132.&lt;/p&gt;

&lt;p&gt;I believe they are related to later errors that cannot read some /proc stats from the OST. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2902&quot; title=&quot;sanity test_156: NOT IN CACHE: before: , after: &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2902&quot;&gt;&lt;del&gt;LU-2902&lt;/del&gt;&lt;/a&gt; (Sanity test 156) and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2735&quot; title=&quot;sanity.sh test_151: NOT IN CACHE: before: 337, after: 337 &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2735&quot;&gt;&lt;del&gt;LU-2735&lt;/del&gt;&lt;/a&gt; (Sanity test 151) are examples of the later issues.&lt;/p&gt;

&lt;p&gt;In general, Test_132 remounts the OST, and sometimes all is not well during the mount.&lt;/p&gt;

&lt;p&gt;Whenever I find missing data in the /proc filesystem, I see Lustre errors during test 132. Test 132 does not always fail, but there are always errors during the remount.&lt;/p&gt;

&lt;p&gt;I find the test 132 issues by searching for Sanity 156/151 failures and looking for FAIL with &quot;NOT IN CACHE: before: , after: &quot; (the null /proc reads).&lt;/p&gt;

&lt;p&gt;So what errors are seen?&lt;/p&gt;

&lt;p&gt;In the Sanity test 132 OST dmesg you see things like:&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: mkdir -p /mnt/ost3; mount -t lustre   		                   /dev/lvm-OSS/P3 /mnt/ost3
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: DEBUG MARKER: PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/u
Lustre: DEBUG MARKER: e2label /dev/lvm-OSS/P3 2&amp;gt;/dev/null
Lustre: lustre-OST0001: deleting orphan objects from 0x0:6257 to 6304
LustreError: 8844:0:(ldlm_resource.c:1161:ldlm_resource_get()) lvbo_init failed for resource 6272: rc -2
LustreError: 8844:0:(ldlm_resource.c:1161:ldlm_resource_get()) Skipped 244 previous similar messages
Lustre: DEBUG MARKER: mkdir -p /mnt/ost4
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;


&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: test -b /dev/lvm-OSS/P5
Lustre: lustre-OST0001: deleting orphan objects from 0x0:6257 to 6304
Lustre: Skipped 2 previous similar messages
LustreError: 10424:0:(ldlm_resource.c:1161:ldlm_resource_get()) lvbo_init failed for resource 5376: rc -2
LustreError: 10424:0:(ldlm_resource.c:1161:ldlm_resource_get()) Skipped 22 previous similar messages
Lustre: DEBUG MARKER: mkdir -p /mnt/ost5; mount -t lustre   		                   /dev/lvm-OSS/P5 /mnt/ost5
LDISKFS-fs (dm-4): mounted filesystem with ordered data mode. quota=on. Opts: 
LustreError: 137-5: UUID &apos;lustre-OST0005_UUID&apos; is not available for connect (no target)
LustreError: 137-5: UUID &apos;lustre-OST0006_UUID&apos; is not available for connect (no target)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The above strikes me as quite serious, since it complains about not being able to find the underlying storage target.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: mkdir -p /mnt/ost3; mount -t lustre   		                   /dev/lvm-OSS/P3 /mnt/ost3
LDISKFS-fs (dm-2): mounted filesystem with ordered data mode. quota=on. Opts: 
Lustre: 5288:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1364764845/real 1364764845]  req@ffff88030461c800 x1431056565766172/t0(0) o38-&amp;gt;lustre-MDT0000-lwp-OST0001@192.168.4.20@o2ib:12/10 lens 400/544 e 0 to 1 dl 1364764855 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
Lustre: 5288:0:(client.c:1868:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
LustreError: 11293:0:(client.c:1052:ptlrpc_import_delay_req()) @@@ send limit expired   req@ffff880328c36000 x1431056565766204/t0(0) o101-&amp;gt;MGC192.168.4.20@o2ib@192.168.4.20@o2ib:26/25 lens 328/344 e 0 to 0 dl 0 ref 2 fl Rpc:W/0/ffffffff rc 0/-1
LustreError: 11293:0:(client.c:1052:ptlrpc_import_delay_req()) Skipped 3 previous similar messages
LustreError: 12040:0:(obd_mount.c:1715:server_register_target()) Cannot talk to the MGS: -5, not fatal
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;In general, the remount caused by test 132 is quite messy in dmesg, but it is not always the same set of errors. There appear to be communication issues. When test 132 hits these Lustre errors, several other tests will have issues as well.&lt;/p&gt;



&lt;p&gt;Most of the time when the mount has errors, test 132 itself does not fail, but sanity tests 133, 151, and 156 all fail together.&lt;/p&gt;

&lt;p&gt;I opened this as a blocker because I believe it may be a root cause of several issues and deserves a proper evaluation.&lt;/p&gt;

&lt;p&gt;The runs below are recent examples of the issues, and there are plenty more if you search Maloo:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/786cf564-9b82-11e2-bd87-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/786cf564-9b82-11e2-bd87-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/636517d2-98f3-11e2-af89-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/636517d2-98f3-11e2-af89-52540035b04c&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/016cbfcc-9816-11e2-879d-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/016cbfcc-9816-11e2-879d-52540035b04c&lt;/a&gt;&lt;/p&gt;</description>
                <environment>The review queue in Maloo </environment>
        <key id="18209">LU-3094</key>
            <summary>Sanity test_132: @@@@@@ FAIL: some glimpse RPC is expected and silent errors during mount</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="niu">Niu Yawei</assignee>
                                    <reporter username="keith">Keith Mannthey</reporter>
                        <labels>
                    </labels>
                <created>Wed, 3 Apr 2013 04:16:43 +0000</created>
                <updated>Tue, 27 Aug 2013 02:54:08 +0000</updated>
                            <resolved>Tue, 27 Aug 2013 02:54:08 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="55358" author="keith" created="Wed, 3 Apr 2013 06:25:45 +0000"  >&lt;p&gt;I am running a 132/133/151/156 test loop on my local VMs for overnight testing.&lt;/p&gt;</comment>
                            <comment id="55404" author="green" created="Wed, 3 Apr 2013 17:24:07 +0000"  >&lt;p&gt;So, the objects being deleted during orphan removal seem to be accessed right afterwards (the lvbo_init failures), which should not be possible for real orphans. Did we just delete referenced objects?&lt;br/&gt;
This is potentially related to woes in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2657&quot; title=&quot;Shouldn&amp;#39;t deleting objects in mds_lov_update_objids()&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2657&quot;&gt;&lt;del&gt;LU-2657&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="55405" author="keith" created="Wed, 3 Apr 2013 17:29:20 +0000"  >&lt;p&gt;In my local testing I see most of these errors occurring without the loss of the /proc entries.&lt;/p&gt;

&lt;p&gt;Perhaps the errors are not &quot;ERRORS&quot;. &lt;/p&gt;</comment>
                            <comment id="55485" author="pjones" created="Thu, 4 Apr 2013 15:21:55 +0000"  >&lt;p&gt;Niu&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="55689" author="niu" created="Mon, 8 Apr 2013 02:51:00 +0000"  >&lt;p&gt;Those lvbo_init failures with -ENOENT should come from the orphan cleanup: when the OST tries to clean up orphans, it finds that those objects do not exist. I didn&apos;t see why the orphans had been cleared, but it should not cause the test_132 failure or the subsequent 151/156 failures.&lt;/p&gt;</comment>
                            <comment id="55779" author="keith" created="Mon, 8 Apr 2013 18:57:59 +0000"  >&lt;p&gt;Yes, it is not clear that the errors seen in this test are the root cause of the failing /proc entries.&lt;/p&gt;

&lt;p&gt;I can get errors in 132 without the later errors, so there is not a direct correlation.&lt;/p&gt;</comment>
                            <comment id="59676" author="keith" created="Thu, 30 May 2013 18:27:24 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2979&quot; title=&quot;sanity 133a: proc counter for mkdir on mds1 was not incremented&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2979&quot;&gt;&lt;del&gt;LU-2979&lt;/del&gt;&lt;/a&gt; was the root cause of the real issue that I was hunting. &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="17754">LU-2902</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvmy7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7514</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>