<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:23:15 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-2205] recovery-small test 11 multiop eternal loop</title>
                <link>https://jira.whamcloud.com/browse/LU-2205</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;I started to hit this issue frequently on repeated runs of recovery-small.&lt;/p&gt;

&lt;p&gt;Multiop starts to loop, printing the same message over and over again:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;short read: 0/1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I logged into the node and here&apos;s what I see:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@centos6-7 ~]# dmesg | tail
[27409.796429] LustreError: 138-a: lustre-OST0001: A client on nid 0@lo was evicted due to a lock blocking callback time out: rc -107
[27409.798509] LustreError: Skipped 2 previous similar messages
[27409.821171] Lustre: DEBUG MARKER: cancel_lru_locks osc start
[27410.021536] LustreError: 26667:0:(ldlm_lockd.c:2177:ldlm_cancel_handler()) ldlm_cancel from 0@lo arrived at 1350470383 with bad export cookie 1895698488629392860
[27410.022477] LustreError: 26666:0:(ldlm_request.c:1169:ldlm_cli_cancel_req()) Got rc -107 from cancel RPC: canceling anyway
[27410.022985] LustreError: 167-0: lustre-OST0000-osc-ffff8800226a5bf0: This client was evicted by lustre-OST0000; in progress operations using this service will fail.
[27410.023160] LustreError: 27694:0:(ldlm_resource.c:761:ldlm_resource_complain()) Namespace lustre-OST0000-osc-ffff8800226a5bf0 resource refcount nonzero (1) after lock cleanup; forcing cleanup.
[27410.023163] LustreError: 27694:0:(ldlm_resource.c:767:ldlm_resource_complain()) Resource: ffff88001ab28e78 (2/0/0/0) (rc: 1)
[27410.025030] LustreError: 26666:0:(ldlm_request.c:1795:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -107
[27410.092031] Lustre: DEBUG MARKER: cancel_lru_locks osc stop
[root@centos6-7 ~]# ps ax | grep multi
27707 pts/0    R+   291:38 multiop /mnt/lustre/f.recovery-small.11 or
27878 pts/1    S+     0:00 grep multi
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;strace on this process shows:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;read(3, &quot;&quot;, 1)                          = 0
write(2, &quot;short read: 0/1\n&quot;, 16)       = 16
read(3, &quot;&quot;, 1)                          = 0
write(2, &quot;short read: 0/1\n&quot;, 16)       = 16
read(3, &quot;&quot;, 1)                          = 0
write(2, &quot;short read: 0/1\n&quot;, 16)       = 16
read(3, &quot;&quot;, 1)                          = 0
write(2, &quot;short read: 0/1\n&quot;, 16)       = 16
read(3, &quot;&quot;, 1)                          = 0
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
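
&lt;p&gt;The trace above shows read() returning 0 on every call, which is end-of-file rather than a transient condition, so retrying cannot make progress. A minimal Python sketch of a read helper that reports a short read as an error instead of looping (multiop itself is a C program whose source is not shown in this report; the names read_exact, ShortReadError and demo are hypothetical):&lt;/p&gt;

```python
import os
import tempfile


class ShortReadError(Exception):
    """Raised when fewer bytes than requested could be read."""


def read_exact(fd, count):
    """Read exactly `count` bytes from fd, or raise ShortReadError.

    os.read() returning b"" means end-of-file, so retrying would spin forever.
    """
    chunks = []
    got = 0
    while got != count:
        chunk = os.read(fd, count - got)
        if not chunk:
            # Same message format as in the report, but raised once as an error.
            raise ShortReadError("short read: %d/%d" % (got, count))
        chunks.append(chunk)
        got += len(chunk)
    return b"".join(chunks)


def demo():
    # Hypothetical stand-in for the empty test file from the report.
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY)
        try:
            read_exact(fd, 1)
        except ShortReadError as err:
            return str(err)
        finally:
            os.close(fd)
    return "ok"
```

&lt;p&gt;With a helper like this, the caller sees a single error for the empty file and can fail the test, rather than waiting on a process that never exits.&lt;/p&gt;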

&lt;p&gt;So I did another experiment:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;[root@centos6-7 ~]# ls -la /proc/27707/fd
total 0
dr-x------ 2 root root  0 Oct 17 10:56 .
dr-xr-xr-x 7 root root  0 Oct 17 10:56 ..
lrwx------ 1 root root 64 Oct 17 11:37 0 -&amp;gt; /dev/pts/0
l-wx------ 1 root root 64 Oct 17 11:37 1 -&amp;gt; pipe:[1489661]
l-wx------ 1 root root 64 Oct 17 10:56 2 -&amp;gt; pipe:[1489624]
lr-x------ 1 root root 64 Oct 17 11:37 3 -&amp;gt; /mnt/lustre/f.recovery-small.11
[root@centos6-7 ~]# ls -l /mnt/lustre/f.recovery-small.11
-rw-r--r-- 1 root root 0 Oct 17 06:39 /mnt/lustre/f.recovery-small.11
[root@centos6-7 ~]# cat /mnt/lustre/f.recovery-small.11
cat: /mnt/lustre/f.recovery-small.11: Input/output error
[root@centos6-7 ~]# touch /mnt/lustre/f.recovery-small.11
[root@centos6-7 ~]# echo $?
0
[root@centos6-7 ~]# cat /mnt/lustre/f.recovery-small.11
[root@centos6-7 ~]# 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Yet at this point multiop still had not exited its loop.&lt;/p&gt;

&lt;p&gt;At the very least, I imagine we should make multiop terminate on an error that repeats many times.&lt;/p&gt;</description>
                <environment></environment>
        <key id="16391">LU-2205</key>
            <summary>recovery-small test 11 multiop eternal loop</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="green">Oleg Drokin</reporter>
                        <labels>
                    </labels>
                <created>Wed, 17 Oct 2012 11:47:13 +0000</created>
                <updated>Fri, 19 Apr 2013 20:37:32 +0000</updated>
                            <resolved>Sat, 3 Nov 2012 01:38:56 +0000</resolved>
                                    <version>Lustre 2.4.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="46703" author="adilger" created="Wed, 17 Oct 2012 23:44:48 +0000"  >&lt;p&gt;This is fixed with my patch in &lt;a href=&quot;http://review.whamcloud.com/4265&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4265&lt;/a&gt; &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-1538&quot; title=&quot;cleanup test scripts&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-1538&quot;&gt;&lt;del&gt;LU-1538&lt;/del&gt;&lt;/a&gt; tests: fix test cases when OST is full&quot;.  The multiop script should return an error if it cannot read the requested number of bytes, otherwise there is no way for the caller to know something went wrong.&lt;/p&gt;</comment>
                            <comment id="46706" author="green" created="Thu, 18 Oct 2012 02:29:04 +0000"  >&lt;p&gt;Well, that just papers over the symptoms, doesn&apos;t it? I get an I/O error reading this apparently empty file (with cat), which cannot be right.&lt;/p&gt;</comment>
                            <comment id="46707" author="green" created="Thu, 18 Oct 2012 02:58:40 +0000"  >&lt;p&gt;By the way, only the first cat gives an I/O error; subsequent cats don&apos;t.&lt;/p&gt;</comment>
                            <comment id="46714" author="adilger" created="Thu, 18 Oct 2012 04:14:25 +0000"  >&lt;p&gt;It doesn&apos;t paper over the symptoms at all.  Instead of multiop being stuck in a loop forever, it returns an error to the caller, so the test can fail in some meaningful manner.&lt;/p&gt;</comment>
                            <comment id="46980" author="green" created="Sat, 27 Oct 2012 01:05:47 +0000"  >&lt;p&gt;Well, I guess so.&lt;br/&gt;
Now test 11 fails 100% of the time for me, though I believe it passed at least some of the time before.&lt;/p&gt;</comment>
                            <comment id="46981" author="green" created="Sat, 27 Oct 2012 01:11:40 +0000"  >&lt;p&gt;Hm, spoke too soon: it does pass from time to time, but apparently fails much more frequently.&lt;/p&gt;</comment>
                            <comment id="47356" author="green" created="Sat, 3 Nov 2012 01:38:56 +0000"  >&lt;p&gt;Fixed now that change 4265 has landed.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="15169">LU-1612</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvakn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5248</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>