<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:06:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-7118] sanity-scrub: No sub tests failed in this test set</title>
                <link>https://jira.whamcloud.com/browse/LU-7118</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for Bob Glossman &amp;lt;bob.glossman@intel.com&amp;gt;&lt;/p&gt;

&lt;p&gt;I&apos;ve seen a lot of sanity-scrub instances fail entirely lately. It looks like some kind of TEI issue to me, as it shows up in test runs on lots of different and unrelated mods. No logs are collected, and the summary always says:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Failed subtests

No sub tests failed in this test set.

All subtests

This test set does not have any sub tests.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Maybe something really bad landed that blocks any sanity-scrub from running.&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/aaf64806-5682-11e5-a9bc-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/aaf64806-5682-11e5-a9bc-5254006e85c2&lt;/a&gt;.&lt;/p&gt;</description>
                <environment></environment>
        <key id="31995">LU-7118</key>
            <summary>sanity-scrub: No sub tests failed in this test set</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="1" iconUrl="https://jira.whamcloud.com/images/icons/priorities/blocker.svg">Blocker</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="ys">Yang Sheng</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Tue, 8 Sep 2015 23:48:18 +0000</created>
                <updated>Tue, 22 Sep 2015 06:15:40 +0000</updated>
                            <resolved>Tue, 22 Sep 2015 04:21:34 +0000</resolved>
                                    <version>Lustre 2.8.0</version>
                                    <fixVersion>Lustre 2.8.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>13</watches>
                                                                            <comments>
                            <comment id="126745" author="adilger" created="Wed, 9 Sep 2015 02:53:22 +0000"  >&lt;p&gt;This can often happen if there was a problem during cleanup/unmount/module unload after all the subtests have finished.&lt;/p&gt;</comment>
                            <comment id="126796" author="jamesanunez" created="Wed, 9 Sep 2015 16:07:49 +0000"  >&lt;p&gt;In the suite_stdout logs from the failure for this ticket:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;23:31:38:shadow-40vm8: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/ost1 failed: Cannot send after transport endpoint shutdown
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;There are no other logs to see what the actual problem with the OSS might be. It looks like the previous test suite, sanity-quota, completed with no errors. &lt;/p&gt;

&lt;p&gt;This failure matches the one in &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5645&quot; title=&quot;Fail to start sanity-scrub&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5645&quot;&gt;&lt;del&gt;LU-5645&lt;/del&gt;&lt;/a&gt;. So, this is probably a duplicate of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5645&quot; title=&quot;Fail to start sanity-scrub&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5645&quot;&gt;&lt;del&gt;LU-5645&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are many instances of sanity-scrub failing; here are a few on review-dne-part-2:&lt;br/&gt;
2015-09-09 03:06:45 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/5bd4b8c2-56cb-11e5-84d0-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/5bd4b8c2-56cb-11e5-84d0-5254006e85c2&lt;/a&gt;&lt;br/&gt;
2015-09-09 07:25:42 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/9d6af5e0-56ed-11e5-84d0-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/9d6af5e0-56ed-11e5-84d0-5254006e85c2&lt;/a&gt;&lt;br/&gt;
2015-09-09 09:13:53 - &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/40ac0b3a-56fe-11e5-8947-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/40ac0b3a-56fe-11e5-8947-5254006e85c2&lt;/a&gt;&lt;/p&gt;
</comment>
                            <comment id="126827" author="bogl" created="Wed, 9 Sep 2015 20:52:46 +0000"  >&lt;p&gt;another on master:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/78b06ad8-5732-11e5-a2e1-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/78b06ad8-5732-11e5-a2e1-5254006e85c2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="126829" author="sarah" created="Wed, 9 Sep 2015 20:57:15 +0000"  >&lt;p&gt;I have seen similar issue on tip of master for RHEL7.1 server. Not sure if they are the same. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7123&quot; title=&quot;sanity-scrub: OST shows unable to mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7123&quot;&gt;&lt;del&gt;LU-7123&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="126831" author="bogl" created="Wed, 9 Sep 2015 21:05:30 +0000"  >&lt;p&gt;Sarah,  I think your &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7123&quot; title=&quot;sanity-scrub: OST shows unable to mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7123&quot;&gt;&lt;del&gt;LU-7123&lt;/del&gt;&lt;/a&gt; may very well be the same thing and maybe our only clue to what&apos;s going on.  As I understand it by running on el7 the console log gets timestamps prepended to all lines.  This in turn causes maloo to misplace the console logs onto lustre-init instead of putting them where they should be.   I&apos;m guessing the same info may be in console logs on el6 runs but those aren&apos;t captured or are thrown away.&lt;/p&gt;

&lt;p&gt;Having the console logs misplaced may be a good thing in this case.  It preserved them and lets us see info not captured during failures on el6.&lt;/p&gt;</comment>
                            <comment id="126951" author="jgmitter" created="Thu, 10 Sep 2015 17:40:42 +0000"  >&lt;p&gt;Can you please have a look at this issue?  Can you dig into it further to get more info and see what may be happening?&lt;br/&gt;
Thanks&lt;br/&gt;
Joe&lt;/p&gt;</comment>
                            <comment id="126953" author="adilger" created="Thu, 10 Sep 2015 17:43:38 +0000"  >&lt;p&gt;There are no console logs available to show what is happening. The MDT and OST were just reformatted and mounted so it is strange that there is an error. &lt;/p&gt;</comment>
                            <comment id="126968" author="yujian" created="Thu, 10 Sep 2015 18:21:57 +0000"  >&lt;p&gt;The first occurrence of this failure is in report &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/aaf64806-5682-11e5-a9bc-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/aaf64806-5682-11e5-a9bc-5254006e85c2&lt;/a&gt; of patch &lt;a href=&quot;http://review.whamcloud.com/16315&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16315&lt;/a&gt;, which is based on 01ca899324738343279c1d63823b7fab937197dc (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt; ptlrpc: imp_peer_committed_transno should increase), and which was landed on 6 days ago. So, it seems this is not a regression introduced by the recent landings.&lt;/p&gt;</comment>
                            <comment id="126971" author="pjones" created="Thu, 10 Sep 2015 18:27:10 +0000"  >&lt;p&gt;Yang Sheng&lt;/p&gt;

&lt;p&gt;Could you please investigate?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="126974" author="yujian" created="Thu, 10 Sep 2015 18:42:35 +0000"  >&lt;p&gt;This report &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/5bd4b8c2-56cb-11e5-84d0-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/5bd4b8c2-56cb-11e5-84d0-5254006e85c2&lt;/a&gt; (failed on 2015-09-09) of &lt;a href=&quot;http://review.whamcloud.com/16129&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16129&lt;/a&gt; patch set 4 is based on commit 1f4d68d334c85a8106f5939351991b80449e5713 (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-6947&quot; title=&quot;Stray comment in ptlrpc_start_pinger&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-6947&quot;&gt;&lt;del&gt;LU-6947&lt;/del&gt;&lt;/a&gt; ptlrpc: Remove stray comment in ptlrpc_start_pinger), which was landed on 2015-08-26. So, it&apos;s not related to the recent Lustre landings.&lt;/p&gt;

&lt;p&gt;Is the issue related to the recent RHEL 6.7 change in the autotest system?&lt;/p&gt;</comment>
                            <comment id="126976" author="sarah" created="Thu, 10 Sep 2015 18:58:27 +0000"  >&lt;p&gt;If this is a dup of &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7123&quot; title=&quot;sanity-scrub: OST shows unable to mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7123&quot;&gt;&lt;del&gt;LU-7123&lt;/del&gt;&lt;/a&gt;, then there can find MDS and OSS console logs.&lt;/p&gt;</comment>
                            <comment id="126977" author="bogl" created="Thu, 10 Sep 2015 19:08:02 +0000"  >&lt;p&gt;the misplaced console logs in the &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7123&quot; title=&quot;sanity-scrub: OST shows unable to mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7123&quot;&gt;&lt;del&gt;LU-7123&lt;/del&gt;&lt;/a&gt; report are no help I think.  they are there in lustre-init, but only show sanity-scrub taking 0 time.  for example in the mds console log:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;17:23:00:[ 9529.925038] Lustre: DEBUG MARKER: /usr/sbin/lctl mark -----============= acceptance-small: sanity-scrub ============----- Fri Sep  4 17:22:53 UTC 2015
17:23:23:[ 9530.121771] Lustre: DEBUG MARKER: -----============= acceptance-small: sanity-scrub ============----- Fri Sep 4 17:22:53 UTC 2015
17:23:23:[ 9530.645738] Lustre: DEBUG MARKER: /usr/sbin/lctl mark == sanity-scrub test complete, duration -o sec ======================================================= 17:22:53 \(1441387373\)
17:23:23:[ 9530.809134] Lustre: DEBUG MARKER: == sanity-scrub test complete, duration -o sec ======================================================= 17:22:53 (1441387373)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Nothing useful there that I can see.&lt;/p&gt;</comment>
                            <comment id="126993" author="di.wang" created="Thu, 10 Sep 2015 21:48:08 +0000"  >&lt;p&gt;Hmm, I though the OSS console logs on &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7123&quot; title=&quot;sanity-scrub: OST shows unable to mount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7123&quot;&gt;&lt;del&gt;LU-7123&lt;/del&gt;&lt;/a&gt; is the key here&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;7:23:31:[ 9568.305881] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
17:23:31:[ 9568.461183] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache
17:23:31:[ 9568.480543] Lustre: Evicted from MGS (at 10.1.4.162@tcp) after server handle changed from 0xd1a6ba6be92fc62 to 0xd1a6ba6be9adf3a
17:23:31:[ 9568.482736] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -108. Is the MGS running?
17:23:31:[ 9568.482991] Lustre: MGC10.1.4.162@tcp: Connection restored to MGS (at 10.1.4.162@tcp)
17:23:31:[ 9568.482993] Lustre: Skipped 6 previous similar messages
17:23:31:[ 9568.485396] LustreError: 14130:0:(obd_mount_server.c:1794:server_fill_super()) Unable to start targets: -108
17:23:31:[ 9568.486579] LustreError: 14130:0:(obd_mount_server.c:1509:server_put_super()) no obd lustre-OST0000
17:23:31:[ 9568.487556] LustreError: 14130:0:(obd_mount_server.c:137:server_deregister_mount()) lustre-OST0000 not registered
17:23:31:[ 9568.620431] Lustre: server umount lustre-OST0000 complete
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It looks like the MGC somehow survived the umount/remount of the OST. Since MDT0 was reformatted, this surviving MGC caused the mount to fail. (Or there might be some race here; sigh, there are no further logs here.)&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;17:23:31:[ 9568.480543] Lustre: Evicted from MGS (at 10.1.4.162@tcp) after server handle changed from 0xd1a6ba6be92fc62 to 0xd1a6ba6be9adf3a
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Sigh, nothing in the MDS console log.&lt;/p&gt;</comment>
                            <comment id="126996" author="di.wang" created="Thu, 10 Sep 2015 21:53:56 +0000"  >&lt;p&gt;See mgc even start to reconnect before the first OST is mount, if the timestamp is correct in the console log( is it? ), that means MGC is left over from last umount, a bit strange.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;17:23:31:[ 9567.529080] Lustre: 19295:0:(client.c:2039:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1441387392/real 1441387392]  req@ffff88005deb1e00 x1511395716989360/t0(0) o250-&amp;gt;MGC10.1.4.162@tcp@10.1.4.162@tcp:26/25 lens 520/544 e 0 to 1 dl 1441387403 ref 1 fl Rpc:XN/0/ffffffff rc 0/-1
17:23:31:[ 9567.531521] Lustre: 19295:0:(client.c:2039:ptlrpc_expire_one_request()) Skipped 3 previous similar messages
17:23:31:[ 9567.570845] Lustre: DEBUG MARKER: test -b /dev/lvm-Role_OSS/P1
17:23:31:[ 9567.913313] Lustre: DEBUG MARKER: mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/ost1
17:23:31:[ 9568.305881] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro
17:23:31:[ 9568.461183] LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Opts: errors=remount-ro,no_mbcache
17:23:31:[ 9568.480543] Lustre: Evicted from MGS (at 10.1.4.162@tcp) after server handle changed from 0xd1a6ba6be92fc62 to 0xd1a6ba6be9adf3a
17:23:31:[ 9568.482736] LustreError: 15f-b: lustre-OST0000: cannot register this server with the MGS: rc = -108. Is the MGS running?
17:23:31:[ 9568.482991] Lustre: MGC10.1.4.162@tcp: Connection restored to MGS (at 10.1.4.162@tcp)
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="126997" author="di.wang" created="Thu, 10 Sep 2015 22:03:58 +0000"  >&lt;p&gt;Because it blocks almost all of patches on master, I changed it to blocker.&lt;/p&gt;</comment>
                            <comment id="126998" author="di.wang" created="Thu, 10 Sep 2015 22:17:45 +0000"  >&lt;p&gt;Ah, there are 7 OSTs in the test config. But it seems sanity-scrub.sh only stops 4 OSTs, then do reformat &amp;amp; restart, that is why mgc is left over and cause all these troubles. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# no need too much OSTs, to reduce the format/start/stop overhead
[ $OSTCOUNT -gt 4 ] &amp;amp;&amp;amp; OSTCOUNT=4
                
MOUNT_2=&quot;&quot;              
                        
# build up a clean test environment.
formatall      
setupall
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Probably we only need to fix the test script here.&lt;/p&gt;</comment>
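<!--
A minimal, hypothetical sketch (not the actual patch from http://review.whamcloud.com/16366) of the
idea discussed in the comment above: stop everything the previous suite left mounted before OSTCOUNT
is reduced, so no leftover OST keeps an MGC alive across the reformat. stopall/formatall/setupall are
the existing test-framework helpers; the placement shown here is illustrative only.

# stop all servers while OSTCOUNT still matches the real configuration
stopall

# no need too much OSTs, to reduce the format/start/stop overhead
[ $OSTCOUNT -gt 4 ] && OSTCOUNT=4

MOUNT_2=""

# build up a clean test environment.
formatall
setupall
-->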
                            <comment id="126999" author="gerrit" created="Thu, 10 Sep 2015 22:22:43 +0000"  >&lt;p&gt;wangdi (di.wang@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16366&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16366&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7118&quot; title=&quot;sanity-scrub: No sub tests failed in this test set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7118&quot;&gt;&lt;del&gt;LU-7118&lt;/del&gt;&lt;/a&gt; tests: stop all OSTs before reformat&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 802a0e14359af52a7265f9a0441c1704517d8b94&lt;/p&gt;</comment>
                            <comment id="127000" author="bogl" created="Thu, 10 Sep 2015 22:26:09 +0000"  >&lt;p&gt;I like your theory, but it looks like that line reducing OSTCOUNT to 4 has been in there for over a year.  Why only a problem recently?&lt;/p&gt;</comment>
                            <comment id="127003" author="adilger" created="Thu, 10 Sep 2015 22:42:25 +0000"  >&lt;p&gt;I was going to ask the same question...  I don&apos;t see any changes to sanity-scrub.sh or test-framework.sh recently.&lt;/p&gt;</comment>
                            <comment id="127006" author="di.wang" created="Thu, 10 Sep 2015 23:05:47 +0000"  >&lt;p&gt;Well, this only cause problem when remount OST (after reformat) happens after the leftover MGC is being evicted, otherwise we can not see this problem. (Hmm, it seems we need do extra check for MGC, or do we allow OST to use old MGC in this case, it seems not cause any problem if it is not being evicted). &lt;/p&gt;

&lt;p&gt;I guess the recent changes just prolonged the time the reformat &amp;amp; restart takes? Apart from the Lustre changes, did we change the test environment recently?&lt;/p&gt;

</comment>
                            <comment id="127115" author="ys" created="Fri, 11 Sep 2015 18:07:06 +0000"  >&lt;p&gt;I think we need someone from TEL team to investigate why test logs absence and why sanity-scrub doesn&apos;t run at all just return immediately. I have verified the test script runs well on shadow cluster. Do we have any documents about autotest system how to work? &lt;/p&gt;</comment>
                            <comment id="127122" author="mdiep" created="Fri, 11 Sep 2015 18:35:52 +0000"  >&lt;p&gt;This is what I see.&lt;br/&gt;
The reason there aren&apos;t any logs is that the first subtest hasn&apos;t even started yet; it failed during setup while mounting OST1.&lt;/p&gt;

&lt;p&gt;Starting ost1:   /dev/lvm-Role_OSS/P1 /mnt/ost1&lt;br/&gt;
CMD: onyx-35vm4 mkdir -p /mnt/ost1; mount -t lustre   		                   /dev/lvm-Role_OSS/P1 /mnt/ost1&lt;br/&gt;
onyx-35vm4: mount.lustre: mount /dev/mapper/lvm--Role_OSS-P1 at /mnt/ost1 failed: Cannot send after transport endpoint shutdown.&lt;/p&gt;

&lt;p&gt;The question here is why we can&apos;t mount the OST: why does the OST disconnect due to transport endpoint shutdown?&lt;br/&gt;
sanity-scrub does formatall and setupall at the beginning of the test. Perhaps the cleanup was not done properly.&lt;/p&gt;</comment>
                            <comment id="127155" author="gerrit" created="Fri, 11 Sep 2015 23:15:36 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/16366/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16366/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7118&quot; title=&quot;sanity-scrub: No sub tests failed in this test set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7118&quot;&gt;&lt;del&gt;LU-7118&lt;/del&gt;&lt;/a&gt; tests: stop all OSTs before reformat&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 61b4d4ff8e9a6e7539aa3a7dcc4dd1aea6b4f927&lt;/p&gt;</comment>
                            <comment id="127157" author="adilger" created="Fri, 11 Sep 2015 23:30:31 +0000"  >&lt;p&gt;This definitely seems like a regression that landed recently on master, since sanity-scrub has had the &lt;tt&gt;[ $OSTCOUNT -gt 4 ] &amp;amp;&amp;amp; OSTCOUNT=4&lt;/tt&gt; line since commit &lt;tt&gt;v2_5_58_0-41-g1dbba32&lt;/tt&gt; and there are not any failures on b2_7 testing.  It might be useful to run a series of tests with different commits going back once per day running review-ldiskfs multiple times (if this is possible):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Test-Parameters: fortestonly testgroup=review-ldiskfs
Test-Parameters: fortestonly testgroup=review-ldiskfs
Test-Parameters: fortestonly testgroup=review-ldiskfs
Test-Parameters: fortestonly testgroup=review-ldiskfs
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Since sanity-scrub only fails about 50% of the time, this would give us a 94% chance of catching the regression patch at each stage. The earliest failure I see outside RHEL7.1 testing is on 2015-09-08 (&lt;a href=&quot;http://review.whamcloud.com/16315&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16315&lt;/a&gt; based on commit 01ca8993247383 &quot;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7079&quot; title=&quot;OSP shouldn&amp;#39;t discard requests due to imp_peer_committed_transno&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7079&quot;&gt;&lt;del&gt;LU-7079&lt;/del&gt;&lt;/a&gt; ptlrpc: imp_peer_committed_transno should increase&quot;) and then it starts hitting hard on 2015-09-09, so I suspect it was something landed on 2015-09-08 that caused it.&lt;/p&gt;</comment>
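<!--
The 94% figure in the comment above follows from treating the four review-ldiskfs runs as
independent trials: if each run hits the failure with probability roughly 0.5, the chance that at
least one of the four runs reproduces it is 1 - (0.5)^4 = 0.9375, i.e. about 94%.
-->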
                            <comment id="127374" author="jgmitter" created="Tue, 15 Sep 2015 18:03:49 +0000"  >&lt;p&gt;Landed for 2.8.&lt;/p&gt;</comment>
                            <comment id="127606" author="ys" created="Thu, 17 Sep 2015 05:47:54 +0000"  >&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/bdf2ffa6-5cfb-11e5-945a-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/bdf2ffa6-5cfb-11e5-945a-5254006e85c2&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/9cdbc984-5c91-11e5-b8a8-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/9cdbc984-5c91-11e5-b8a8-5254006e85c2&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/94b10788-5c19-11e5-9dac-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/94b10788-5c19-11e5-9dac-5254006e85c2&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/e7fd2a3e-5bd8-11e5-96c9-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/e7fd2a3e-5bd8-11e5-96c9-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looks like the last patch still has not resolved this issue, but it really does not occur so frequently now.&lt;/p&gt;</comment>
                            <comment id="127609" author="di.wang" created="Thu, 17 Sep 2015 06:10:41 +0000"  >&lt;p&gt;Looks like still the same problem, there are 7 OSTs, and only 4 OSTs are stopped.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;CMD: shadow-49vm3 grep -c /mnt/mds1&apos; &apos; /proc/mounts
Stopping /mnt/mds1 (opts:-f) on shadow-49vm3
CMD: shadow-49vm3 umount -d -f /mnt/mds1
CMD: shadow-49vm3 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm7 grep -c /mnt/mds2&apos; &apos; /proc/mounts
Stopping /mnt/mds2 (opts:-f) on shadow-49vm7
CMD: shadow-49vm7 umount -d -f /mnt/mds2
CMD: shadow-49vm7 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm7 grep -c /mnt/mds3&apos; &apos; /proc/mounts
Stopping /mnt/mds3 (opts:-f) on shadow-49vm7
CMD: shadow-49vm7 umount -d -f /mnt/mds3
CMD: shadow-49vm7 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm7 grep -c /mnt/mds4&apos; &apos; /proc/mounts
Stopping /mnt/mds4 (opts:-f) on shadow-49vm7
CMD: shadow-49vm7 umount -d -f /mnt/mds4
CMD: shadow-49vm7 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm4 grep -c /mnt/ost1&apos; &apos; /proc/mounts
Stopping /mnt/ost1 (opts:-f) on shadow-49vm4
CMD: shadow-49vm4 umount -d -f /mnt/ost1
CMD: shadow-49vm4 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm4 grep -c /mnt/ost2&apos; &apos; /proc/mounts
Stopping /mnt/ost2 (opts:-f) on shadow-49vm4
CMD: shadow-49vm4 umount -d -f /mnt/ost2
CMD: shadow-49vm4 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm4 grep -c /mnt/ost3&apos; &apos; /proc/mounts
Stopping /mnt/ost3 (opts:-f) on shadow-49vm4
CMD: shadow-49vm4 umount -d -f /mnt/ost3
CMD: shadow-49vm4 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
CMD: shadow-49vm4 grep -c /mnt/ost4&apos; &apos; /proc/mounts
Stopping /mnt/ost4 (opts:-f) on shadow-49vm4
CMD: shadow-49vm4 umount -d -f /mnt/ost4
CMD: shadow-49vm4 lsmod | grep lnet &amp;gt; /dev/null &amp;amp;&amp;amp; lctl dl | grep &apos; ST &apos;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;I guess we need to find out where OSTCOUNT has been changed? Or perhaps stopall should use &quot;lov.xxxxx.numobd&quot; instead of the env variable?&lt;/p&gt;</comment>
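<!--
A rough, hypothetical sketch of the suggestion above: instead of trusting the (possibly reduced)
$OSTCOUNT environment variable, ask the configuration how many OSTs really exist and stop that many
facets. "lctl get_param" and the test-framework "stop" helper are existing tools; the real_ostcount
name and the exact lov parameter path are assumptions, not taken from test-framework.sh.

real_ostcount() {
    # a client-side LOV reports how many OST devices it actually knows about
    lctl get_param -n lov.*.numobd 2>/dev/null | head -n 1
}

actual=$(real_ostcount)
for ((i = 1; i <= ${actual:-$OSTCOUNT}; i++)); do
    stop ost$i -f
done
-->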
                            <comment id="127803" author="gerrit" created="Fri, 18 Sep 2015 17:10:49 +0000"  >&lt;p&gt;Yang Sheng (yang.sheng@intel.com) uploaded a new patch: &lt;a href=&quot;http://review.whamcloud.com/16483&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/16483&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7118&quot; title=&quot;sanity-scrub: No sub tests failed in this test set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7118&quot;&gt;&lt;del&gt;LU-7118&lt;/del&gt;&lt;/a&gt; tests: debug patch&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: ba6612cda149851fff0eea6ee548a99afb95dacf&lt;/p&gt;</comment>
                            <comment id="128051" author="pjones" created="Tue, 22 Sep 2015 06:15:40 +0000"  >&lt;p&gt;Original regularly occurring failure fixed. Dealing with less frequent occasional failure under &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-7193&quot; title=&quot;sanity-scrub: No sub tests failed in this test set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-7193&quot;&gt;&lt;del&gt;LU-7193&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="32011">LU-7123</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="26670">LU-5645</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="31028">LU-6827</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="32256">LU-7193</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzxmwf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>