<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:16:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-1410] Test failure on test suite sanity, subtest test_200c</title>
                <link>https://jira.whamcloud.com/browse/LU-1410</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for sarah &amp;lt;sarah@whamcloud.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://maloo.whamcloud.com/test_sets/b43d3072-9ecb-11e1-b567-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/b43d3072-9ecb-11e1-b567-52540035b04c&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sub-test test_200c failed with the following error:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Cannot set pool cea1 to /mnt/lustre/d200.pools/dir_tst&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;I got this error when doing a rolling upgrade from 1.8.7 to 2.2.52. The MDS is upgraded to 2.2.52-RHEL6 while the OSTs and clients are 1.8.7.&lt;br/&gt;
The original configuration is &lt;br/&gt;
MDS, OST:1.8.7-RHEL5; client 1:1.8.7-RHEL5; client 2:1.8.7-RHEL6&lt;/p&gt;</description>
                <environment></environment>
        <key id="14446">LU-1410</key>
            <summary>Test failure on test suite sanity, subtest test_200c</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="keith">Keith Mannthey</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Tue, 15 May 2012 17:09:37 +0000</created>
                <updated>Tue, 11 Jun 2013 16:07:46 +0000</updated>
                            <resolved>Tue, 11 Jun 2013 16:07:46 +0000</resolved>
                                    <version>Lustre 2.3.0</version>
                    <version>Lustre 1.8.9</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>5</watches>
                                                                            <comments>
                            <comment id="39021" author="adilger" created="Thu, 17 May 2012 16:48:05 +0000"  >&lt;p&gt;The test log reports:&lt;/p&gt;

&lt;p&gt;== test 200c: Set pool on a directory ================================= == 11:30:23&lt;br/&gt;
pool &apos;lustre.cea1&apos; has no OSTs&lt;br/&gt;
error: setstripe: create stripe file &apos;/mnt/lustre/d200.pools/dir_tst&apos; failed&lt;/p&gt;</comment>
                            <comment id="39023" author="adilger" created="Thu, 17 May 2012 16:54:39 +0000"  >&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== test 200b: Add targets to a pool ==================================== == 11:30:21
fat-amd-1: add the named OSTs to the pool
fat-amd-1: usage pool_add &amp;lt;fsname&amp;gt;.&amp;lt;poolname&amp;gt; &amp;lt;ostname indexed list&amp;gt;
Updated after 0 sec: wanted &apos;&apos; got &apos;&apos;
Resetting fail_loc on all nodes...done.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It looks like test_200c() failed because test_200b() did not add any OSTs to the pool.  So, test_200b() needs to be examined to see why it failed (it looks like a syntax error, or maybe an empty OST list?).&lt;/p&gt;

&lt;p&gt;Looking at other test results, it seems test_200b() has been working properly until this test run.&lt;/p&gt;</comment>
                            <comment id="39064" author="pjones" created="Fri, 18 May 2012 13:00:51 +0000"  >&lt;p&gt;Keith&lt;/p&gt;

&lt;p&gt;Could you please look into this one?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="39075" author="keith" created="Fri, 18 May 2012 15:09:14 +0000"  >&lt;p&gt;Not sure if it is related yet, but on Client 2 (client-5) a test prior to the 200 series produced an error condition. &lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;Lustre: DEBUG MARKER: == test 171: test libcfs_debug_dumplog_thread stuck in do_exit() ====== == 11:29:38
LustreError: 31901:0:(file.c:281:ll_file_release()) obd_fail_timeout id 50e sleeping for 3000 ms
LustreError: 31901:0:(file.c:281:ll_file_release()) obd_fail_timeout id 50e awake
LustreError: dumping log to /tmp/lustre-log.1337106582.31901
Lustre: DEBUG MARKER: SKIP: sanity test_180 skipping excluded test 180
Lustre: DEBUG MARKER: == test 181: Test open-unlinked dir ======================== == 11:29:48
Lustre: DEBUG MARKER: == test 200a: Create new pool ========================================== == 11:30:07
Lustre: DEBUG MARKER: == test 200b: Add targets to a pool ==================================== == 11:30:21
Lustre: DEBUG MARKER: == test 200c: Set pool on a directory ================================= == 11:30:23
Lustre: DEBUG MARKER: sanity test_200c: @@@@@@ FAIL: Cannot set pool cea1 to /mnt/lustre/d200.pools/dir_tst
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Still looking. &lt;/p&gt;</comment>
                            <comment id="39088" author="keith" created="Fri, 18 May 2012 17:40:13 +0000"  >&lt;p&gt;More likely related to:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;== test 200b: Add targets to a pool ==================================== == 11:30:21
fat-amd-1: add the named OSTs to the pool
fat-amd-1: usage pool_add &amp;lt;fsname&amp;gt;.&amp;lt;poolname&amp;gt; &amp;lt;ostname indexed list&amp;gt;
Updated after 0 sec: wanted &apos;&apos; got &apos;&apos;
Resetting fail_loc on all nodes...done.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The pool_add usage message suggests that the arguments got mangled in the call for 200b. &lt;/p&gt;</comment>
                            <comment id="39093" author="keith" created="Fri, 18 May 2012 20:33:00 +0000"  >&lt;p&gt;It seems the pool_add didn&apos;t end up with valid arguments.   &lt;/p&gt;

&lt;p&gt;It looks like jt_pool_cmd (in utils/obd.c) returned CMD_HELP and aborted the pool_add. It likely had the wrong number of arguments passed into the function. &lt;/p&gt;

&lt;p&gt;The test code in sanity.sh has been stable.&lt;/p&gt;

&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;test_200b() {
        remote_mgs_nodsh &amp;amp;&amp;amp; skip &quot;remote MGS with nodsh&quot; &amp;amp;&amp;amp; return
        TGT=$(for i in $TGTPOOL_LIST; do printf &quot;$FSNAME-OST%04x_UUID &quot; $i; done)
        do_facet mgs $LCTL pool_add $FSNAME.$POOL \
                $FSNAME-OST[$TGTPOOL_FIRST-$TGTPOOL_MAX/$TGTPOOL_STEP]
        wait_update $HOSTNAME &quot;lctl get_param -n lov.$FSNAME-*.pools.$POOL | sort -u | tr &apos;\n&apos; &apos; &apos; &quot; &quot;$TGT&quot; ||
                error &quot;Add to pool failed&quot;
        local lfscount=$($LFS pool_list $FSNAME.$POOL | grep -c &quot;\-OST&quot;)
        local addcount=$((($TGTPOOL_MAX - $TGTPOOL_FIRST) / $TGTPOOL_STEP + 1))
        [ $lfscount -eq $addcount ] ||
                error &quot;lfs pool_list bad ost count $lfscount != $addcount&quot;
}
run_test 200b &quot;Add targets to a pool ====================================&quot;
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
</comment>
                            <comment id="42345" author="keith" created="Thu, 26 Jul 2012 15:00:32 +0000"  >&lt;p&gt;I have run sanity, and just the 200 tests, in loops on a local test environment (not with rolling upgrades).  I have not seen this issue, and there are no similar reports among the Maloo issues. &lt;/p&gt;

&lt;p&gt;The rolling upgrade is likely the key to reproduction. I know this is from a while ago, but do we know what part of the failover (MDS or OST) we were at when the test failed? &lt;/p&gt;

&lt;p&gt;My local VMs are a little busy right now, but I think running sanity test 200 (with debugging) while doing a rollover is the next step. &lt;/p&gt;</comment>
                            <comment id="42848" author="keith" created="Tue, 7 Aug 2012 18:41:25 +0000"  >&lt;p&gt;What is the proper rollover target at this time?  1.8.7 to Master?  I think there is now a scrub issue that affects forward interoperability.  If that is not the case, I will give it a go. &lt;/p&gt;</comment>
                            <comment id="42976" author="keith" created="Thu, 9 Aug 2012 18:31:27 +0000"  >&lt;p&gt;I got scripts from Sarah to run the tests, and I am working on getting a 1.8.8-to-Master test run completed.  &lt;/p&gt;</comment>
                            <comment id="43235" author="keith" created="Wed, 15 Aug 2012 00:27:14 +0000"  >&lt;p&gt;Ok, I am still working to get the first re-test done.  With some direction and help from Sarah, it appears I need physical nodes (I have several virtual nodes right now).  Tomorrow I will search out the correct nodes to run this test. &lt;/p&gt;</comment>
                            <comment id="43274" author="keith" created="Wed, 15 Aug 2012 13:28:53 +0000"  >&lt;p&gt;It appears we can add persistent storage to the VM nodes.  I am working to enable this so virtual (easy to get) nodes are able to complete this test. &lt;/p&gt;</comment>
                            <comment id="43394" author="keith" created="Fri, 17 Aug 2012 01:38:36 +0000"  >&lt;p&gt;With Chris&apos;s help I have a virtual node with persistent storage.  YEA!  I have just kicked off the first test run; I will update when I know more. &lt;/p&gt;</comment>
                            <comment id="43436" author="keith" created="Fri, 17 Aug 2012 15:35:17 +0000"  >&lt;p&gt;Ok, so the root issue is this line:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; pdsh -l root -t 100 -S -w client-12vm3 &apos;(PATH=$PATH:/usr/lib64/lustre/utils:/usr/lib64/lustre/tests:/sbin:/usr/sbin; cd /usr/lib64/lustre/tests; sh -c &quot;/usr/sbin/lctl&apos; pool_add lustre.cea1 &apos;lustre-OST[1-0/2]&quot;)&apos;
 
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;lustre-OST&lt;span class=&quot;error&quot;&gt;&amp;#91;1-0/2&amp;#93;&lt;/span&gt; is the ostname indexed list argument, and it should be lustre-OST&lt;span class=&quot;error&quot;&gt;&amp;#91;0-1/2&amp;#93;&lt;/span&gt; for the one-OST case.  Sarah and I are working to find the correct part of the testing macro stack to fix. &lt;/p&gt;


&lt;p&gt;I have manually tested the lustre-OST&lt;span class=&quot;error&quot;&gt;&amp;#91;0-1/2&amp;#93;&lt;/span&gt; change and it works. &lt;/p&gt;

&lt;p&gt;I will update when the issue is fixed. &lt;/p&gt;</comment>
                            <comment id="43444" author="keith" created="Fri, 17 Aug 2012 17:22:10 +0000"  >&lt;p&gt;Well this issue is in the sanity test. &lt;/p&gt;

&lt;p&gt;There are 2 versions of this test: the 1.8.8 version and the master version.  The main client is 1.8.8, so it uses the older test code.  When it upgrades (clients upgrade last) it will use the master test code.  This is very likely a 1.8.8 branch issue with sanity 200 on a one-OST-only configuration.  A full rolling upgrade test (it will take a while) will tell us if a 2.3 change is needed.  Very likely running this test with 2 or more OSTs on 1.8 is fine. &lt;/p&gt;

&lt;p&gt;At this point there is no indication there is a problem with master but more testing is needed. &lt;/p&gt;

</comment>
                            <comment id="43530" author="keith" created="Mon, 20 Aug 2012 22:05:30 +0000"  >&lt;p&gt;Master also has this test issue.  I have submitted patches for both b1_8 and master for further review and testing. &lt;/p&gt;


&lt;p&gt;b1_8&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/3731&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3731&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;master:&lt;br/&gt;
&lt;a href=&quot;http://review.whamcloud.com/3730&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/3730&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="43589" author="keith" created="Tue, 21 Aug 2012 20:50:52 +0000"  >&lt;p&gt;I have given up my nodes, as the initial issue has patches pending.  An official retest should be the next step. &lt;/p&gt;

&lt;p&gt;A patched 1.8 and master will allow the automated sanity 200 tests to run on a one-OST setup. &lt;/p&gt;</comment>
                            <comment id="43623" author="pjones" created="Wed, 22 Aug 2012 10:47:28 +0000"  >&lt;p&gt;Sarah&lt;/p&gt;

&lt;p&gt;Given that upgrade/downgrade testing is done manually, does knowing what triggers this issue allow you to setup in a way to workaround it and complete the rest of the testing?&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="43640" author="sarah" created="Wed, 22 Aug 2012 12:51:52 +0000"  >&lt;p&gt;Hi Peter, yes, I know a workaround to run this test.&lt;/p&gt;</comment>
                            <comment id="43646" author="pjones" created="Wed, 22 Aug 2012 13:37:53 +0000"  >&lt;p&gt;Ok, then dropping priority.  The patches can still land to improve the flexibility of the test in the long term, but this is really only a problem that will crop up in testing situations, not production situations.&lt;/p&gt;</comment>
                            <comment id="60361" author="keith" created="Tue, 11 Jun 2013 16:07:46 +0000"  >&lt;p&gt;B1_8 patch now landed. &lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzv5uf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4481</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>