<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:25:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9367] parallel-scale test_cascading_rw: cascading_rw failed! 1 </title>
                <link>https://jira.whamcloud.com/browse/LU-9367</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;This issue was created by maloo for sarah_lw &amp;lt;wei3.liu@intel.com&amp;gt;&lt;/p&gt;

&lt;p&gt;This issue relates to the following test suite run: &lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/45d48942-2507-11e7-9de9-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/45d48942-2507-11e7-9de9-5254006e85c2&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The sub-test test_cascading_rw failed with the following error:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;cascading_rw failed! 1
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;server/client: lustre-master #3558 ldiskfs el7&lt;/p&gt;

&lt;p&gt;test log&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;+ su mpiuser sh -c &quot;/usr/lib64/compat-openmpi16/bin/mpirun --mca btl tcp,self --mca btl_tcp_if_include eth0 -mca boot ssh -machinefile /tmp/parallel-scale.machines -np 4 /usr/lib64/lustre/tests/cascading_rw -g -d /mnt/lustre/d0.cascading_rw -n 300 &quot;
--------------------------------------------------------------------------
A deprecated MCA parameter value was specified in an MCA parameter
file.  Deprecated MCA parameters should be avoided; they may disappear
in future releases.

  Deprecated parameter: plm_rsh_agent
--------------------------------------------------------------------------
/usr/lib64/lustre/tests/cascading_rw is running with 4 process(es) in DEBUG mode
23:47:45: Running test #/usr/lib64/lustre/tests/cascading_rw(iter 0)
[trevis-55vm1:21694] *** Process received signal ***
[trevis-55vm1:21694] Signal: Floating point exception (8)
[trevis-55vm1:21694] Signal code: Integer divide-by-zero (1)
[trevis-55vm1:21694] Failing at address: 0x4024c8
[trevis-55vm1:21694] [ 0] /lib64/libpthread.so.0(+0xf370) [0x7fdf9fad6370]
[trevis-55vm1:21694] [ 1] /usr/lib64/lustre/tests/cascading_rw() [0x4024c8]
[trevis-55vm1:21694] [ 2] /usr/lib64/lustre/tests/cascading_rw() [0x402be0]
[trevis-55vm1:21694] [ 3] /usr/lib64/lustre/tests/cascading_rw() [0x40158e]
[trevis-55vm1:21694] [ 4] /lib64/libc.so.6(__libc_start_main+0xf5) [0x7fdf9f727b35]
[trevis-55vm1:21694] [ 5] /usr/lib64/lustre/tests/cascading_rw() [0x40169d]
[trevis-55vm1:21694] *** End of error message ***
[trevis-55vm1.trevis.hpdd.intel.com][[36239,1],2][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
[trevis-55vm2.trevis.hpdd.intel.com][[36239,1],1][btl_tcp_frag.c:215:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 21694 on node trevis-55vm1.trevis.hpdd.intel.com exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------
 parallel-scale test_cascading_rw: @@@@@@ FAIL: cascading_rw failed! 1 
  Trace dump:
  = /usr/lib64/lustre/tests/test-framework.sh:4905:error()
  = /usr/lib64/lustre/tests/functions.sh:734:run_cascading_rw()
  = /usr/lib64/lustre/tests/parallel-scale.sh:130:test_cascading_rw()
  = /usr/lib64/lustre/tests/test-framework.sh:5181:run_one()
  = /usr/lib64/lustre/tests/test-framework.sh:5220:run_one_logged()
  = /usr/lib64/lustre/tests/test-framework.sh:5067:run_test()
  = /usr/lib64/lustre/tests/parallel-scale.sh:132:main()
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</description>
                <environment></environment>
        <key id="45590">LU-9367</key>
            <summary>parallel-scale test_cascading_rw: cascading_rw failed! 1 </summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="maloo">Maloo</reporter>
                        <labels>
                    </labels>
                <created>Wed, 19 Apr 2017 18:12:28 +0000</created>
                <updated>Mon, 12 Jun 2017 22:39:51 +0000</updated>
                            <resolved>Mon, 12 Jun 2017 22:39:51 +0000</resolved>
                                    <version>Lustre 2.10.0</version>
                                    <fixVersion>Lustre 2.10.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="193824" author="jamesanunez" created="Thu, 27 Apr 2017 19:41:27 +0000"  >&lt;p&gt;parallel-scale test_cascading_rw started failing on April 11, 2017 and has failed 59 times since then. All of the failures are in the &apos;full&apos; test group, but no single configuration fails consistently: there are interop failures, zfs and ldiskfs failures, and failures on SLES11 SP4, CentOS 6.8, and CentOS 7.&lt;/p&gt;

&lt;p&gt;The logs for the first three failures are at:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/b1cc400c-1e83-11e7-8920-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/b1cc400c-1e83-11e7-8920-5254006e85c2&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/d2e19c60-1eba-11e7-b742-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/d2e19c60-1eba-11e7-b742-5254006e85c2&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/70e2c49a-1e9b-11e7-9de9-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/70e2c49a-1e9b-11e7-9de9-5254006e85c2&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="193849" author="adilger" created="Fri, 28 Apr 2017 00:07:26 +0000"  >&lt;p&gt;This looks like it started right after the PFL feature landed (April 8), so it probably makes sense for Bobijam to take a look at it.&lt;/p&gt;</comment>
                            <comment id="193990" author="bobijam" created="Sat, 29 Apr 2017 15:29:28 +0000"  >&lt;p&gt;commit fafe6b4d4a6fa63cedff3bd44e6578009578b3d7 changes ll_lov_setstripe()&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;static int ll_lov_setstripe(struct inode *inode, struct file *file, unsigned long arg)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;@@ -1694,14 +1703,6 @@ &lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;int&lt;/span&gt; ll_lov_setstripe(struct inode *inode, struct file *file,
 
        lum_size = rc;
        rc = ll_lov_setstripe_ea_info(inode, file, flags, klum, lum_size);
-       &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc == 0) {
-               __u32 gen;
-
-               put_user(0, &amp;amp;lum-&amp;gt;lmm_stripe_count);
-
-               ll_layout_refresh(inode, &amp;amp;gen);
-               rc = ll_file_getstripe(inode, (struct lov_user_md __user *)arg);
-       }
 
        OBD_FREE(klum, lum_size);
        RETURN(rc);

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The thinking behind the change is that ll_lov_setstripe() should only use the lum to set the file&apos;s striping; it should not have the side effect of copying the instantiated stripe info back into the lum.&lt;/p&gt;

&lt;p&gt;lustre/tests/mpi/cascading_rw.c, however, relies on exactly this side effect: it expects the stripe-setting call to also return the stripe info. Since the lum is no longer filled in, lum.lmm_stripe_size stays 0 and the modulo below divides by zero.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeHeader panelHeader&quot; style=&quot;border-bottom-width: 1px;&quot;&gt;&lt;b&gt;void rw_file(char *name, long stride, unsigned int seed)&lt;/b&gt;&lt;/div&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rank == 0) {
                remove_file_or_dir(filename);

                lum.lmm_magic = LOV_USER_MAGIC;
                lum.lmm_stripe_size = 0;
                lum.lmm_stripe_count = 0;
                lum.lmm_stripe_offset = -1;

                fd = open(filename, O_CREAT | O_RDWR | O_LOV_DELAY_CREATE,
                          FILEMODE);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (fd == -1) {
                        sprintf(errmsg, &lt;span class=&quot;code-quote&quot;&gt;&quot;open of file %s&quot;&lt;/span&gt;, filename);
                        FAIL(errmsg);
                }

                rc = ioctl(fd, LL_IOC_LOV_SETSTRIPE, &amp;amp;lum);
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rc == -1) {
                        sprintf(errmsg, &lt;span class=&quot;code-quote&quot;&gt;&quot;ioctl SETSTRIPE of file %s&quot;&lt;/span&gt;, filename);
                        FAIL(errmsg);
                }

                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (close(fd) == -1) {
                        sprintf(errmsg, &lt;span class=&quot;code-quote&quot;&gt;&quot;close of file %s&quot;&lt;/span&gt;, filename);
                        FAIL(errmsg);
                }
        }

        MPI_Barrier(MPI_COMM_WORLD);

        &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (stride &amp;lt; 0) {
                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (rank == 0) {
                        srandom(seed);
                        &lt;span class=&quot;code-keyword&quot;&gt;while&lt;/span&gt; (stride &amp;lt; page_size/2) {
                                stride = random();
                                stride -= stride % 16;
                                &lt;span class=&quot;code-keyword&quot;&gt;if&lt;/span&gt; (stride &amp;lt; 0)
                                        stride = -stride;
                                stride %= 2 * lum.lmm_stripe_size;    &lt;span class=&quot;code-comment&quot;&gt;// *DIVIDE BY ZERO EXCEPTION HERE*
&lt;/span&gt;                        }
                }

                MPI_Barrier(MPI_COMM_WORLD);
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
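
&lt;p&gt;As an illustration only: the modulo in rw_file() above could also be guarded so that a zero stripe size produces an error rather than a SIGFPE. This is a minimal sketch under that assumption; pick_stride() and its parameters are hypothetical and do not exist in cascading_rw.c.&lt;/p&gt;

```c
/* Hypothetical helper, not part of cascading_rw.c.
 * Mirrors the stride computation in rw_file(), but reports an error
 * instead of taking a modulo by zero when the stripe size is unset.
 * "raw" is expected to come from random(), i.e. to be non-negative. */
long pick_stride(long raw, long stripe_size)
{
        long stride;

        /* stripe size was never filled in: caller must fetch it first */
        if (0 >= stripe_size)
                return -1;

        /* round toward zero to a multiple of 16, as the test does */
        stride = raw - raw % 16;
        if (0 > stride)
                stride = -stride;

        /* safe now: the divisor is known to be positive */
        return stride % (2 * stripe_size);
}
```

&lt;p&gt;A caller would treat a return value of -1 as &quot;stripe size unknown&quot; and retrieve the layout before retrying.&lt;/p&gt;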

&lt;p&gt;We can fix cascading_rw&apos;s reliance on the side effect by calling ioctl(fd, LL_IOC_LOV_GETSTRIPE, &amp;amp;lum) after the SETSTRIPE call, so that lum.lmm_stripe_size holds the real stripe size instead of 0.&lt;/p&gt;</comment>
                            <comment id="194133" author="gerrit" created="Tue, 2 May 2017 12:32:33 +0000"  >&lt;p&gt;Bobi Jam (bobijam@hotmail.com) uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/26915&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/26915&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9367&quot; title=&quot;parallel-scale test_cascading_rw: cascading_rw failed! 1 &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9367&quot;&gt;&lt;del&gt;LU-9367&lt;/del&gt;&lt;/a&gt; mpi: get rid of SETSTRIPE side effect&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7a493a89ea6bc5ea9f7429d9507891741905d4d8&lt;/p&gt;</comment>
                            <comment id="194145" author="adilger" created="Tue, 2 May 2017 13:58:01 +0000"  >&lt;p&gt;I&apos;m not sure why you consider this to be a side effect?  There has been code to explicitly call getstripe and return this to userspace forever. Why not just restore the call to ll_file_getstripe()?&lt;/p&gt;</comment>
                            <comment id="194148" author="bobijam" created="Tue, 2 May 2017 14:16:48 +0000"  >&lt;p&gt;I am fine with restoring the call to ll_file_getstripe(). I have always thought of get/set methods as having one-directional data flow, and with that in mind, ll_lov_setstripe() passing the lum back to the caller looks like a side effect of that function to me.&lt;/p&gt;</comment>
                            <comment id="194151" author="bobijam" created="Tue, 2 May 2017 14:42:31 +0000"  >&lt;p&gt;And&lt;/p&gt;

&lt;p&gt;#define LL_IOC_LOV_SETSTRIPE            _IOW (&apos;f&apos;, 154, long)&lt;br/&gt;
#define LL_IOC_LOV_GETSTRIPE            _IOW (&apos;f&apos;, 155, long)&lt;/p&gt;

&lt;p&gt;shows that LL_IOC_LOV_SETSTRIPE should be a write-only ioctl(). It also seems that the LL_IOC_LOV_GETSTRIPE interface definition is wrong: it returns data to the caller, yet it is declared with _IOW as well.&lt;/p&gt;</comment>
                            <comment id="194364" author="casperjx" created="Wed, 3 May 2017 19:41:17 +0000"  >&lt;p&gt;This might be related to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9429&quot; title=&quot;parallel-scale test_parallel_grouplock: test failed to respond and timed out&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9429&quot;&gt;LU-9429&lt;/a&gt;.  The parallel_grouplock subtest started failing in tag testing after 2017-04-05.  In each case, parallel_grouplock times out after cascading_rw fails.&lt;/p&gt;

&lt;p&gt;tag 55 test (b3550):    2017-04-05 (parallel_grouplock 100% passing)&lt;br/&gt;
several PFL landings: 2017-04-08&lt;br/&gt;
tag 56 test (b3565):    2017-04-23 (parallel_grouplock 100% failing)&lt;/p&gt;</comment>
                            <comment id="198963" author="gerrit" created="Mon, 12 Jun 2017 22:35:15 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;https://review.whamcloud.com/26915/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/26915/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-9367&quot; title=&quot;parallel-scale test_cascading_rw: cascading_rw failed! 1 &quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-9367&quot;&gt;&lt;del&gt;LU-9367&lt;/del&gt;&lt;/a&gt; llite: restore ll_file_getstripe in ll_lov_setstripe&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: 364ec95f3688ac5cc3195f7f46d0d860844796f9&lt;/p&gt;</comment>
                            <comment id="198964" author="pjones" created="Mon, 12 Jun 2017 22:39:51 +0000"  >&lt;p&gt;Landed for 2.10&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="45517">LU-9349</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="45812">LU-9429</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzan3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>