<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:38:06 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
<language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-10778] racer test 1 hangs in racer process cleanup</title>
                <link>https://jira.whamcloud.com/browse/LU-10778</link>
                <project id="10000" key="LU">Lustre</project>
<description>&lt;p&gt;racer test_1 hangs and provides very little information about what is hung. In the suite_log, the only output that looks different from that of a successful run is:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;layout: raid0 raid0
layout: raid0 raid0
racer cleanup
  Trace dump:
  = ./file_create.sh:1:main()
: FAIL: test-framework exiting on error
racer cleanup
sleeping 5 sec ...
sleeping 5 sec ...
there should be NO racer processes:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem            1K-blocks   Used Available Use% Mounted on
10.2.4.96@tcp:/lustre  75566092 298872  71166892   1% /mnt/lustre2
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
there should be NO racer processes:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem            1K-blocks   Used Available Use% Mounted on
10.2.4.96@tcp:/lustre  75566092 298872  71166892   1% /mnt/lustre2
We survived /usr/lib64/lustre/tests/racer/racer.sh for 300 seconds.
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Comparing the output above with racer_cleanup() in tests/racer/racer.sh:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;  27 racer_cleanup()
  28 {
  29         echo &quot;racer cleanup&quot;
  30         for P in $RACER_PROGS; do
  31                 killall -q $P.sh
  32         done
  33         trap 0
  34 
  35         local TOT_WAIT=0
  36         local MAX_WAIT=$DURATION
  37         local SHORT_WAIT=5
  38 
  39         local rc
  40         while [[ $TOT_WAIT -le $MAX_WAIT ]]; do
  41                 rc=0
  42                 echo sleeping $SHORT_WAIT sec ...
  43                 sleep $SHORT_WAIT
  44                 # this only checks whether processes exist
  45                 for P in $RACER_PROGS; do
  46                         killall -0 $P.sh
  47                         [[ $? -eq 0 ]] &amp;amp;&amp;amp; (( rc+=1 ))
  48                 done
  49                 if [[ $rc -eq 0 ]]; then
  50                         echo there should be NO racer processes:
  51                         ps uww -C &quot;${RACER_PROGS// /,}&quot;
  52                         return 0
  53                 fi
  54                 echo -n &quot;Waited $(( TOT_WAIT + SHORT_WAIT)), rc=$rc &quot;
  55                 (( SHORT_WAIT+=SHORT_WAIT ))
  56                 (( TOT_WAIT+=SHORT_WAIT ))
  57         done
  58         ps uww -C &quot;${RACER_PROGS// /,}&quot;
  59         return 1
  60 }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
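As background, the killall -0 calls in the loop above rely on the "signal 0" existence probe: sending signal 0 delivers nothing and only reports whether the target process exists. A minimal standalone sketch of that check, using the shell builtin kill rather than killall (both behave the same for signal 0); the out-of-range PID 99999999 is just an illustrative value assumed not to exist:

```shell
# "Signal 0" existence probe, as used by killall -0 in racer_cleanup():
# it delivers no signal and only reports whether the process exists.

# $$ is our own shell, so this probe must succeed.
if kill -0 $$ 2>/dev/null; then
        echo "shell $$ is running"
fi

# A PID far beyond pid_max cannot exist, so this probe must fail.
if kill -0 99999999 2>/dev/null; then
        echo "unexpected"
else
        echo "no such process"
fi
```

The loop counts one failed probe per still-running racer script into rc, so rc=0 means every $P.sh is gone.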

&lt;p&gt;We see the output &#8220;there should be NO racer processes:&#8221;, but we never see &#8220;Waited &#8230;&#8221;. Could the test be hung while cleaning up processes?&lt;/p&gt;
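For reference, the backoff arithmetic of that loop can be simulated standalone; this is a sketch assuming DURATION=300 as in the quoted run. Note that the printed "Waited" value adds TOT_WAIT, which already includes the doubled SHORT_WAIT from the previous iteration, so it runs ahead of the time actually slept:

```shell
# Simulate the wait/backoff arithmetic of racer_cleanup() without
# signalling any processes, assuming DURATION=300 as in the quoted run.
MAX_WAIT=300        # $DURATION in the real script
SHORT_WAIT=5
TOT_WAIT=0
SLEPT=0             # seconds the real loop would actually sleep
while [ "$TOT_WAIT" -le "$MAX_WAIT" ]; do
        SLEPT=$((SLEPT + SHORT_WAIT))            # stands in for: sleep $SHORT_WAIT
        echo "Waited $((TOT_WAIT + SHORT_WAIT))" # printed when processes remain
        SHORT_WAIT=$((SHORT_WAIT + SHORT_WAIT))
        TOT_WAIT=$((TOT_WAIT + SHORT_WAIT))
done
echo "loop gives up after sleeping $SLEPT sec in total"
```

With DURATION=300 the loop sleeps 5+10+20+40+80 = 155 seconds before giving up, while the last "Waited" line claims 230 — so a hang here cannot be explained by the backoff alone; the loop itself terminates well inside the test timeout.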

&lt;p&gt;There are stack traces in the console messages for each of the nodes/VMs, with Lustre processes listed in some of the traces. For example on the &lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt; [ 9869.370497] SysRq : Changing Loglevel
[ 9869.370996] Loglevel set to 8
[ 9869.860527] SysRq : Show State
[ 9869.860942]   task                        PC stack   pid father
[ 9869.861552] systemd         S ffff88007c140000     0     1      0 0x00000000
[ 9869.862317] Call Trace:
[ 9869.862579]  [&amp;lt;ffffffff816ab8a9&amp;gt;] schedule+0x29/0x70
[ 9869.863062]  [&amp;lt;ffffffff816aa9bd&amp;gt;] schedule_hrtimeout_range_clock+0x12d/0x150
[ 9869.863824]  [&amp;lt;ffffffff8124da09&amp;gt;] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
[ 9869.864588]  [&amp;lt;ffffffff816aa9f3&amp;gt;] schedule_hrtimeout_range+0x13/0x20
[ 9869.865246]  [&amp;lt;ffffffff8124dc9e&amp;gt;] ep_poll+0x23e/0x360
[ 9869.865790]  [&amp;lt;ffffffff810c6620&amp;gt;] ? wake_up_state+0x20/0x20
[ 9869.866337]  [&amp;lt;ffffffff8124f12d&amp;gt;] SyS_epoll_wait+0xed/0x120
[ 9869.866933]  [&amp;lt;ffffffff816b8930&amp;gt;] ? system_call_after_swapgs+0x15d/0x214
[ 9869.867586]  [&amp;lt;ffffffff816b89fd&amp;gt;] system_call_fastpath+0x16/0x1b
[ 9869.868228]  [&amp;lt;ffffffff816b889d&amp;gt;] ? system_call_after_swapgs+0xca/0x214
&#8230;
[ 9871.032843] Call Trace:
[ 9871.033091]  [&amp;lt;ffffffff816ab8a9&amp;gt;] schedule+0x29/0x70
[ 9871.033588]  [&amp;lt;ffffffffc0691884&amp;gt;] cfs_wi_scheduler+0x304/0x450 [libcfs]
[ 9871.034328]  [&amp;lt;ffffffff810b3690&amp;gt;] ? wake_up_atomic_t+0x30/0x30
[ 9871.034918]  [&amp;lt;ffffffffc0691580&amp;gt;] ? cfs_wi_sched_create+0x680/0x680 [libcfs]
[ 9871.035593]  [&amp;lt;ffffffff810b270f&amp;gt;] kthread+0xcf/0xe0
[ 9871.036123]  [&amp;lt;ffffffff810b2640&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 9871.036728]  [&amp;lt;ffffffff816b8798&amp;gt;] ret_from_fork+0x58/0x90
[ 9871.037314]  [&amp;lt;ffffffff810b2640&amp;gt;] ? insert_kthread_work+0x40/0x40
[ 9871.037927] cfs_rh_01       S ffff880079f50000     0 13072      2 0x00000080
[ 

&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;but no process is clearly hung.&lt;/p&gt;

&lt;p&gt;Logs for these hangs are at&lt;br/&gt;
&lt;a href=&quot;https://testing.whamcloud.com/test_sets/7ffca4b0-200b-11e8-9ec4-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/7ffca4b0-200b-11e8-9ec4-52540065bddc&lt;/a&gt;&lt;/p&gt;
</description>
                <environment></environment>
        <key id="51140">LU-10778</key>
            <summary>racer test 1 hangs in racer process cleanup</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="jamesanunez">James Nunez</reporter>
                        <labels>
                    </labels>
                <created>Tue, 6 Mar 2018 15:21:00 +0000</created>
                <updated>Fri, 11 Aug 2023 12:02:46 +0000</updated>
                                            <version>Lustre 2.11.0</version>
                    <version>Lustre 2.10.4</version>
                    <version>Lustre 2.13.0</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>6</watches>
                                                                            <comments>
                            <comment id="227529" author="standan" created="Tue, 8 May 2018 19:19:09 +0000"  >&lt;p&gt;+1 on 2.10.3&#160;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/16cb3064-50c2-11e8-b9d3-52540065bddc&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/16cb3064-50c2-11e8-b9d3-52540065bddc&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="241511" author="adilger" created="Wed, 6 Feb 2019 22:58:00 +0000"  >&lt;p&gt;This is still happening fairly frequently, about 20% of recent test runs.&lt;/p&gt;</comment>
                            <comment id="306693" author="arshad512" created="Fri, 9 Jul 2021 09:49:54 +0000"  >&lt;p&gt;+1 On Master: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/98916306-88da-4f75-a874-58d0b7c9f8fb&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/98916306-88da-4f75-a874-58d0b7c9f8fb&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="306696" author="arshad512" created="Fri, 9 Jul 2021 10:12:30 +0000"  >&lt;p&gt;&amp;gt;+1 On Master: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/98916306-88da-4f75-a874-58d0b7c9f8fb&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/98916306-88da-4f75-a874-58d0b7c9f8fb&lt;/a&gt;&lt;br/&gt;
Sorry. This is unrelated. &lt;/p&gt;</comment>
                            <comment id="319570" author="JIRAUSER17102" created="Tue, 30 Nov 2021 14:38:20 +0000"  >&lt;p&gt;Might have encountered this in 2.12.8 testing: &lt;a href=&quot;https://testing.whamcloud.com/test_sets/d46c6037-83f1-4fed-8f56-d03de5a7ff1e&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.whamcloud.com/test_sets/d46c6037-83f1-4fed-8f56-d03de5a7ff1e&lt;/a&gt;&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
...
racer cleanup
racer cleanup
sleeping 5 sec ...
sleeping 5 sec ...
./file_exec.sh: line 16:  6184 Terminated              $DIR/$file 0.$((RANDOM % 5 + 1)) 2&amp;gt; /dev/&lt;span class=&quot;code-keyword&quot;&gt;null&lt;/span&gt;
there should be NO racer processes:
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Filesystem                1K-blocks    Used Available Use% Mounted on
10.240.23.189@tcp:/lustre  13362580 3687184   8803420  30% /mnt/lustre2
We survived /usr/lib64/lustre/tests/racer/racer.sh &lt;span class=&quot;code-keyword&quot;&gt;for&lt;/span&gt; 900 seconds. 
...&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
[Sat Nov 20 06:05:59 2021] systemd         S ffff9485783e8640     0     1      0 0x00000000
[Sat Nov 20 06:05:59 2021] Call Trace:
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff93789179&amp;gt;] schedule+0x29/0x70
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff937885bd&amp;gt;] schedule_hrtimeout_range_clock+0x12d/0x150
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff9329d5e9&amp;gt;] ? ep_scan_ready_list.isra.7+0x1b9/0x1f0
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff937885f3&amp;gt;] schedule_hrtimeout_range+0x13/0x20
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff9329d87e&amp;gt;] ep_poll+0x23e/0x360
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff93260a81&amp;gt;] ? do_unlinkat+0xf1/0x2d0
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff930dadf0&amp;gt;] ? wake_up_state+0x20/0x20
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff9329ed5d&amp;gt;] SyS_epoll_wait+0xed/0x120
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff93795ec9&amp;gt;] ? system_call_after_swapgs+0x96/0x13a
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff93795f92&amp;gt;] system_call_fastpath+0x25/0x2a
[Sat Nov 20 06:05:59 2021]  [&amp;lt;ffffffff93795ed5&amp;gt;] ? system_call_after_swapgs+0xa2/0x13a
...
cfs_rh_00       S ffff9484f6656300     0  9208      2 0x00000080
[Sat Nov 20 06:06:01 2021] Call Trace:
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff93789179&amp;gt;] schedule+0x29/0x70
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffffc09a8e6c&amp;gt;] cfs_wi_scheduler+0x30c/0x460 [libcfs]
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff930c6f50&amp;gt;] ? wake_up_atomic_t+0x30/0x30
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffffc09a8b60&amp;gt;] ? cfs_wi_sched_create+0x680/0x680 [libcfs]
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff930c5e61&amp;gt;] kthread+0xd1/0xe0
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff930c5d90&amp;gt;] ? insert_kthread_work+0x40/0x40
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff93795df7&amp;gt;] ret_from_fork_nospec_begin+0x21/0x21
[Sat Nov 20 06:06:01 2021]  [&amp;lt;ffffffff930c5d90&amp;gt;] ? insert_kthread_work+0x40/0x40
[Sat Nov 20 06:06:01 2021] cfs_rh_01       S ffff9484f6652100     0  9209      2 0x00000080 &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzztxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>