<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 03:29:03 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-16674] read contention on &quot;job_stats&quot; &quot;/proc&quot; file</title>
                <link>https://jira.whamcloud.com/browse/LU-16674</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;At the CEA, we observed a lot of contention on job_stats with a lot of jobs.&lt;/p&gt;

&lt;p&gt;In some critical cases (incorrect job_name pattern or a lot of read accesses on job_stats), this could lead the target to freeze.&lt;/p&gt;

&lt;p&gt;When reading the proc file &quot;job_stats&quot;, the read lock &quot;ojs_lock&quot; is taken to walk the list of jobs. While this file is being read, no job stats entry can be added or removed, so the target&apos;s service threads must wait for the write lock.&lt;/p&gt;

&lt;p&gt;I think we can avoid this kind of contention in two ways:&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Save the last job read&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;For each read, the process walks the entire list of jobs to find the entry corresponding to the file offset.&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-c&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;code-object&quot;&gt;void&lt;/span&gt; *lprocfs_jobstats_seq_start(&lt;span class=&quot;code-keyword&quot;&gt;struct&lt;/span&gt; seq_file *p, loff_t *pos)
{
	...
	off--;
	list_for_each_entry(job, &amp;amp;stats-&amp;gt;ojs_list, js_list) {
		if (!off--)
			&lt;span class=&quot;code-keyword&quot;&gt;return&lt;/span&gt; job;
	}
	...
}
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This could be improved by caching the last job accessed and its corresponding offset, and resuming the walk from there on the next read.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;Use RCU lock instead of rwlock&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Using RCU instead of the rwlock to protect the job stats list should make the read accesses contention-free.&lt;/p&gt;</description>
                <environment></environment>
        <key id="75236">LU-16674</key>
            <summary>read contention on &quot;job_stats&quot; &quot;/proc&quot; file</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="1">Fixed</resolution>
                                        <assignee username="eaujames">Etienne Aujames</assignee>
                                    <reporter username="eaujames">Etienne Aujames</reporter>
                        <labels>
                    </labels>
                <created>Mon, 27 Mar 2023 19:00:28 +0000</created>
                <updated>Fri, 7 Jul 2023 13:28:07 +0000</updated>
                            <resolved>Tue, 18 Apr 2023 12:15:11 +0000</resolved>
                                                    <fixVersion>Lustre 2.16.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="367478" author="adilger" created="Tue, 28 Mar 2023 01:39:35 +0000"  >&lt;p&gt;Firstly, it isn&apos;t mentioned whether this contention is on a client or a server. I would guess on the server, but then what processes are reading from job_stats so heavily on the server? Is this some monitoring tool?&lt;/p&gt;

&lt;p&gt;Note that if the job_stats data is being ingested frequently by a userspace monitor, it would be possible to shorten the cleanup interval so that old jobids are dropped sooner than the current default of 10 minutes. Alternatively, old jobs can be evicted from the stats manually by writing the jobid into the job_stats file to keep the list shorter.&lt;/p&gt;

&lt;p&gt;Also, it is worthwhile to understand whether the jobid is being created properly for aggregating stats. This isn&apos;t intended to give per-thread stats on the server for every individual command that is run on the client, so the linear list isn&apos;t going to work well if you have 100k or 1M unique JobIDs. However, since the full list is returned for each read, alternate data structures may not help unless this is changed to allow partial locking of the hash table or similar.&lt;/p&gt;

&lt;p&gt;As for bad JobIDs, there have been a couple of bugs already fixed in this area, mainly caused by bad input from the client, not from any issue on the server. &lt;/p&gt;</comment>
                            <comment id="367610" author="gerrit" created="Tue, 28 Mar 2023 19:48:55 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50459&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50459&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16674&quot; title=&quot;read contention on &amp;quot;job_stats&amp;quot; &amp;quot;/proc&amp;quot; file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16674&quot;&gt;&lt;del&gt;LU-16674&lt;/del&gt;&lt;/a&gt; obdclass: optimize job_stats reads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: e5a08b991c3f48267a1fdba81971e9125a92b494&lt;/p&gt;</comment>
                            <comment id="367699" author="eaujames" created="Wed, 29 Mar 2023 09:53:46 +0000"  >&lt;p&gt;Yes, the contention was observed on servers.&lt;/p&gt;

&lt;p&gt;The hang is observed when job stats entries are added and removed at a high rate (the write lock is needed on the job stats list). For most of those cases, the issue was malformed jobids from the compute side or a lot of short jobs (e.g. MPI tests).&lt;/p&gt;

&lt;p&gt;For example, we had some issues with users escaping their environment by opening new sessions via ssh inside their jobs.&lt;/p&gt;

&lt;p&gt;We also had 2 different monitoring scripts using job_stats; the ATOS team is rewriting them as a single script.&lt;/p&gt;

&lt;p&gt;To mitigate the issue, we have already shortened the job_cleanup_interval.&lt;/p&gt;

&lt;p&gt;The idea here is to protect the servers from malformed jobids (resulting from bad practices), speed up job_stats reads, and limit the performance impact of the monitoring jobs on the kernel threads.&lt;/p&gt;

</comment>
                            <comment id="367712" author="eaujames" created="Wed, 29 Mar 2023 12:22:21 +0000"  >&lt;p&gt;I have done a perf analysis over the whole duration of sanity test 205g (in slow mode, 240s), but with job_cleanup_interval set to 300s.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;perf record -ag -F100
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;b&gt;without the patch&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;For the job stats read processes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;40.22%--seq_read
        |          
        |--25.10%--lprocfs_jobstats_seq_start
        |          
         --14.43%--lprocfs_jobstats_seq_show
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;For the mdt kernel threads:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--6.68%--mdt_reint_setattr
          |          
           --6.51%--mdt_counter_incr
                     |          
                      --6.50%--lprocfs_job_stats_log
                                |          
                                 --6.49%--_raw_write_lock
                                           |          
                                            --6.45%--__write_lock_failed
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;&lt;b&gt;with the patch&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;For the job stats read processes:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;--50.68%--seq_read
          |          
           --49.75%--lprocfs_jobstats_seq_show
                     |          
                     |--40.04%--seq_printf
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;lprocfs_jobstats_seq_start() is not visible because it no longer walks the list or waits for the read lock.&lt;/p&gt;

&lt;p&gt;The MDT kernel threads are not in the perf report: there is no contention on ojs_lock (in lprocfs_job_stats_log()).&lt;/p&gt;

&lt;p&gt;&lt;b&gt;job_stats dump performance&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Here are the performance results of dumping the job_stats proc file after the test (no filesystem activity):&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;time grep -c job_id /proc/fs/lustre/mdt/lustrefs-MDT0000/job_stats
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;instance&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;nbr of jobs&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;real time&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;sys time&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;rate&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;without patch&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;14749&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1.3s&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;1.3s&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;11345 jobid/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;with patch&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;22209&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0.6s&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;0.6s&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;37015 jobid/s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;Here is the comparison before and after the patch:&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;nbr of jobs&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;time&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;rate&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+43%&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;&amp;#45;54%&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+226%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
</comment>
                            <comment id="369713" author="gerrit" created="Tue, 18 Apr 2023 03:23:41 +0000"  >&lt;p&gt;&quot;Oleg Drokin &amp;lt;green@whamcloud.com&amp;gt;&quot; merged in patch &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/50459/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/50459/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16674&quot; title=&quot;read contention on &amp;quot;job_stats&amp;quot; &amp;quot;/proc&amp;quot; file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16674&quot;&gt;&lt;del&gt;LU-16674&lt;/del&gt;&lt;/a&gt; obdclass: optimize job_stats reads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: master&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: c6890a955f89508db46fd8ffbf22b05b145976cd&lt;/p&gt;</comment>
                            <comment id="369752" author="pjones" created="Tue, 18 Apr 2023 12:15:11 +0000"  >&lt;p&gt;Landed for 2.16&lt;/p&gt;</comment>
                            <comment id="373210" author="gerrit" created="Tue, 23 May 2023 16:15:52 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51100&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51100&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16674&quot; title=&quot;read contention on &amp;quot;job_stats&amp;quot; &amp;quot;/proc&amp;quot; file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16674&quot;&gt;&lt;del&gt;LU-16674&lt;/del&gt;&lt;/a&gt; obdclass: optimize job_stats reads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_15&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 70013238a55efd6623be9e939f090ee718f9f5e0&lt;/p&gt;</comment>
                            <comment id="373281" author="gerrit" created="Wed, 24 May 2023 08:43:32 +0000"  >&lt;p&gt;&quot;Etienne AUJAMES &amp;lt;eaujames@ddn.com&amp;gt;&quot; uploaded a new patch: &lt;a href=&quot;https://review.whamcloud.com/c/fs/lustre-release/+/51111&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/c/fs/lustre-release/+/51111&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16674&quot; title=&quot;read contention on &amp;quot;job_stats&amp;quot; &amp;quot;/proc&amp;quot; file&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16674&quot;&gt;&lt;del&gt;LU-16674&lt;/del&gt;&lt;/a&gt; obdclass: optimize job_stats reads&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_12&lt;br/&gt;
Current Patch Set: 1&lt;br/&gt;
Commit: 7a00bdabe886e62c711944a12ff2a0c97fa51a96&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="48576" name="sanity_205g_cleanup_300_with_patch.report" size="13140" author="eaujames" created="Wed, 29 Mar 2023 11:10:22 +0000"/>
                            <attachment id="48577" name="sanity_205g_cleanup_300_with_patch.svg" size="462707" author="eaujames" created="Wed, 29 Mar 2023 11:10:23 +0000"/>
                            <attachment id="48575" name="sanity_205g_cleanup_300_without_patch.report" size="8796" author="eaujames" created="Wed, 29 Mar 2023 11:10:22 +0000"/>
                            <attachment id="48578" name="sanity_205g_cleanup_300_without_patch.svg" size="235832" author="eaujames" created="Wed, 29 Mar 2023 11:10:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i03hc7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>