<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:56:23 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-12872] Adding more stats into JOBSTATS</title>
                <link>https://jira.whamcloud.com/browse/LU-12872</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;JOBSTATS has been very useful for understanding what type of IO comes from applications per JOBID/UID and GID, but it would also be nice to have more stats in JOBSTATS (e.g. RPC size, discontiguous pages, etc., which are covered by &quot;brw_stats&quot; today) to understand the detailed IO workload/size per JOBID.&lt;br/&gt;
On the other hand, hybrid systems with SSD and HDD (with or without the same namespace) are naturally coming, and these new stats would give administrators or users more guidance, e.g. which OST devices are preferred by each application.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12872&quot; title=&quot;Adding more stats into JOBSTATS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12872&quot;&gt;&lt;del&gt;LU-12872&lt;/del&gt;&lt;/a&gt; is used to track the requirement of adding more stats.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16087&quot; title=&quot;show distribution information of jobstats&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16087&quot;&gt;&lt;del&gt;LU-16087&lt;/del&gt;&lt;/a&gt; is used to track the requirement of showing distribution information for jobstats.&lt;/p&gt;</description>
                <environment></environment>
        <key id="57177">LU-12872</key>
            <summary>Adding more stats into JOBSTATS</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="flei">Feng Lei </assignee>
                                    <reporter username="sihara">Shuichi Ihara</reporter>
                        <labels>
                    </labels>
                <created>Thu, 17 Oct 2019 07:39:34 +0000</created>
                <updated>Mon, 24 Apr 2023 18:57:42 +0000</updated>
                            <resolved>Mon, 24 Apr 2023 18:56:31 +0000</resolved>
                                                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="343794" author="gerrit" created="Wed, 17 Aug 2022 00:54:06 +0000"  >&lt;p&gt;&lt;del&gt;&quot;Feng, Lei &amp;lt;flei@whamcloud.com&amp;gt;&quot; uploaded a new patch:&lt;/del&gt; &lt;a href=&quot;https://review.whamcloud.com/48238&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://review.whamcloud.com/48238&lt;/a&gt;&lt;br/&gt;
&lt;del&gt;Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-12872&quot; title=&quot;Adding more stats into JOBSTATS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-12872&quot;&gt;&lt;del&gt;LU-12872&lt;/del&gt;&lt;/a&gt; lprocfs: add rpc nubmer and size to job stats&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Project: fs/lustre-release&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Branch: master&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Current Patch Set: 1&lt;/del&gt;&lt;br/&gt;
&lt;del&gt;Commit: 0b8b8d76a8072a58474fc6176850b331e4cb05f8&lt;/del&gt;&lt;/p&gt;</comment>
                            <comment id="343810" author="adilger" created="Wed, 17 Aug 2022 06:39:19 +0000"  >&lt;p&gt;Shuichi, can you please confirm? I think the request here is to include the same or similar information from &lt;tt&gt;osd-ldiskfs.*.brw_stats&lt;/tt&gt; in &lt;tt&gt;obdfilter.*.job_stats&lt;/tt&gt;. My understanding is that the &quot;RPC size&quot; you requested is the bulk data size, and not the size of the RPCs themselves?&lt;/p&gt;

&lt;p&gt;Feng Lei, it is important to note that unlike some network filesystems, most Lustre bulk read/write RPCs sent from the clients do not contain the actual data, but only a description of the object being read/written and the offsets and byte counts for each fragment in the request. Only in a few cases, when the read/write request is very small, is the data packed directly into the RPC. Normally, for larger RPCs (anything over 4KB), the server sets up the RDMA descriptors for the request only when it is processing the bulk read/write RPC, and then the IB HCA transfers the data directly from client memory to server memory. That avoids the server memory being filled by the data for the requests that are queued, and avoids copying data from the network request buffers to the filesystem pages for IO.&lt;/p&gt;

&lt;p&gt;To fit into the &lt;tt&gt;job_stats&lt;/tt&gt; file, several sub-items would need to be added to each of the job_stats entries:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;obdfilter.testfs-OST0001.job_stats=
job_stats:
- job_id:          grep.0
  snapshot_time   : 4562769.380032450 secs.nsecs
  start_time      : 4562769.053337605 secs.nsecs
  elapsed_time    : 0.326694845 secs.nsecs
  read_bytes:      { samples:           5, unit: bytes, min:    32768, max:  4194304, sum:         16777216, sumsq:     70096013754368 }
  write_bytes:     { samples:           0, unit: bytes, min:        0, max:        0, sum:                0, sumsq:                  0 }
  read:            { samples:           5, unit: usecs, min:      136, max:     4072, sum:            11871, sumsq:           36286067 }
  write:           { samples:           0, unit: usecs, min:        0, max:        0, sum:                0, sumsq:                  0 }
:
  brw_stats:       {
      pages_per_bulk:   { 8: nnn 16: mmm 32: ppp ... }
      discontig_pages:  { }
      discontig_blocks: { }
      disk_fragments:   { }
      disk_io_inflight: { }
      disk_io_time:     { }
      disk_io_size:     { }
  }
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;I would like this output to remain reasonably easy for humans to read, though it should also be properly parsable, unlike the current &lt;tt&gt;brw_stats&lt;/tt&gt;.&lt;/p&gt;
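
&lt;p&gt;As a rough illustration of why parsable counters matter (a sketch, not part of the proposed layout), the &lt;tt&gt;samples&lt;/tt&gt;, &lt;tt&gt;sum&lt;/tt&gt; and &lt;tt&gt;sumsq&lt;/tt&gt; fields are enough for a consumer to derive a mean and standard deviation per job. The helper name &lt;tt&gt;summarize&lt;/tt&gt; is hypothetical, and the population-variance formula is an assumption about how a consumer might use these fields:&lt;/p&gt;

```python
import math


def summarize(samples, total, sumsq):
    """Return (mean, stddev) derived from job_stats-style counters."""
    if samples == 0:
        return 0.0, 0.0
    mean = total / samples
    # Population variance: E[x^2] - (E[x])^2, clamped at zero against rounding.
    variance = max(sumsq / samples - mean * mean, 0.0)
    return mean, math.sqrt(variance)


# Values copied from the read_bytes line of the example above.
mean, stddev = summarize(5, 16777216, 70096013754368)
print(f"read_bytes mean={mean:.1f} stddev={stddev:.1f}")
```

&lt;p&gt;A mean and deviation alone cannot show how the IO sizes are distributed, which is the gap the histogram fields are meant to fill.&lt;/p&gt;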

&lt;p&gt;However, Joe or someone more familiar with YAML parsing than I am should provide the actual layout so that the existing &lt;tt&gt;job_stats&lt;/tt&gt; parser does not explode if it sees these new fields.&#160; In &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-13123&quot; title=&quot;Add list of client NIDs to job_stats output&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-13123&quot;&gt;LU-13123&lt;/a&gt; I would also like to add a list of client NIDs that sent RPCs for this JobID so that it can be isolated to specific clients (without the need to access an external job scheduler), so this should also be taken into consideration.&lt;/p&gt;</comment>
                            <comment id="343902" author="sihara" created="Thu, 18 Aug 2022 06:12:40 +0000"  >&lt;p&gt;Yes, we want to see RPC/IO size for bulk IO to OSTs in Jobstats (e.g. per JOBID, UID). Even per-NID stats would be useful, since such stats don&apos;t exist in the NID &quot;export&quot; stats today.&lt;/p&gt;</comment>
                            <comment id="344035" author="flei" created="Fri, 19 Aug 2022 01:31:59 +0000"  >&lt;p&gt;I&apos;m going to add:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
&lt;span class=&quot;code-keyword&quot;&gt;enum&lt;/span&gt; {
    LPROCFS_CNTR_EXTERNALLOCK      = 0x0001,
    LPROCFS_CNTR_AVGMINMAX         = 0x0002,
    LPROCFS_CNTR_STDDEV            = 0x0004,
    LPROCFS_CNTR_HISTGRAM          = 0x0008,

    ...
    LPROCFS_CNTR_RPC_READ_PAGES    = LPROCFS_TYPE_PAGES | LPROCFS_CNTR_HISTGRAM,&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;The output might look like:&lt;/p&gt;
&lt;div class=&quot;code panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;codeContent panelContent&quot;&gt;
&lt;pre class=&quot;code-java&quot;&gt;
  rpc_read_pages: { samples: 0, unit: pages, min: 0, max: 0, sum: 0, sumsq: 0, histgram: {1: xxx, 2: yyy, 4: zzz, ...} } &lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
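&lt;p&gt;To make the histogram keys concrete, here is a minimal sketch of how per-RPC page counts could be tallied into power-of-two buckets such as 1, 2, 4 in the output above. It is illustrative only: the function names are hypothetical, and rounding each count up to the next power of two is an assumption, not something the patch is confirmed to do.&lt;/p&gt;

```python
from collections import Counter


def bucket(pages):
    """Smallest power of two that is at least `pages` (pages must be 1 or more)."""
    # Assumption: counts are rounded UP to the next power of two.
    return 2 ** (pages - 1).bit_length()


def histogram(page_counts):
    """Tally each RPC's page count into its power-of-two bucket."""
    tally = Counter(bucket(p) for p in page_counts)
    return dict(sorted(tally.items()))


# Hypothetical per-RPC page counts, for illustration only.
print(histogram([1, 2, 3, 4, 5, 8, 256]))
# prints {1: 1, 2: 1, 4: 2, 8: 2, 256: 1}
```

&lt;p&gt;Power-of-two buckets keep the histogram short even for large bulk RPCs, at the cost of coarser resolution for larger sizes.&lt;/p&gt;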
&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=adilger&quot; class=&quot;user-hover&quot; rel=&quot;adilger&quot;&gt;adilger&lt;/a&gt; &lt;a href=&quot;https://jira.whamcloud.com/secure/ViewProfile.jspa?name=sihara&quot; class=&quot;user-hover&quot; rel=&quot;sihara&quot;&gt;sihara&lt;/a&gt; Please feel free to comment.&lt;/p&gt;

&lt;p&gt;Moving this topic to &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16087&quot; title=&quot;show distribution information of jobstats&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16087&quot;&gt;&lt;del&gt;LU-16087&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="344203" author="adilger" created="Mon, 22 Aug 2022 06:54:40 +0000"  >&lt;p&gt;One thing I realized is that job_stats are tracked at the ofd level, while brw_stats are tracked at the osd level because they contain on-disk allocation information. &lt;/p&gt;

&lt;p&gt;That means the current job_stats file could add the RPC pages_per_bulk, latency, and discontiguous_pages histograms, but not discontiguous disk blocks or disk IO size. I think that is probably OK, since we can check the disk fragmentation from the main brw_stats (these should not be specific to the application).&lt;/p&gt;</comment>
                            <comment id="370378" author="adilger" created="Mon, 24 Apr 2023 18:56:31 +0000"  >&lt;p&gt;Was handled by &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-16087&quot; title=&quot;show distribution information of jobstats&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-16087&quot;&gt;&lt;del&gt;LU-16087&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="57775">LU-13123</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="53365">LU-11407</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="59312">LU-13597</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="71814">LU-16087</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i00o6f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                </customfields>
    </item>
</channel>
</rss>