<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:32:42 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-3300] Restore missing proc information for LMT</title>
                <link>https://jira.whamcloud.com/browse/LU-3300</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Lustre&apos;s proc seems to have had a number of regressions.  LMT&apos;s ltop is no longer able to find many of the values it used to display.&lt;/p&gt;

&lt;p&gt;In particular, brw_stats from obdfilter is gone, and does not appear to have been replaced after the OSD work.  At minimum, that was used by ltop to report the number of bulk rpcs handled.&lt;/p&gt;

&lt;p&gt;The MDS display is also missing a number of values.&lt;/p&gt;

&lt;p&gt;We don&apos;t necessarily need to put them back exactly how they were before, but we need to export them in some way that will make them usable for folks.&lt;/p&gt;

&lt;p&gt;It would be best to decide on interfaces before 2.4.0 is locked in.&lt;/p&gt;</description>
                <environment></environment>
        <key id="18760">LU-3300</key>
            <summary>Restore missing proc information for LMT</summary>
                <type id="3" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11318&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.whamcloud.com/images/icons/priorities/major.svg">Major</priority>
                        <status id="1" iconUrl="https://jira.whamcloud.com/images/icons/statuses/open.png" description="The issue is open and ready for the assignee to start work on it.">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="emoly.liu">Emoly Liu</assignee>
                                    <reporter username="morrone">Christopher Morrone</reporter>
                        <labels>
                    </labels>
                <created>Wed, 8 May 2013 21:49:57 +0000</created>
                <updated>Tue, 7 Jun 2016 15:38:01 +0000</updated>
                                            <version>Lustre 2.4.0</version>
                                    <fixVersion>Lustre 2.4.0</fixVersion>
                                        <due></due>
                            <votes>0</votes>
                                    <watches>10</watches>
                                                                            <comments>
                            <comment id="57954" author="rread" created="Wed, 8 May 2013 22:54:57 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3296&quot; title=&quot;fs/lustre/mdt/*/md_stats not showing any stats after remount&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3296&quot;&gt;&lt;del&gt;LU-3296&lt;/del&gt;&lt;/a&gt; is another issue related to disappearing md_stats.&lt;/p&gt;

&lt;p&gt;I do see brw_stats in obdfilter, but it&apos;s a symlink to osd-ldiskfs/*/brw_stats.&lt;/p&gt;



</comment>
                            <comment id="57955" author="adilger" created="Wed, 8 May 2013 23:07:53 +0000"  >&lt;p&gt;Chris, the brw_stats symlink was recently added back in &lt;a href=&quot;http://review.whamcloud.com/5873&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/5873&lt;/a&gt; (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-3106&quot; title=&quot;create symlinks in procfs from ofd to osd&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-3106&quot;&gt;&lt;del&gt;LU-3106&lt;/del&gt;&lt;/a&gt;), so maybe that isn&apos;t in your tree yet?&lt;/p&gt;

&lt;p&gt;There is also a separate patch &lt;a href=&quot;http://review.whamcloud.com/4618&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/4618&lt;/a&gt; (&lt;a href=&quot;https://jira.whamcloud.com/browse/LU-2096&quot; title=&quot;name ofd device type &amp;quot;ofd&amp;quot; instead of &amp;quot;obdfilter&amp;quot;&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-2096&quot;&gt;LU-2096&lt;/a&gt;), which adds symlinks from obdfilter/ to ofd/ for some of the other tunables so that the names are cleaned up.  I haven&apos;t refreshed that patch lately since it isn&apos;t really a 2.4.0 priority.&lt;/p&gt;</comment>
                            <comment id="57959" author="jhammond" created="Wed, 8 May 2013 23:35:35 +0000"  >&lt;p&gt;Under ldiskfs, the Lustre code had direct access to the block device request queue, making brw_stats possible, as they are really just enhanced block device stats. With zfs backends there is no such access, so there are no brw_stats.&lt;/p&gt;

&lt;p&gt;During proc init for the ofd device, if the underlying osd has a brw_stats file in proc, we create a symlink from the obdfilter/*/ directory. If the underlying osd device does not, then there will be no symlink.&lt;/p&gt;

&lt;p&gt;There remain stats in /proc/fs/lustre/obdfilter/lustre-OST0000/stats for bulk transfers.&lt;/p&gt;

&lt;p&gt;Which MDS/MDT values are you missing?&lt;/p&gt;</comment>
                            <comment id="57961" author="morrone" created="Wed, 8 May 2013 23:49:43 +0000"  >&lt;p&gt;Ah, that explains why I couldn&apos;t find brw_stats.  That is a problem.&lt;/p&gt;

&lt;p&gt;For MDS I won&apos;t have time to figure it out before I disappear on vacation.  But some of the values never show anything but zeroes.  Robert&apos;s pointer about md_stats could be the problem.&lt;/p&gt;

&lt;p&gt;Another missing item is the per-client brw_stats on servers.  That used to be our admins&apos; main method of determining who was overloading a server when server loads went through the roof.  What do we do to handle that now?&lt;/p&gt;
</comment>
                            <comment id="57966" author="adilger" created="Thu, 9 May 2013 00:18:55 +0000"  >&lt;p&gt;Doh, forgot about the lack of brw_stats for ZFS.  Alex&apos;s patch is only helping if brw_stats is available in the first place.&lt;/p&gt;

&lt;p&gt;Many of the disk brw_stats might be available on an aggregate basis, if the plumbing is available in ZFS.  The per-client and per-job brw_stats are much trickier because the ZFS IO is not allocated or submitted to disk until long after the service thread has completed processing the request.&lt;/p&gt;

&lt;p&gt;The RPC information like &quot;pages per bulk r/w&quot; and &quot;discontiguous pages&quot; could be available independent of the OSD type.  These should really be OFD statistics, maybe in a new &quot;rpc_stats&quot; file, possibly in YAML format?  These could also be available on a per-client or per-request basis.  This might be enough for your debugging purposes?&lt;/p&gt;

&lt;p&gt;Information like &quot;disk I/Os in flight&quot;, &quot;I/O time&quot;, and &quot;disk I/O size&quot; might be obtained on an aggregate basis from ZFS.  The &quot;discontiguous blocks&quot; and &quot;disk fragmented I/Os&quot; would be much harder to collect for writes, without deep hooks into the ZFS IO scheduler.  Some of this information could be extracted for reads, by hacking into the ZFS block pointers to get the physical disk blocks.&lt;/p&gt;

&lt;p&gt;As for getting this into 2.4.0, I don&apos;t think that is very likely, since we are very close to making an RC1 tag.  I don&apos;t think it would be unreasonable to call this a regression and fix it for 2.4.1 if it can be done cleanly.&lt;/p&gt;</comment>
                            <comment id="57969" author="morrone" created="Thu, 9 May 2013 00:38:45 +0000"  >&lt;p&gt;I don&apos;t really want to fully recreate brw_stats as it exists now.  But we need a consistent way to get the same information from both ldiskfs and zfs osds, that lets us fully fill in the information that ltop has always presented.&lt;/p&gt;

&lt;p&gt;Proc information is important to your customers.  Proc has regressed, and is incomplete.  I think that means that lustre isn&apos;t ready for RCs yet.&lt;/p&gt;</comment>
                            <comment id="57978" author="jhammond" created="Thu, 9 May 2013 02:01:33 +0000"  >&lt;p&gt;The obdfilter stats should provide a good view of utilization on a global and per client level, regardless of the backend. I have personally used them to this effect.&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# cat /proc/fs/lustre/obdfilter/lustre-OST0000/stats
snapshot_time             1368063951.465558 secs.usecs
read_bytes                7 samples [bytes] 4096 1048576 5251072
write_bytes               1287 samples [bytes] 4096 4096 5271552
get_info                  8 samples [reqs]
connect                   1 samples [reqs]
reconnect                 4 samples [reqs]
disconnect                1 samples [reqs]
statfs                    9278 samples [reqs]
create                    4 samples [reqs]
destroy                   3 samples [reqs]
sync                      1282 samples [reqs]
preprw                    1294 samples [reqs]
commitrw                  1294 samples [reqs]
ping                      9588 samples [reqs]
# cat /proc/fs/lustre/obdfilter/lustre-OST0000/exports/0@lo/stats
snapshot_time             1368063954.225882 secs.usecs
read_bytes                7 samples [bytes] 4096 1048576 5251072
write_bytes               1287 samples [bytes] 4096 4096 5271552
get_info                  8 samples [reqs]
disconnect                1 samples [reqs]
create                    4 samples [reqs]
destroy                   3 samples [reqs]
sync                      1282 samples [reqs]
preprw                    1294 samples [reqs]
commitrw                  1294 samples [reqs]
ping                      9591 samples [reqs]
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If something is missing then please say so. Not that I can guarantee anything for 2.4, but I would like to know.&lt;/p&gt;

&lt;p&gt;As an aside, if LLNL/LMT/you depend on some aspect of proc then it would be good to have some sanity tests to verify that it doesn&apos;t go away, along with a comment in the test to the effect that LLNL/LMT/you will be unhappy if it does. I finally get to say that to somebody, rather than have it said to me. I thought I would enjoy it more. Weird.&lt;/p&gt;</comment>
                            <comment id="58175" author="adilger" created="Fri, 10 May 2013 18:44:56 +0000"  >&lt;p&gt;Chris, does John&apos;s proposal for using the obdfilter &quot;stats&quot; data address your needs for LMT?  This would be a way for LMT to get aggregate IO and per-client IO stats that works on both 2.1 and 2.4.&lt;/p&gt;

&lt;p&gt;It would also be possible to create the brw_stats for ZFS with just the RPC (&quot;page&quot;) information to start with, but based on your comment I don&apos;t know if this is what you want.  It isn&apos;t clear to me if it will be possible to add the ZFS block IO stats later or not.  Would just having the &quot;page&quot; information in ZFS brw_stats be useful?  This would allow the admins to at least see whether the clients are submitting poorly formed RPCs.  I don&apos;t think it would be too hard to do just that part.&lt;/p&gt;</comment>
                            <comment id="58402" author="pjones" created="Mon, 13 May 2013 21:58:50 +0000"  >&lt;p&gt;I discussed this with Marc Stearman today. This does not need to hold the release but it will be a support priority to find ways to enable LLNL sysadmins to perform common tasks. Marc will provide a prioritized list of those that matter most to LLNL.&lt;/p&gt;</comment>
                            <comment id="58734" author="rread" created="Fri, 17 May 2013 04:28:59 +0000"  >&lt;p&gt;I noticed with recent 2.4 builds that lmt is failing to capture metrics on the MDS because several files are empty; however, this worked in the 2.4.63 builds that I was using for my LUG testing, so this is a recent regression.  These files are in both the lod and osd-ldiskfs directories and they&apos;re empty in both:&lt;/p&gt;

&lt;blockquote&gt;

&lt;p&gt;&lt;span class=&quot;error&quot;&gt;&amp;#91;ec2-user@mds0 ~&amp;#93;&lt;/span&gt;$ head  /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/*&lt;br/&gt;
==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/activeobd &amp;lt;==&lt;br/&gt;
8&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/blocksize &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/desc_uuid &amp;lt;==&lt;br/&gt;
scratch-MDT0000-mdtlov_UUID&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/filesfree &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/filestotal &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/kbytesavail &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/kbytesfree &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/kbytestotal &amp;lt;==&lt;/p&gt;

&lt;p&gt;==&amp;gt; /proc/fs/lustre/lod/scratch-MDT0000-mdtlov/numobd &amp;lt;==&lt;br/&gt;
8&lt;/p&gt;&lt;/blockquote&gt;
</comment>
                            <comment id="58836" author="jhammond" created="Sat, 18 May 2013 20:36:23 +0000"  >&lt;p&gt;Robert, I broke the osd statfs handlers. A patch is forthcoming.&lt;/p&gt;</comment>
                            <comment id="58837" author="jhammond" created="Sat, 18 May 2013 20:55:59 +0000"  >&lt;p&gt;Please see &lt;a href=&quot;http://review.whamcloud.com/6385&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/6385&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="59099" author="jhammond" created="Wed, 22 May 2013 19:05:55 +0000"  >&lt;p&gt;The patch for the osd statfs proc handlers has landed to master. But I&apos;m leaving this ticket open waiting for a response from LLNL.&lt;/p&gt;</comment>
                            <comment id="59137" author="rread" created="Thu, 23 May 2013 03:08:30 +0000"  >&lt;p&gt;With that patch landed I&apos;m getting metadata in LMT again, thanks!  Still waiting to hear from LLNL...&lt;/p&gt;</comment>
                            <comment id="61404" author="jhammond" created="Wed, 26 Jun 2013 18:40:12 +0000"  >&lt;p&gt;Chris, any updates here? We were waiting on a response from LLNL.&lt;/p&gt;</comment>
                            <comment id="61412" author="morrone" created="Wed, 26 Jun 2013 20:30:15 +0000"  >&lt;p&gt;The problem that you fixed we hadn&apos;t even seen yet.  We did then see it. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;  Now we don&apos;t.&lt;/p&gt;

&lt;p&gt;So now brw_stats, or something equivalent, just needs to be added, I believe.&lt;/p&gt;</comment>
                            <comment id="65432" author="pjones" created="Fri, 30 Aug 2013 12:53:27 +0000"  >&lt;p&gt;Emoly&lt;/p&gt;

&lt;p&gt;Could you please see what work remains on this ticket?&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="66709" author="emoly.liu" created="Mon, 16 Sep 2013 09:16:01 +0000"  >&lt;p&gt;Chris, I added a symlink of the osd brw_stats to lod. I hope that meets your requirement.&lt;br/&gt;
Please see &lt;a href=&quot;http://review.whamcloud.com/7663&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/7663&lt;/a&gt; .&lt;/p&gt;</comment>
                            <comment id="144097" author="emoly.liu" created="Mon, 29 Feb 2016 07:31:15 +0000"  >&lt;p&gt;Chris, is there any more work you would like us to do for this ticket? Or can we mark it as resolved? Thanks.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="18242">LU-3106</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="16785">LU-2396</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="18753">LU-3296</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="16263">LU-2096</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="22050">LU-4259</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="19054">LU-3355</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                            <subtask id="22050">LU-4259</subtask>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10490" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 29 Feb 2016 21:49:57 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzvqh3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8169</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10493" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 8 May 2013 21:49:57 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    </customfields>
    </item>
</channel>
</rss>