<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:53:31 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92">
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5674] Maloo test report should include zfs debugging data when FSTYPE=zfs</title>
                <link>https://jira.whamcloud.com/browse/LU-5674</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;If I haven&apos;t missed something, zfs debugging data hasn&apos;t been included in test reports, e.g.:&lt;br/&gt;
&lt;a href=&quot;https://maloo.whamcloud.com/test_sets/42573266-9f17-11e3-934b-52540035b04c&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://maloo.whamcloud.com/test_sets/42573266-9f17-11e3-934b-52540035b04c&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It&apos;d be very useful to have a tarball of /proc/spl/. Lots of useful data to troubleshoot ZFS problems can be found under that directory, e.g. dmu_tx_assign delay histogram.&lt;/p&gt;</description>
                <environment></environment>
        <key id="23645">LU-5674</key>
            <summary>Maloo test report should include zfs debugging data when FSTYPE=zfs</summary>
                <type id="4" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11310&amp;avatarType=issuetype">Improvement</type>
                <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                <statusCategory id="3" key="done" colorName="success"/>
                <resolution id="1">Fixed</resolution>
                <assignee username="mdiep">Minh Diep</assignee>
                <reporter username="isaac">Isaac Huang</reporter>
                <labels>
                    <label>prz</label>
                    <label>triaged</label>
                </labels>
                <created>Fri, 14 Mar 2014 21:31:47 +0000</created>
                <updated>Wed, 16 Mar 2016 13:37:03 +0000</updated>
                <resolved>Wed, 15 Jul 2015 15:41:18 +0000</resolved>
                <fixVersion>Lustre 2.7.0</fixVersion>
                <fixVersion>Lustre 2.5.4</fixVersion>
                <due></due>
                <votes>0</votes>
                <watches>7</watches>
                    <comments>
                            <comment id="80257" author="isaac" created="Tue, 25 Mar 2014 22:02:53 +0000"  >&lt;p&gt;Some of the test failures (e.g. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4716&quot; title=&quot;replay-ost-single test_5: stuck in dbuf_read-&amp;gt;zio_wait&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4716&quot;&gt;&lt;del&gt;LU-4716&lt;/del&gt;&lt;/a&gt;) we&apos;ve seen could be due to the known ZFS IO starvation issue with the Linux cfq scheduler. Although ZFS automatically sets the IO scheduler to noop on whole disks, the host OS could still be using cfq for the disks behind the guest OS disks. It seemed that the ZFS pools on the OSS were set up with:&lt;br/&gt;
zpool import -f -o cachefile=none -d /dev/lvm-Role_OSS lustre-ost3&lt;br/&gt;
It was not clear to me how the devices under /dev/lvm-Role_OSS were set up and used. I think it makes sense to make sure that our test system:&lt;br/&gt;
1. Uses whole disks for zfs pools on guest VMs.&lt;br/&gt;
2. Uses the noop IO scheduler for the corresponding disks on the host OS.&lt;/p&gt;</comment>
                            <comment id="82757" author="mjs" created="Tue, 29 Apr 2014 17:05:04 +0000"  >&lt;p&gt;I would like to understand this a little better.&lt;/p&gt;

&lt;p&gt;At the end of an autotest run with ZFS autotest could just include the content of /proc/spl/... in the tar ball it sends over to maloo, and maloo could make it available to download. (this is how I read the original request - Mike) &lt;b&gt;How much data is involved, and would that data in isolation be useful?&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;Would this be sufficient for a start, or do we need to include more information (for example the kernel which was running)? &lt;/p&gt;</comment>
                            <comment id="83319" author="isaac" created="Tue, 6 May 2014 16:30:15 +0000"  >&lt;p&gt;Yes, a tgz of /proc/spl/ should be sufficient. And it&apos;s necessary only for failed tests. Everything under /proc/spl/ is text, when compressed the total size shouldn&apos;t be large.&lt;/p&gt;</comment>
                            <comment id="91781" author="mdiep" created="Fri, 15 Aug 2014 20:48:00 +0000"  >&lt;p&gt;My take on this is that we should start this in the Lustre test-framework and generate such a file during ZFS tests. After that, we can see whether autotest automatically grabs the files and sends them to maloo.&lt;/p&gt;</comment>
                            <comment id="93343" author="isaac" created="Fri, 5 Sep 2014 17:14:45 +0000"  >&lt;p&gt;Another piece of vital debug information to collect is outputs from &quot;zpool events -v&quot;, available since zfs 0.6.3.&lt;/p&gt;

&lt;p&gt;Recently in debugging a few ZFS issues (e.g. &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4950&quot; title=&quot;sanity-benchmark test fsx hung: txg_sync was stuck on OSS&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4950&quot;&gt;&lt;del&gt;LU-4950&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5242&quot; title=&quot;Test hang sanity test_132, test_133: umount ost&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5242&quot;&gt;&lt;del&gt;LU-5242&lt;/del&gt;&lt;/a&gt;) I found it very difficult to troubleshoot without such information - basically the only thing I had was just some stack dumps.&lt;/p&gt;</comment>
                            <comment id="96056" author="isaac" created="Thu, 9 Oct 2014 17:42:41 +0000"  >&lt;p&gt;Please also set ZFS module option on MDS and OSS: zfs_txg_history=3&lt;/p&gt;

&lt;p&gt;Without the option, some debug information will not be exported in the proc file.&lt;/p&gt;</comment>
                            <comment id="96458" author="mdiep" created="Wed, 15 Oct 2014 23:51:09 +0000"  >&lt;p&gt;&lt;a href=&quot;http://review.whamcloud.com/#/c/11580/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/11580/&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="97924" author="pjones" created="Thu, 30 Oct 2014 11:54:02 +0000"  >&lt;p&gt;Landed for 2.7&lt;/p&gt;</comment>
                            <comment id="98460" author="yujian" created="Wed, 5 Nov 2014 20:01:06 +0000"  >&lt;p&gt;Here is the back-ported patch for Lustre b2_5 branch: &lt;a href=&quot;http://review.whamcloud.com/12590&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12590&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="100338" author="isaac" created="Mon, 1 Dec 2014 19:07:01 +0000"  >&lt;p&gt;Looks like ZFS info is missing from this report:&lt;br/&gt;
&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/9e3a7c26-769b-11e4-ad19-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/9e3a7c26-769b-11e4-ad19-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Or have I missed something?&lt;/p&gt;</comment>
                            <comment id="100355" author="mdiep" created="Mon, 1 Dec 2014 20:57:58 +0000"  >&lt;p&gt;The test that you mentioned doesn&apos;t follow the test-framework way of starting a test, so the zfs log collection was not called. Additionally, the test timed out, which could also mean that the log would not be collected at the end because the client crashed.&lt;/p&gt;</comment>
                            <comment id="100362" author="isaac" created="Mon, 1 Dec 2014 22:17:42 +0000"  >&lt;p&gt;Did you mean that even if the test I mentioned had failed in a different way (i.e. not a timeout, so it&apos;d be possible to collect the logs), the zfs logs would still not be collected? If yes, does it apply to all Maloo tests triggered from Gerrit?&lt;/p&gt;</comment>
                            <comment id="100376" author="mdiep" created="Tue, 2 Dec 2014 00:15:58 +0000"  >&lt;p&gt;Sorry for not being clear. After looking at this, I think it is likely that the test timing out caused the zfs log to not be collected.&lt;/p&gt;

&lt;p&gt;If you find a case where a test failed but no log was collected, please open a new ticket instead of reopening this one. I believe this enhancement is completed.&lt;/p&gt;</comment>
                            <comment id="100380" author="isaac" created="Tue, 2 Dec 2014 01:10:21 +0000"  >&lt;p&gt;When you said &quot;the test timed out caused the zfs log to not collected&quot;, did you mean:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;on test time out, the scripts would not try to collect the ZFS logs, or&lt;/li&gt;
	&lt;li&gt;it tries to get the logs, but servers wouldn&apos;t respond&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;In the report I mentioned above, the OSS was in a good state; there was a deadlock on the MDS, which made some service threads unresponsive, but user space processes should still have worked. In addition, dmesg and Lustre debug logs were all available for both the OSS and the MDS, so why weren&apos;t the ZFS logs available as well?&lt;/p&gt;</comment>
                            <comment id="103556" author="gerrit" created="Thu, 15 Jan 2015 04:45:14 +0000"  >&lt;p&gt;Oleg Drokin (oleg.drokin@intel.com) merged in patch &lt;a href=&quot;http://review.whamcloud.com/12590/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/12590/&lt;/a&gt;&lt;br/&gt;
Subject: &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5674&quot; title=&quot;Maloo test report should include zfs debugging data when FSTYPE=zfs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5674&quot;&gt;&lt;del&gt;LU-5674&lt;/del&gt;&lt;/a&gt; test: print spl debug info&lt;br/&gt;
Project: fs/lustre-release&lt;br/&gt;
Branch: b2_5&lt;br/&gt;
Current Patch Set: &lt;br/&gt;
Commit: f3ecfa69ecbfaa3e28b50c2849ffc99ca6bebf6a&lt;/p&gt;</comment>
                            <comment id="103703" author="isaac" created="Fri, 16 Jan 2015 02:17:07 +0000"  >&lt;p&gt;&lt;a href=&quot;https://testing.hpdd.intel.com/test_sets/deca9712-9bc1-11e4-857a-5254006e85c2&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://testing.hpdd.intel.com/test_sets/deca9712-9bc1-11e4-857a-5254006e85c2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the test report above, I couldn&apos;t find any of the ZFS data requested here. Since dmesg and other data that would require a working user space were all there, I believe the ZFS data should have been available as well. Please take a look - the absence of such data makes it harder to debug. Thanks!&lt;/p&gt;</comment>
                            <comment id="117292" author="mdiep" created="Wed, 3 Jun 2015 15:39:39 +0000"  >&lt;p&gt;Since the test timed out, there isn&apos;t any way to collect the zfs log.&lt;/p&gt;</comment>
                            <comment id="121350" author="mdiep" created="Wed, 15 Jul 2015 15:40:35 +0000"  >&lt;p&gt;This fix is already in 2.7, etc. We cannot include any log if a node has crashed (i.e. timed out). This ticket should be closed.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                    <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzupy7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1854</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                </customfields>
    </item>
</channel>
</rss>