<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 02:30:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-9977] client ran out of memory when diffing two 2GB files</title>
                <link>https://jira.whamcloud.com/browse/LU-9977</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;Systems were imaged to el7.3/lustre2.10.0.  A zfs mount point (v6.5.9) was &lt;br/&gt;
created on the lustre file system.  A 2GB file was then copied to a directory on &lt;br/&gt;
the zfs mount point.&lt;/p&gt;

&lt;p&gt;After systems were imaged to el7.4/2.10.52, an import of the 6.5.9 zpool was &lt;br/&gt;
performed.  A 2GB file was then copied onto the zfs mount point (same file as&lt;br/&gt;
above - different directory).  Diff was then used to compare the two files.&lt;/p&gt;

&lt;p&gt;While diff was running, top showed it consuming 80-90% of memory.  At some&lt;br/&gt;
point close to 90%, the client killed the diff process.&lt;/p&gt;


&lt;p&gt;I&apos;ve found two ways to avoid this: &lt;/p&gt;

&lt;p&gt;1) Keep everything above the same except work with a newly created zfs pool&lt;br/&gt;
   rather than an imported pool.&lt;/p&gt;

&lt;p&gt;2) Instead of diffing two 2GB files, diff two 2GB sets of several smaller &lt;br/&gt;
   files (largest file in set &amp;lt;60MB).&lt;/p&gt;


&lt;p&gt;Note: When diff is used to compare several smaller files, it uses much less &lt;br/&gt;
memory (&amp;lt;10%).&lt;/p&gt;

&lt;p&gt;Note: This has also been seen with ldiskfs, but is easier to repro with zfs.&lt;/p&gt;</description>
                <environment>clients: trevis-60vm1 &amp;amp; 2&lt;br/&gt;
mds: trevis-62&lt;br/&gt;
ost: trevis-65&lt;br/&gt;
&lt;br/&gt;
before upgrade: el7.3, zfs 6.5.9, b2_10 branch, v2.10.0, b5 &lt;br/&gt;
after upgrade: el7.4, zfs 7.1, master branch, v2.10.52, b3631 &lt;br/&gt;
</environment>
        <key id="48280">LU-9977</key>
            <summary>client ran out of memory when diffing two 2GB files</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="2" iconUrl="https://jira.whamcloud.com/images/icons/priorities/critical.svg">Critical</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="6">Not a Bug</resolution>
                                        <assignee username="bobijam">Zhenyu Xu</assignee>
                                    <reporter username="jcasper">James Casper</reporter>
                        <labels>
                    </labels>
                <created>Tue, 12 Sep 2017 16:00:30 +0000</created>
                <updated>Thu, 7 Dec 2017 09:44:40 +0000</updated>
                            <resolved>Thu, 7 Dec 2017 09:44:40 +0000</resolved>
                                    <version>Lustre 2.11.0</version>
                    <version>Lustre 2.10.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>4</watches>
                                                                            <comments>
                            <comment id="208143" author="casperjx" created="Tue, 12 Sep 2017 16:44:07 +0000"  >&lt;p&gt;My last successful clean/zfs upgrade was from 2.9.0 to 2.10.0 RC1.&lt;/p&gt;</comment>
                            <comment id="208151" author="pjones" created="Tue, 12 Sep 2017 17:44:59 +0000"  >&lt;p&gt;Nathaniel&lt;/p&gt;

&lt;p&gt;Could you please advise&lt;/p&gt;

&lt;p&gt;Thanks&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="208169" author="casperjx" created="Tue, 12 Sep 2017 20:01:39 +0000"  >&lt;p&gt;Just tried my diff again on an imported 6.5.9 zpool.  This time it was with 2.10.51 b3624, which also uses zfs 6.5.9 (and has an el7.3 kernel).  It killed the diff process again.  So this is not a zfs 7.1 issue.&lt;/p&gt;</comment>
                            <comment id="208220" author="adilger" created="Wed, 13 Sep 2017 06:20:13 +0000"  >&lt;p&gt;Could you include the stack traces from the console when the OOM was hit?  That should be recorded by conman. &lt;/p&gt;

&lt;p&gt;Also, for slabtop (and to a lesser extent top), the output should be sorted by total memory usage rather than number of objects (and CPU, respectively).  It looks like &quot;diff&quot; is being charged for most of the memory usage (it shows in top as 89%), but it isn&apos;t clear what that memory is. &lt;/p&gt;

&lt;p&gt;It is definitely strange that ZFS has anything to do with this, because that is running on the OSS, and the memory used is on the client.  &lt;/p&gt;</comment>
                            <comment id="208286" author="casperjx" created="Wed, 13 Sep 2017 17:56:21 +0000"  >&lt;p&gt;Stack and new slabtop output attached.&lt;/p&gt;

&lt;p&gt;FYI: This is not zfs related.  I finally saw the same results with ldiskfs. &lt;/p&gt;</comment>
                            <comment id="208308" author="casperjx" created="Wed, 13 Sep 2017 20:05:05 +0000"  >&lt;p&gt;I was able to repro and get a crash dump.&lt;/p&gt;</comment>
                            <comment id="208403" author="casperjx" created="Thu, 14 Sep 2017 20:01:41 +0000"  >&lt;p&gt;I verified the output from top.  Memory usage during diff differs dramatically between reading a couple of 2GB files and reading many smaller files (60MB or less each, total read still ~4GB):&lt;/p&gt;

&lt;p&gt;two large files: ~85.0%&lt;br/&gt;
many small files: ~00.5%&lt;/p&gt;</comment>
                            <comment id="208521" author="pjones" created="Fri, 15 Sep 2017 17:32:50 +0000"  >&lt;p&gt;Bobijam&lt;/p&gt;

&lt;p&gt;Could you please advise on this? It seems to be a CLIO issue that has been introduced since 2.10&lt;/p&gt;

&lt;p&gt;Peter&lt;/p&gt;</comment>
                            <comment id="215392" author="casperjx" created="Tue, 5 Dec 2017 22:38:52 +0000"  >&lt;p&gt;Still seeing 80-90% of memory used during a diff of large files.  This is also happening on the latest b2_10 (2.10.2 RC1, b50).  The message sent to the console when the memory killer kicks in is &quot;diff: memory exhausted&quot;.&lt;/p&gt;</comment>
                            <comment id="215421" author="adilger" created="Wed, 6 Dec 2017 06:35:02 +0000"  >&lt;p&gt;Looking at the slabtop output, it doesn&#8217;t seem like any slabs are taking up much space; the largest is tens of MB. If all the memory is accounted for by diff, then either there is a significant memory leak in diff itself, or the file pages are being accounted against diff but cannot be released while the files are in use.&lt;/p&gt;</comment>
                            <comment id="215489" author="casperjx" created="Wed, 6 Dec 2017 21:25:51 +0000"  >&lt;p&gt;I ran the diff again with the two directories on / rather than /mnt/lustre.  The same behavior was seen (diff: memory exhausted), so I believe this is not a lustre issue.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="46523">LU-9601</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="28270" name="before diff.JPG" size="152372" author="jcasper" created="Tue, 12 Sep 2017 16:00:04 +0000"/>
                            <attachment id="28271" name="during diff.JPG" size="174329" author="jcasper" created="Tue, 12 Sep 2017 16:00:02 +0000"/>
                            <attachment id="28280" name="slabtop and top screens during diff.JPG" size="164371" author="jcasper" created="Wed, 13 Sep 2017 17:53:59 +0000"/>
                            <attachment id="28281" name="stack for OOM during diff.txt" size="19148" author="jcasper" created="Wed, 13 Sep 2017 17:53:59 +0000"/>
                            <attachment id="28282" name="vmcore" size="38409627" author="jcasper" created="Wed, 13 Sep 2017 20:04:23 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzzk0f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>