<!-- 
RSS generated by JIRA (9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c) at Sat Feb 10 01:49:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>Whamcloud Community JIRA</title>
    <link>https://jira.whamcloud.com</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.4.14</version>
        <build-number>940014</build-number>
        <build-date>05-12-2023</build-date>
    </build-info>


<item>
            <title>[LU-5212] poor stat performance after upgrade from zfs-0.6.2-1/lustre-2.4.0-1 to zfs-0.6.3-1</title>
                <link>https://jira.whamcloud.com/browse/LU-5212</link>
                <project id="10000" key="LU">Lustre</project>
                    <description>&lt;p&gt;After upgrading a system from &lt;br/&gt;
lustre 2.4.0-1 / zfs-0.6.2-1 to&lt;br/&gt;
lustre 2.4.2-1 / zfs-0.6.3-1&lt;/p&gt;

&lt;p&gt;mdtest shows significantly lower stat performance - about 8000 iops vs 14400&lt;/p&gt;

&lt;p&gt;File reads and file removals are a bit worse, but not as severe. See the attached graph.&lt;/p&gt;

&lt;p&gt;We do see other marked improvements with the upgrade, for example with system processes waiting on the MDS.&lt;/p&gt;

&lt;p&gt;I wonder if this is some kind of expected performance tradeoff for the new version? I&apos;m guessing the absolute numbers for stat are still acceptable for our workload, but it is quite a large relative difference.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</description>
                <environment>centos 6.5</environment>
        <key id="25185">LU-5212</key>
            <summary>poor stat performance after upgrade from zfs-0.6.2-1/lustre-2.4.0-1 to zfs-0.6.3-1</summary>
                <type id="1" iconUrl="https://jira.whamcloud.com/secure/viewavatar?size=xsmall&amp;avatarId=11303&amp;avatarType=issuetype">Bug</type>
                                            <priority id="4" iconUrl="https://jira.whamcloud.com/images/icons/priorities/minor.svg">Minor</priority>
                        <status id="5" iconUrl="https://jira.whamcloud.com/images/icons/statuses/resolved.png" description="A resolution has been taken, and it is awaiting verification by reporter. From here issues are either reopened, or are closed.">Resolved</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="2">Won&apos;t Fix</resolution>
                                        <assignee username="wc-triage">WC Triage</assignee>
                                    <reporter username="sknolin">Scott Nolin</reporter>
                        <labels>
                            <label>llnl</label>
                            <label>prz</label>
                            <label>zfs</label>
                    </labels>
                <created>Tue, 17 Jun 2014 15:49:26 +0000</created>
                <updated>Thu, 5 Apr 2018 14:03:56 +0000</updated>
                            <resolved>Thu, 5 Apr 2018 14:03:56 +0000</resolved>
                                    <version>Lustre 2.4.2</version>
                                                        <due></due>
                            <votes>0</votes>
                                    <watches>8</watches>
                                                                            <comments>
                            <comment id="86810" author="gabriele.paciucci" created="Tue, 17 Jun 2014 16:10:16 +0000"  >&lt;p&gt;Hi Scott,&lt;br/&gt;
Can you confirm lustre 2.4.2 and zfs 0.6.3?&lt;/p&gt;

&lt;p&gt;Have you seen this patch &lt;a href=&quot;https://jira.hpdd.intel.com/browse/LU-4944&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/browse/LU-4944&lt;/a&gt; ?&lt;/p&gt;
</comment>
                            <comment id="86812" author="gabriele.paciucci" created="Tue, 17 Jun 2014 16:15:42 +0000"  >&lt;p&gt;Can you please collect /proc/spl/kstat/zfs/arcstats during the benchmark and upload a graph of:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;arc_meta_limit&lt;/li&gt;
	&lt;li&gt;arc_meta_size&lt;/li&gt;
	&lt;li&gt;c_max&lt;/li&gt;
	&lt;li&gt;c_min&lt;/li&gt;
	&lt;li&gt;p&lt;/li&gt;
	&lt;li&gt;size&lt;/li&gt;
&lt;/ul&gt;
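&lt;p&gt;A minimal sketch of how those counters could be pulled out of one snapshot of the file, assuming the usual kstat layout of two header lines followed by &quot;name type data&quot; rows. The function name parse_arcstats and the sample text are hypothetical, and the sample numbers are placeholders rather than measurements. Names that are absent from the file are simply skipped.&lt;/p&gt;

```python
def parse_arcstats(text, wanted):
    """Return {name: int value} for the wanted counters found in kstat text."""
    values = {}
    # Skip the two header lines; every remaining row is "name type data".
    for line in text.splitlines()[2:]:
        fields = line.split()
        if len(fields) == 3 and fields[0] in wanted:
            values[fields[0]] = int(fields[2])
    return values


# Hypothetical sample in the assumed layout; the numbers are placeholders.
SAMPLE = """5 1 0x01 85 4080 0 0
name                            type data
p                               4    500
c_min                           4    100
c_max                           4    900
size                            4    700
arc_meta_limit                  4    300
"""

WANTED = {"arc_meta_limit", "arc_meta_size", "c_max", "c_min", "p", "size"}
stats = parse_arcstats(SAMPLE, WANTED)
print(sorted(stats.items()))
```

&lt;p&gt;On a live server the text would come from reading /proc/spl/kstat/zfs/arcstats at a fixed interval during the benchmark.&lt;/p&gt;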


&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="86815" author="sknolin" created="Tue, 17 Jun 2014 16:45:17 +0000"  >&lt;p&gt;This is lustre 2.4.2 and zfs 0.6.3 from the ZFS on Linux EPEL repository&lt;/p&gt;

&lt;p&gt;I&apos;m not certain I am following this correctly, but the comments on &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-4944&quot; title=&quot;build fails with latest zfs source&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-4944&quot;&gt;&lt;del&gt;LU-4944&lt;/del&gt;&lt;/a&gt; suggest this patch is already in our zfs version&lt;/p&gt;

&lt;p&gt;I&apos;m not sure when I can run the benchmark again to collect stats while it is running. Here are the limit values at least:&lt;/p&gt;

&lt;p&gt;arc_meta_limit                 10000000000&lt;br/&gt;
arc_meta_max                 10000274368&lt;br/&gt;
c_min                           4194304&lt;br/&gt;
c_max                          150000000000&lt;/p&gt;

</comment>
                            <comment id="86817" author="sknolin" created="Tue, 17 Jun 2014 17:01:12 +0000"  >&lt;p&gt;When I do get to run this it will be easier if I can just use arcstat.py to collect data. If I do that, which of these fields do you want to see on a graph?&lt;/p&gt;

&lt;p&gt;Field definitions are as follows:&lt;br/&gt;
    l2bytes : bytes read per second from the L2ARC&lt;br/&gt;
     l2hits : L2ARC hits per second&lt;br/&gt;
       read : Total ARC accesses per second&lt;br/&gt;
       dmis : Demand Data misses per second&lt;br/&gt;
        mru : MRU List hits per second&lt;br/&gt;
      dread : Demand data accesses per second&lt;br/&gt;
      mread : Metadata accesses per second&lt;br/&gt;
          c : ARC Target Size&lt;br/&gt;
        ph% : Prefetch hits percentage&lt;br/&gt;
     l2hit% : L2ARC access hit percentage&lt;br/&gt;
        pm% : Prefetch miss percentage&lt;br/&gt;
        mfu : MFU List hits per second&lt;br/&gt;
        mm% : Metadata miss percentage&lt;br/&gt;
      pread : Prefetch accesses per second&lt;br/&gt;
       miss : ARC misses per second&lt;br/&gt;
       mrug : MRU Ghost List hits per second&lt;br/&gt;
       dhit : Demand Data hits per second&lt;br/&gt;
       mfug : MFU Ghost List hits per second&lt;br/&gt;
       hits : ARC reads per second&lt;br/&gt;
        dm% : Demand Data miss percentage&lt;br/&gt;
      miss% : ARC miss percentage&lt;br/&gt;
       mhit : Metadata hits per second&lt;br/&gt;
        dh% : Demand Data hit percentage&lt;br/&gt;
        mh% : Metadata hit percentage&lt;br/&gt;
       pmis : Prefetch misses per second&lt;br/&gt;
    l2asize : Actual (compressed) size of the L2ARC&lt;br/&gt;
    l2miss% : L2ARC access miss percentage&lt;br/&gt;
     l2miss : L2ARC misses per second&lt;br/&gt;
       mmis : Metadata misses per second&lt;br/&gt;
       phit : Prefetch hits per second&lt;br/&gt;
       hit% : ARC Hit percentage&lt;br/&gt;
      eskip : evict_skip per second&lt;br/&gt;
      arcsz : ARC Size&lt;br/&gt;
       time : Time&lt;br/&gt;
     l2read : Total L2ARC accesses per second&lt;br/&gt;
     l2size : Size of the L2ARC&lt;br/&gt;
     mtxmis : mutex_miss per second&lt;br/&gt;
       rmis : recycle_miss per second&lt;/p&gt;</comment>
                            <comment id="86842" author="gabriele.paciucci" created="Tue, 17 Jun 2014 19:40:16 +0000"  >&lt;p&gt;Hi Scott,&lt;br/&gt;
the useful thing would be to collect arcstats at a regular interval during the test (for example, every 5 seconds) and graph the values, for example with gnuplot. arcstat.py is more for understanding cache hits and misses.&lt;br/&gt;
In my experience your problem is more likely a matter of tuning arc_meta_limit, but to validate this I need to see the memory consumption during the run.&lt;/p&gt;</comment>
                            <comment id="86843" author="gabriele.paciucci" created="Tue, 17 Jun 2014 19:41:40 +0000"  >&lt;p&gt;Have you compiled lustre 2.4.2 against ZFS 0.6.3?&lt;/p&gt;</comment>
                            <comment id="86854" author="sknolin" created="Tue, 17 Jun 2014 20:24:30 +0000"  >&lt;p&gt;I didn&apos;t compile it; Brian Behlendorf or whoever built it for the zfsonlinux EPEL repository did.&lt;/p&gt;

&lt;p&gt;But yes, this is lustre 2.4.2 and zfs 0.6.3 &lt;/p&gt;

&lt;p&gt;I&apos;ll make graphs shortly&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="86860" author="sknolin" created="Tue, 17 Jun 2014 21:14:41 +0000"  >&lt;p&gt;Attached are requested stats. I broke out the non-constant stats and adjusted a bit to make it a little more interesting.&lt;/p&gt;</comment>
                            <comment id="86865" author="gabriele.paciucci" created="Tue, 17 Jun 2014 21:37:14 +0000"  >
&lt;p&gt;I can&apos;t make sense of these graphs. Could you please use MB for the y-axis and graph only:&lt;br/&gt;
-size&lt;br/&gt;
-arc_meta_limit&lt;br/&gt;
-arc_meta_size&lt;/p&gt;

&lt;p&gt;thanks&lt;/p&gt;</comment>
                            <comment id="86885" author="sknolin" created="Wed, 18 Jun 2014 01:36:28 +0000"  >&lt;p&gt;The y-axis is simply the raw data from /proc/spl/kstat/zfs/arcstats - I just put it into a graph quickly, it must be bytes.&lt;/p&gt;

&lt;p&gt;Note that &quot;arc_meta_size&quot; doesn&apos;t exist in arcstats, but there is &quot;meta_size&quot; - I assume the same thing.&lt;/p&gt;

&lt;p&gt;I will make those graphs in MB for you tomorrow, but here&apos;s a description doing some approximate math quickly which should explain it.&lt;/p&gt;

&lt;p&gt;1) arc_meta_limit is a constant, why graph it? Our value was 10000000000 bytes (aka 1E10) = 9536 MB&lt;/p&gt;

&lt;p&gt;2) arc_meta_size - does not exist, assuming we want meta_size - this is in &apos;arcstat-2.png&apos;:&lt;br/&gt;
Ranges from about 6.31E9 to 6.95E9 bytes = 6017 MB to 6628 MB&lt;/p&gt;

&lt;p&gt;3) Size - also in arcstat-2.png&lt;br/&gt;
 9.5E9 to 1.03E10 bytes = 9059 MB to 9822MB&lt;/p&gt;
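&lt;p&gt;For reference, the conversions above can be reproduced with a short sketch, assuming &quot;MB&quot; here means 1024 * 1024 bytes (the helper name bytes_to_mb is just illustrative):&lt;/p&gt;

```python
BYTES_PER_MB = 1024 * 1024  # assumption: MB means mebibytes, which matches the figures above


def bytes_to_mb(n_bytes):
    """Convert a byte count to whole MB, truncating the fraction."""
    return int(n_bytes // BYTES_PER_MB)


# Approximate values read off the graphs, as quoted in this comment.
readings = {
    "arc_meta_limit": 1e10,
    "meta_size low": 6.31e9,
    "meta_size high": 6.95e9,
    "size low": 9.5e9,
    "size high": 1.03e10,
}
for name, value in readings.items():
    print(name, bytes_to_mb(value), "MB")
```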

&lt;p&gt;I ran 2 iterations of the mdtest, and I think you can see the test finish and restart in that graph.&lt;/p&gt;

&lt;p&gt;I captured all the stats, so whatever graphs in whatever format might help, I&apos;ll make tomorrow.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Scott&lt;/p&gt;</comment>
                            <comment id="86894" author="isaac" created="Wed, 18 Jun 2014 05:26:47 +0000"  >&lt;p&gt;Scott,&lt;/p&gt;

&lt;p&gt;In the 1st graph, p (i.e. ARC adaptation parameter) almost never changed, which was weird. Can you verify from your data that p never really changed or its changes were too small to be seen on the graph?&lt;/p&gt;

&lt;p&gt;Also, can you make sure you have this patch &lt;a href=&quot;http://review.whamcloud.com/#/c/10237/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://review.whamcloud.com/#/c/10237/&lt;/a&gt; on the server? It was just landed and in my opinion it makes a lot of sense to apply that patch before trying to tune the ARC.&lt;/p&gt;</comment>
                            <comment id="86912" author="sknolin" created="Wed, 18 Jun 2014 13:31:31 +0000"  >&lt;p&gt;Isaac,&lt;/p&gt;

&lt;p&gt;Regarding &apos;p&apos; in the first graph - you just can&apos;t see it due to the scale. See the &apos;arcstat-3.png&apos; graph, which shows &apos;p&apos; changing on its own.&lt;/p&gt;

&lt;p&gt;Regarding the patch (I see it&apos;s now also here - &lt;a href=&quot;https://jira.whamcloud.com/browse/LU-5164&quot; title=&quot;Limit lu_object cache (ZFS and osd-zfs)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;LU-5164&quot;&gt;&lt;del&gt;LU-5164&lt;/del&gt;&lt;/a&gt;) - no we don&apos;t have that patch applied. However, we also didn&apos;t have it applied with zfs-0.6.2/lustre-2.4.0 and with the new version stat performance has regressed. ZFS arc performance in general though seems much better.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="86914" author="sknolin" created="Wed, 18 Jun 2014 13:40:16 +0000"  >&lt;p&gt;arcstat-MB.png shows meta-size, size, and arc-meta-limit in MB.&lt;/p&gt;</comment>
                            <comment id="86915" author="sknolin" created="Wed, 18 Jun 2014 14:19:58 +0000"  >&lt;p&gt;We have upgraded a second filesystem with similar resources on the MDS/MDT, and see pretty much the same performance difference for stat in mdtest.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="86923" author="gabriele.paciucci" created="Wed, 18 Jun 2014 15:27:39 +0000"  >&lt;p&gt;Do you have 32GB of RAM? if yes could you set these parameters:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;cat /etc/modprobe.d/zfs.conf&lt;br/&gt;
options zfs zfs_arc_max=30687091200&lt;br/&gt;
options zfs zfs_arc_meta_limit=25687091200&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;reboot your system and collect data?&lt;br/&gt;
could you also add the modinfo zfs output?&lt;br/&gt;
thanks&lt;/p&gt;</comment>
                            <comment id="86925" author="sknolin" created="Wed, 18 Jun 2014 15:34:44 +0000"  >&lt;p&gt;We have 256GB of RAM&lt;/p&gt;

&lt;p&gt;modinfo zfs output -&lt;/p&gt;

&lt;p&gt;filename:       /lib/modules/2.6.32-358.6.2.el6.x86_64/extra/zfs.ko&lt;br/&gt;
version:        0.6.3-1&lt;br/&gt;
license:        CDDL&lt;br/&gt;
author:         Sun Microsystems/Oracle, Lawrence Livermore National Laboratory&lt;br/&gt;
description:    ZFS&lt;br/&gt;
srcversion:     C29A443E3D2B93F605A540B&lt;br/&gt;
depends:        spl,znvpair,zcommon,zunicode,zavl&lt;br/&gt;
vermagic:       2.6.32-358.6.2.el6.x86_64 SMP mod_unload modversions&lt;br/&gt;
parm:           zvol_inhibit_dev:Do not create zvol device nodes (uint)&lt;br/&gt;
parm:           zvol_major:Major number for zvol device (uint)&lt;br/&gt;
parm:           zvol_threads:Number of threads for zvol device (uint)&lt;br/&gt;
parm:           zvol_max_discard_blocks:Max number of blocks to discard (ulong)&lt;br/&gt;
parm:           zio_injection_enabled:Enable fault injection (int)&lt;br/&gt;
parm:           zio_bulk_flags:Additional flags to pass to bulk buffers (int)&lt;br/&gt;
parm:           zio_delay_max:Max zio millisec delay before posting event (int)&lt;br/&gt;
parm:           zio_requeue_io_start_cut_in_line:Prioritize requeued I/O (int)&lt;br/&gt;
parm:           zfs_sync_pass_deferred_free:Defer frees starting in this pass (int)&lt;br/&gt;
parm:           zfs_sync_pass_dont_compress:Don&apos;t compress starting in this pass (int)&lt;br/&gt;
parm:           zfs_sync_pass_rewrite:Rewrite new bps starting in this pass (int)&lt;br/&gt;
parm:           zil_replay_disable:Disable intent logging replay (int)&lt;br/&gt;
parm:           zfs_nocacheflush:Disable cache flushes (int)&lt;br/&gt;
parm:           zil_slog_limit:Max commit bytes to separate log device (ulong)&lt;br/&gt;
parm:           zfs_read_chunk_size:Bytes to read per chunk (long)&lt;br/&gt;
parm:           zfs_immediate_write_sz:Largest data block to write to zil (long)&lt;br/&gt;
parm:           zfs_flags:Set additional debugging flags (int)&lt;br/&gt;
parm:           zfs_recover:Set to attempt to recover from fatal errors (int)&lt;br/&gt;
parm:           zfs_expire_snapshot:Seconds to expire .zfs/snapshot (int)&lt;br/&gt;
parm:           zfs_vdev_aggregation_limit:Max vdev I/O aggregation size (int)&lt;br/&gt;
parm:           zfs_vdev_read_gap_limit:Aggregate read I/O over gap (int)&lt;br/&gt;
parm:           zfs_vdev_write_gap_limit:Aggregate write I/O over gap (int)&lt;br/&gt;
parm:           zfs_vdev_max_active:Maximum number of active I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_async_write_active_max_dirty_percent:Async write concurrency max threshold (int)&lt;br/&gt;
parm:           zfs_vdev_async_write_active_min_dirty_percent:Async write concurrency min threshold (int)&lt;br/&gt;
parm:           zfs_vdev_async_read_max_active:Max active async read I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_async_read_min_active:Min active async read I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_async_write_max_active:Max active async write I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_async_write_min_active:Min active async write I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_scrub_max_active:Max active scrub I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_scrub_min_active:Min active scrub I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_sync_read_max_active:Max active sync read I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_sync_read_min_active:Min active sync read I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_sync_write_max_active:Max active sync write I/Os per vdev (int)&lt;br/&gt;
parm:           zfs_vdev_sync_write_min_active:Min active sync write I/Osper vdev (int)&lt;br/&gt;
parm:           zfs_vdev_mirror_switch_us:Switch mirrors every N usecs (int)&lt;br/&gt;
parm:           zfs_vdev_scheduler:I/O scheduler (charp)&lt;br/&gt;
parm:           zfs_vdev_cache_max:Inflate reads small than max (int)&lt;br/&gt;
parm:           zfs_vdev_cache_size:Total size of the per-disk cache (int)&lt;br/&gt;
parm:           zfs_vdev_cache_bshift:Shift size to inflate reads too (int)&lt;br/&gt;
parm:           zfs_txg_timeout:Max seconds worth of delta per txg (int)&lt;br/&gt;
parm:           zfs_read_history:Historic statistics for the last N reads (int)&lt;br/&gt;
parm:           zfs_read_history_hits:Include cache hits in read history (int)&lt;br/&gt;
parm:           zfs_txg_history:Historic statistics for the last N txgs (int)&lt;br/&gt;
parm:           zfs_deadman_synctime_ms:Expiration time in milliseconds (ulong)&lt;br/&gt;
parm:           zfs_deadman_enabled:Enable deadman timer (int)&lt;br/&gt;
parm:           spa_asize_inflation:SPA size estimate multiplication factor (int)&lt;br/&gt;
parm:           spa_config_path:SPA config file (/etc/zfs/zpool.cache) (charp)&lt;br/&gt;
parm:           zfs_autoimport_disable:Disable pool import at module load (int)&lt;br/&gt;
parm:           metaslab_debug_load:load all metaslabs during pool import (int)&lt;br/&gt;
parm:           metaslab_debug_unload:prevent metaslabs from being unloaded (int)&lt;br/&gt;
parm:           zfs_zevent_len_max:Max event queue length (int)&lt;br/&gt;
parm:           zfs_zevent_cols:Max event column width (int)&lt;br/&gt;
parm:           zfs_zevent_console:Log events to the console (int)&lt;br/&gt;
parm:           zfs_top_maxinflight:Max I/Os per top-level (int)&lt;br/&gt;
parm:           zfs_resilver_delay:Number of ticks to delay resilver (int)&lt;br/&gt;
parm:           zfs_scrub_delay:Number of ticks to delay scrub (int)&lt;br/&gt;
parm:           zfs_scan_idle:Idle window in clock ticks (int)&lt;br/&gt;
parm:           zfs_scan_min_time_ms:Min millisecs to scrub per txg (int)&lt;br/&gt;
parm:           zfs_free_min_time_ms:Min millisecs to free per txg (int)&lt;br/&gt;
parm:           zfs_resilver_min_time_ms:Min millisecs to resilver per txg (int)&lt;br/&gt;
parm:           zfs_no_scrub_io:Set to disable scrub I/O (int)&lt;br/&gt;
parm:           zfs_no_scrub_prefetch:Set to disable scrub prefetching (int)&lt;br/&gt;
parm:           zfs_dirty_data_max_percent:percent of ram can be dirty (int)&lt;br/&gt;
parm:           zfs_dirty_data_max_max_percent:zfs_dirty_data_max upper bound as % of RAM (int)&lt;br/&gt;
parm:           zfs_delay_min_dirty_percent:transaction delay threshold (int)&lt;br/&gt;
parm:           zfs_dirty_data_max:determines the dirty space limit (ulong)&lt;br/&gt;
parm:           zfs_dirty_data_max_max:zfs_dirty_data_max upper bound in bytes (ulong)&lt;br/&gt;
parm:           zfs_dirty_data_sync:sync txg when this much dirty data (ulong)&lt;br/&gt;
parm:           zfs_delay_scale:how quickly delay approaches infinity (ulong)&lt;br/&gt;
parm:           zfs_prefetch_disable:Disable all ZFS prefetching (int)&lt;br/&gt;
parm:           zfetch_max_streams:Max number of streams per zfetch (uint)&lt;br/&gt;
parm:           zfetch_min_sec_reap:Min time before stream reclaim (uint)&lt;br/&gt;
parm:           zfetch_block_cap:Max number of blocks to fetch at a time (uint)&lt;br/&gt;
parm:           zfetch_array_rd_sz:Number of bytes in a array_read (ulong)&lt;br/&gt;
parm:           zfs_pd_blks_max:Max number of blocks to prefetch (int)&lt;br/&gt;
parm:           zfs_send_corrupt_data:Allow sending corrupt data (int)&lt;br/&gt;
parm:           zfs_mdcomp_disable:Disable meta data compression (int)&lt;br/&gt;
parm:           zfs_nopwrite_enabled:Enable NOP writes (int)&lt;br/&gt;
parm:           zfs_dedup_prefetch:Enable prefetching dedup-ed blks (int)&lt;br/&gt;
parm:           zfs_dbuf_state_index:Calculate arc header index (int)&lt;br/&gt;
parm:           zfs_arc_min:Min arc size (ulong)&lt;br/&gt;
parm:           zfs_arc_max:Max arc size (ulong)&lt;br/&gt;
parm:           zfs_arc_meta_limit:Meta limit for arc size (ulong)&lt;br/&gt;
parm:           zfs_arc_meta_prune:Bytes of meta data to prune (int)&lt;br/&gt;
parm:           zfs_arc_grow_retry:Seconds before growing arc size (int)&lt;br/&gt;
parm:           zfs_arc_p_aggressive_disable:disable aggressive arc_p grow (int)&lt;br/&gt;
parm:           zfs_arc_p_dampener_disable:disable arc_p adapt dampener (int)&lt;br/&gt;
parm:           zfs_arc_shrink_shift:log2(fraction of arc to reclaim) (int)&lt;br/&gt;
parm:           zfs_disable_dup_eviction:disable duplicate buffer eviction (int)&lt;br/&gt;
parm:           zfs_arc_memory_throttle_disable:disable memory throttle (int)&lt;br/&gt;
parm:           zfs_arc_min_prefetch_lifespan:Min life of prefetch block (int)&lt;br/&gt;
parm:           l2arc_write_max:Max write bytes per interval (ulong)&lt;br/&gt;
parm:           l2arc_write_boost:Extra write bytes during device warmup (ulong)&lt;br/&gt;
parm:           l2arc_headroom:Number of max device writes to precache (ulong)&lt;br/&gt;
parm:           l2arc_headroom_boost:Compressed l2arc_headroom multiplier (ulong)&lt;br/&gt;
parm:           l2arc_feed_secs:Seconds between L2ARC writing (ulong)&lt;br/&gt;
parm:           l2arc_feed_min_ms:Min feed interval in milliseconds (ulong)&lt;br/&gt;
parm:           l2arc_noprefetch:Skip caching prefetched buffers (int)&lt;br/&gt;
parm:           l2arc_nocompress:Skip compressing L2ARC buffers (int)&lt;br/&gt;
parm:           l2arc_feed_again:Turbo L2ARC warmup (int)&lt;br/&gt;
parm:           l2arc_norw:No reads during writes (int)&lt;/p&gt;</comment>
                            <comment id="86928" author="aawagner" created="Wed, 18 Jun 2014 15:46:33 +0000"  >&lt;p&gt;Gabriele,&lt;/p&gt;

&lt;p&gt;I&apos;m working on this filesystem with Scott:&lt;/p&gt;

&lt;p&gt;Speaking to the Arc Cache settings, we are currently using:&lt;/p&gt;

&lt;p&gt;options zfs zfs_arc_meta_limit=10000000000&lt;br/&gt;
options zfs zfs_arc_max=150000000000&lt;/p&gt;

&lt;p&gt;However, we&apos;re nowhere near to filling that up so we&apos;re not seeing any excessive cache pressure right now.&lt;/p&gt;</comment>
                            <comment id="86929" author="gabriele.paciucci" created="Wed, 18 Jun 2014 15:48:56 +0000"  >&lt;p&gt;Have you set these values?&lt;br/&gt;
By default arc_meta_limit should be 1/4 of the total RAM, so 64GB in your case, and zfs_arc_max 1/2.&lt;/p&gt;
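&lt;p&gt;As a rough sketch of that arithmetic, taking the 1/4 and 1/2 fractions above at face value (they are used here as a rule of thumb, not read from the ZFS source) together with the 256GB of RAM reported for this server:&lt;/p&gt;

```python
GIB = 1024 ** 3

total_ram = 256 * GIB  # RAM size reported for this MDS

# Fractions as described in this comment; treat them as assumptions.
default_arc_max = total_ram // 2
default_arc_meta_limit = total_ram // 4

print(default_arc_max // GIB, "GB arc_max")
print(default_arc_meta_limit // GIB, "GB arc_meta_limit")
```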

&lt;p&gt;In the arcstat-MB graph it seems to be only 10GB.&lt;/p&gt;

&lt;p&gt;Could you increase these values, including zfs_arc_max?&lt;/p&gt;</comment>
                            <comment id="86931" author="aawagner" created="Wed, 18 Jun 2014 15:54:41 +0000"  >&lt;p&gt;Yes, we set these values after observations of different values on ZFS 0.6.2. The larger meta_limit helped us avoid running into ugly cache issues.&lt;/p&gt;

&lt;p&gt;Either way, the cache is only using about 10GB right now as the filesystem has only been up for two days and is relatively quiet. We can&apos;t force it to use more of the cache without more activity.&lt;/p&gt;</comment>
                            <comment id="86932" author="gabriele.paciucci" created="Wed, 18 Jun 2014 15:59:58 +0000"  >&lt;p&gt;Okay but yesterday Scott captured these values:&lt;br/&gt;
arc_meta_limit 10000000000&lt;br/&gt;
arc_meta_max 10000274368&lt;/p&gt;

&lt;p&gt;and to me it looks like you ran out of ARC memory for metadata during your mdtest.&lt;/p&gt;
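&lt;p&gt;A quick check of those two numbers, as a sketch that assumes arc_meta_max is the high-water mark of metadata usage (the variable names are just for illustration): the recorded maximum sits slightly above the configured limit, which is what suggests the limit was reached.&lt;/p&gt;

```python
arc_meta_limit = 10000000000  # bytes, from the values quoted above
arc_meta_max = 10000274368    # bytes, assumed to be the peak metadata usage

overshoot = arc_meta_max - arc_meta_limit
print("arc_meta_max exceeds arc_meta_limit by", overshoot, "bytes")
print("ratio:", round(arc_meta_max / arc_meta_limit, 6))
```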
</comment>
                            <comment id="86935" author="aawagner" created="Wed, 18 Jun 2014 16:01:27 +0000"  >&lt;p&gt;Sorry about that, we were missing a 0 on the arc_meta_limit. We&apos;ll retest with the new values.&lt;/p&gt;</comment>
                            <comment id="86962" author="sknolin" created="Wed, 18 Jun 2014 19:07:24 +0000"  >&lt;p&gt;I&apos;ve completed an initial run with the zfs_arc_meta_limit and max set more appropriately to 100G and 150G.&lt;/p&gt;

&lt;p&gt;The mdtest data doesn&apos;t look any better, it&apos;s actually a bit worse. Complicating things, the filesystem is now in use by other jobs, not heavy use but a few hundred iops on various tasks it looks like (just watching jobstats in general). I&apos;ll post actual data if I can get another test done.&lt;/p&gt;

&lt;p&gt;Regardless of that one, our &lt;b&gt;other&lt;/b&gt; filesystem has more appropriate numbers to start with (it was left at default), and we see a very similar difference in stat performance there. So I don&apos;t really expect much...&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="86971" author="sknolin" created="Wed, 18 Jun 2014 19:56:31 +0000"  >&lt;p&gt;Here are results with more appropriate limits, included the graph with arc_meta_limit to show we&apos;re not exceeding it, and also a version with better scale.&lt;/p&gt;

&lt;p&gt;Notice how things certainly aren&apos;t better, they&apos;re a little worse.&lt;/p&gt;

&lt;p&gt;These graphs all have &quot;95G&quot; in the title. I wish I could embed images within comments to make it flow better.&lt;/p&gt;</comment>
                            <comment id="88477" author="prakash" created="Tue, 8 Jul 2014 18:11:54 +0000"  >&lt;p&gt;I don&apos;t mean to barge in so late to the party, but I&apos;m curious what the status of this issue is? It&apos;s a little hard to follow the comments and the attached graphs. What&apos;s the observed performance degradation, and what configuration changes/experiments have been tried (and what were the results)?&lt;/p&gt;</comment>
                            <comment id="88517" author="sknolin" created="Tue, 8 Jul 2014 19:50:20 +0000"  >&lt;p&gt;The observed performance degradation is ~45% lower stat IOPs in mdtest (8000 vs 14400) after the zfs/lustre upgrade. &lt;/p&gt;
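
&lt;p&gt;For the record, a small sketch of how that percentage falls out of the two rates (the function name percent_drop is hypothetical):&lt;/p&gt;

```python
def percent_drop(before, after):
    """Relative decrease from before to after, in percent."""
    return (before - after) / before * 100.0


stat_before = 14400  # stat IOPS with zfs-0.6.2 / lustre-2.4.0
stat_after = 8000    # stat IOPS with zfs-0.6.3 / lustre-2.4.2

print(round(percent_drop(stat_before, stat_after), 1), "percent lower")
```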

&lt;p&gt;We adjusted the arc_meta_limit (as it was set poorly) but it made no difference. The easiest graph to look at is this one: &lt;a href=&quot;https://jira.hpdd.intel.com/secure/attachment/15192/mdtest-zfs063-95G-arc_meta_limit.png&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://jira.hpdd.intel.com/secure/attachment/15192/mdtest-zfs063-95G-arc_meta_limit.png&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The graphs are all for one particular filesystem, but we have a second file system with similar hardware and the same software versions and saw similar degradation in stat performance for mdtest.&lt;/p&gt;

&lt;p&gt;If anyone runs mdtest just prior to upgrading lustre/zfs I&apos;d be interested in their results; I suspect they will be similar. Our software is from the zfsonlinux repo with no additional patches.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="88597" author="prakash" created="Wed, 9 Jul 2014 15:04:27 +0000"  >&lt;p&gt;Thanks Scott. That definitely doesn&apos;t sit well with me.&lt;/p&gt;

&lt;p&gt;Can you post the command you used as a test? Do you have the exact mdtest command/options you used? How many nodes? If I can get some time, I might try and reproduce this and see if I can better understand what&apos;s going on here. It&apos;s definitely &lt;b&gt;not&lt;/b&gt; expected nor desired for the performance to drop like that; I want to get to the bottom of this.&lt;/p&gt;

&lt;p&gt;Also, what&apos;s the Y axis label in the graph you linked to? I saw that earlier, but I can&apos;t make sense of it without labels. My initial interpretation was the Y axis is seconds, but that would mean lower is better, which doesn&apos;t agree with the claim of a performance decrease.&lt;/p&gt;</comment>
                            <comment id="88599" author="prakash" created="Wed, 9 Jul 2014 15:07:44 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Also, what&apos;s the Y axis label in the graph you linked to? I saw that earlier, but I can&apos;t make sense of it without labels. My initial interpretation was the Y axis is seconds, but that would mean lower is better, which doesn&apos;t agree with the claim of a performance decrease.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Actually, I think I got it now. The Y axis must be the rate of operations per second, which lines up with your claim of 14400 stat/s prior and 8000 stat/s now.&lt;/p&gt;

&lt;p&gt;When you get a chance, please update us with the command used to generate the workload.&lt;/p&gt;</comment>
                            <comment id="88602" author="sknolin" created="Wed, 9 Jul 2014 15:21:06 +0000"  >&lt;p&gt;Y-axis is IOPs.&lt;/p&gt;

&lt;p&gt;The command info:&lt;/p&gt;

&lt;p&gt;mdtest-1.9.1 was launched with 64 total task(s) on 4 node(s)&lt;br/&gt;
Command line used: /home/scottn/benchmarks/mdtest -i 2 -F -n 4000 -d /arcdata/scottn/mdtest&lt;/p&gt;

&lt;p&gt;64 tasks, 256000 files&lt;/p&gt;
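
&lt;p&gt;As a sanity check on that file count, a tiny sketch assuming -n is the number of items each task creates:&lt;/p&gt;

```python
tasks = 64             # total tasks reported by mdtest
items_per_task = 4000  # the -n option on the command line above

total_files = tasks * items_per_task
print(total_files, "files per iteration")
```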

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="88604" author="sknolin" created="Wed, 9 Jul 2014 15:23:28 +0000"  >&lt;p&gt;I would also add, that while this absolute number from mdtest is worse, in use so far the upgrade has been an improvement. Performance doesn&apos;t seem to degrade so quickly with file creates, and things like interactive &apos;ls -l&apos; are much better.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="88628" author="prakash" created="Wed, 9 Jul 2014 18:39:00 +0000"  >&lt;blockquote&gt;
&lt;p&gt;I would also add, that while this absolute number from mdtest is worse, in use so far the upgrade has been an improvement. Performance doesn&apos;t seem to degrade so quickly with file creates, and things like interactive &apos;ls -l&apos; are much better.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Glad to hear it! &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.whamcloud.com/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;I&apos;m still a bit puzzled regarding the stat&apos;s though. I&apos;m going to try and reproduce this using our test cluster; stay tuned.&lt;/p&gt;</comment>
                            <comment id="88663" author="prakash" created="Wed, 9 Jul 2014 22:33:07 +0000"  >&lt;p&gt;Interesting.. I think I see similar reduced performance with stats as well.. Hm..&lt;/p&gt;

&lt;p&gt;So here&apos;s the mdtest output with releases based on lustre 2.4.2 and zfs 0.6.3 on the servers:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;hype355@root:srun -- mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
-- started at 07/09/2014 13:09:02 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 866.3 Mi   Used Inodes: 60.2%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   2046.438    838.703   1534.565    375.574
   File stat         :  65205.403  23577.494  57837.499  13089.055
   File removal      :   4780.471   4647.670   4719.076     45.088
   Tree creation     :    505.051     34.332    221.404    196.950
   Tree removal      :     12.423     10.049     11.123      0.763

-- finished at 07/09/2014 13:40:52 --
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;And here&apos;s the mdtest output with releases based on lustre 2.4.0 and zfs 0.6.2 on the servers:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;hype355@root:srun -- mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
-- started at 07/09/2014 14:43:06 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 861.8 Mi   Used Inodes: 60.5%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   1627.029    810.017   1320.848    239.655
   File stat         :  99560.417  69839.184  88798.194   9632.641
   File removal      :   4352.713   3279.728   4029.607    413.213
   Tree creation     :    348.675     33.174    194.944    141.913
   Tree removal      :     15.176     10.103     12.088      1.386

-- finished at 07/09/2014 15:19:02 --
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;This shows about a 35% decrease in the mean &quot;File stat&quot; rate (from 88798.194 to 57837.499) with the lustre 2.4.2 and zfs 0.6.3 release (I&apos;m assuming the number reported is operations per second). That&apos;s no good.&lt;/p&gt;</comment>
                            <comment id="89087" author="prakash" created="Tue, 15 Jul 2014 18:30:04 +0000"  >&lt;p&gt;Scott, can you try increasing the &lt;tt&gt;lu_cache_nr&lt;/tt&gt; module option and re-running the test?&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;# zwicky-lcy-mds1 /root &amp;gt; cat /sys/module/obdclass/parameters/lu_cache_nr
256
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Try increasing it to something much larger, maybe 1M. I&apos;d try that myself, but our testing resource is busy with other work at the moment.&lt;/p&gt;</comment>
                            <comment id="89177" author="sknolin" created="Wed, 16 Jul 2014 01:35:00 +0000"  >&lt;p&gt;Prakash, we will give this a try soon.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="89243" author="prakash" created="Wed, 16 Jul 2014 17:04:57 +0000"  >&lt;p&gt;Scott, I was able to squeeze in a test run with &lt;tt&gt;lu_cache_nr=1048576&lt;/tt&gt; on the MDS and all OSS nodes in the filesystem. I didn&apos;t see any significant difference:&lt;/p&gt;
&lt;div class=&quot;preformatted panel&quot; style=&quot;border-width: 1px;&quot;&gt;&lt;div class=&quot;preformattedContent panelContent&quot;&gt;
&lt;pre&gt;hype355@root:srun -- mdtest -v -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-5
-- started at 07/16/2014 09:23:46 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -v -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-5
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 834.8 Mi   Used Inodes: 62.5%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   3060.802   2525.669   2719.003    161.410
   File stat         :  72310.501  32382.555  57016.755  11440.553
   File removal      :   4344.489   4043.991   4224.141     97.727
   Tree creation     :    377.644     32.784    147.864    126.958
   Tree removal      :     11.800      9.356     10.626      0.884

-- finished at 07/16/2014 09:45:06 --
&lt;/pre&gt;
&lt;/div&gt;&lt;/div&gt;</comment>
                            <comment id="89245" author="sknolin" created="Wed, 16 Jul 2014 17:25:48 +0000"  >&lt;p&gt;Prakash, thanks for letting me know. We won&apos;t bother running it, then. Ours is a production cluster; we can typically run these tests because it isn&apos;t heavily used all the time, but it isn&apos;t easy.&lt;/p&gt;

&lt;p&gt;Scott&lt;/p&gt;</comment>
                            <comment id="225204" author="pjones" created="Thu, 5 Apr 2018 14:03:56 +0000"  >&lt;p&gt;I imagine the performance is quite different on more current versions of Lustre and ZFS.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Related</name>
                                            <outwardlinks description="is related to ">
                                        <issuelink>
            <issuekey id="25168">LU-5203</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="15183" name="arcstat-1.png" size="11579" author="sknolin" created="Tue, 17 Jun 2014 21:14:41 +0000"/>
                            <attachment id="15190" name="arcstat-2-95G-arc_meta_limit.png" size="13130" author="sknolin" created="Wed, 18 Jun 2014 19:56:31 +0000"/>
                            <attachment id="15182" name="arcstat-2.png" size="12735" author="sknolin" created="Tue, 17 Jun 2014 21:14:41 +0000"/>
                            <attachment id="15181" name="arcstat-3.png" size="14560" author="sknolin" created="Tue, 17 Jun 2014 21:14:41 +0000"/>
                            <attachment id="15191" name="arcstat-95G-arc_meta_limit.png" size="10427" author="sknolin" created="Wed, 18 Jun 2014 19:56:31 +0000"/>
                            <attachment id="15187" name="arcstat-MB.png" size="11562" author="sknolin" created="Wed, 18 Jun 2014 13:40:16 +0000"/>
                            <attachment id="15192" name="mdtest-zfs063-95G-arc_meta_limit.png" size="17720" author="sknolin" created="Wed, 18 Jun 2014 19:56:31 +0000"/>
                            <attachment id="15176" name="mdtest-zfs063.png" size="15837" author="sknolin" created="Tue, 17 Jun 2014 15:49:26 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                            <customfield id="customfield_10890" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10030" key="com.atlassian.jira.plugin.system.customfieldtypes:labels">
                        <customfieldname>Epic/Theme</customfieldname>
                        <customfieldvalues>
                                        <label>Performance</label>
            <label>zfs</label>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10390" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hzwp8f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10090" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>14543</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10060" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Severity</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10022"><![CDATA[3]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                        </customfields>
    </item>
</channel>
</rss>