[LU-5212] poor stat performance after upgrade from zfs-0.6.2-1/lustre-2.4.0-1 to zfs-0.6.3-1 Created: 17/Jun/14  Updated: 05/Apr/18  Resolved: 05/Apr/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.2
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Scott Nolin Assignee: WC Triage
Resolution: Won't Fix Votes: 0
Labels: llnl, prz, zfs
Environment:

centos 6.5


Attachments: PNG File arcstat-1.png     PNG File arcstat-2-95G-arc_meta_limit.png     PNG File arcstat-2.png     PNG File arcstat-3.png     PNG File arcstat-95G-arc_meta_limit.png     PNG File arcstat-MB.png     PNG File mdtest-zfs063-95G-arc_meta_limit.png     PNG File mdtest-zfs063.png    
Issue Links:
Related
is related to LU-5203 Update ZFS Version to 0.6.3 Resolved
Epic/Theme: Performance, zfs
Severity: 3
Rank (Obsolete): 14543

 Description   

After upgrading a system from
lustre 2.4.0-1 / zfs-0.6.2-1 to
lustre 2.4.2-1 / zfs-0.6.3-1

mdtest shows significantly lower stat performance - about 8,000 IOPS vs 14,400.

File reads and file removals are a bit worse, but not as severe. See the attached graph.

We do see other marked improvements with the upgrade, for example with system processes waiting on the MDS.

I wonder if this is some kind of expected performance tradeoff for the new version? I'm guessing the absolute numbers for stat are still acceptable for our workload, but it is quite a large relative difference.

Scott



 Comments   
Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

Hi Scott,
Can you confirm lustre 2.4.2 and zfs 0.6.3?

Have you seen this patch https://jira.hpdd.intel.com/browse/LU-4944 ?

Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

Can you please collect /proc/spl/kstat/zfs/arcstats during the benchmark and upload a graph of:

  • arc_meta_limit
  • arc_meta_size
  • c_max
  • c_min
  • p
  • size

thanks
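
For example, a minimal sampler along these lines would do (a sketch only; the metadata size counter may be named meta_size or arc_meta_used depending on the ZFS release):

# one line per 5-second sample: timestamp, size, c_min, c_max, p, arc_meta_limit, meta_size
while true; do
    awk -v ts="$(date +%s)" '
        { v[$1] = $3 }
        END { print ts, v["size"], v["c_min"], v["c_max"], v["p"], v["arc_meta_limit"], v["meta_size"] }
    ' /proc/spl/kstat/zfs/arcstats
    sleep 5
done >> arcstats.log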

Comment by Scott Nolin [ 17/Jun/14 ]

This is lustre 2.4.2 and zfs 0.6.3 from the ZFS on Linux EPEL repository

I'm not certain I am following this correctly, but the comments on LU-4944 suggest this patch is already in our zfs version

I'm not sure when I can run the benchmark again to collect stats while running. Here are the limit values at least:

arc_meta_limit 10000000000
arc_meta_max 10000274368
c_min 4194304
c_max 150000000000

Comment by Scott Nolin [ 17/Jun/14 ]

When I do get to run this it will be easier if I can just use arcstat.py to collect data. If I do that, which of these fields do you want to see on a graph?

Field definitions are as follows:
l2bytes : bytes read per second from the L2ARC
l2hits : L2ARC hits per second
read : Total ARC accesses per second
dmis : Demand Data misses per second
mru : MRU List hits per second
dread : Demand data accesses per second
mread : Metadata accesses per second
c : ARC Target Size
ph% : Prefetch hits percentage
l2hit% : L2ARC access hit percentage
pm% : Prefetch miss percentage
mfu : MFU List hits per second
mm% : Metadata miss percentage
pread : Prefetch accesses per second
miss : ARC misses per second
mrug : MRU Ghost List hits per second
dhit : Demand Data hits per second
mfug : MFU Ghost List hits per second
hits : ARC reads per second
dm% : Demand Data miss percentage
miss% : ARC miss percentage
mhit : Metadata hits per second
dh% : Demand Data hit percentage
mh% : Metadata hit percentage
pmis : Prefetch misses per second
l2asize : Actual (compressed) size of the L2ARC
l2miss% : L2ARC access miss percentage
l2miss : L2ARC misses per second
mmis : Metadata misses per second
phit : Prefetch hits per second
hit% : ARC Hit percentage
eskip : evict_skip per second
arcsz : ARC Size
time : Time
l2read : Total L2ARC accesses per second
l2size : Size of the L2ARC
mtxmis : mutex_miss per second
rmis : recycle_miss per second
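
For example, a metadata-focused subset sampled every 5 seconds could be collected with something like this (a sketch; check arcstat.py -h for the exact field names your version supports):

arcstat.py -f time,read,miss,miss%,mread,mmis,mm%,arcsz,c -o arcstat.out 5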

Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

Hi Scott,
The interesting thing is to collect arcstats during the test, for example every 5 seconds, and graph the values with something like gnuplot. arcstat.py is more useful for understanding cache hits/misses.
In my experience your problem is more likely a matter of tuning the arc_meta_limit size, but to validate this I need to see the memory consumption during the run.
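
For example, with a log sampled as sketched above (columns: time, size, c_min, c_max, p, arc_meta_limit, meta_size), a quick gnuplot run along these lines would produce the graph; the file names are just illustrative:

gnuplot <<'EOF'
set terminal png size 1024,600 noenhanced
set output "arcstats.png"
set xlabel "time (s)"
set ylabel "MB"
plot "arcstats.log" using 1:($2/1048576) with lines title "size", \
     "arcstats.log" using 1:($6/1048576) with lines title "arc_meta_limit", \
     "arcstats.log" using 1:($7/1048576) with lines title "meta_size"
EOF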

Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

Have you compiled lustre 2.4.2 against ZFS 0.6.3?

Comment by Scott Nolin [ 17/Jun/14 ]

I didn't compile it; Brian Behlendorf or whoever built it for the zfsonlinux EPEL repository did.

But yes, this is lustre 2.4.2 and zfs 0.6.3

I'll make graphs shortly

Scott

Comment by Scott Nolin [ 17/Jun/14 ]

Attached are the requested stats. I broke out the non-constant stats and adjusted a bit to make it a little more interesting.

Comment by Gabriele Paciucci (Inactive) [ 17/Jun/14 ]

I can't make sense of this... could you please use MB for the y-axis and make a graph only for:
-size
-arc_meta_limit
-arc_meta_size

thanks

Comment by Scott Nolin [ 18/Jun/14 ]

The y-axis is simply the raw data from /proc/spl/kstat/zfs/arcstats - I just put it into a graph quickly, it must be bytes.

Note that "arc_meta_size" doesn't exist in arcstats, but there is "meta_size" - I assume the same thing.

I will make those graphs in MB for you tomorrow, but here's a quick description with some approximate math which should explain it.

1) arc_meta_limit is a constant, why graph it? Our value was 10000000000 bytes (aka 1E10) = 9536 MB

2) arc_meta_size - does not exist, assuming we want meta_size - this is in 'arcstat-2.png':
Ranges from about 6.31E9 to 6.95E9 bytes = 6017 MB to 6628 MB

3) Size - also in arcstat-2.png
9.5E9 to 1.03E10 bytes = 9059 MB to 9822MB

I ran 2 iterations of the mdtest, and I think you can see the test finish and restart in that graph.

I captured all the stats, so whatever graphs in whatever format might help, I'll make tomorrow.

Thanks,
Scott

Comment by Isaac Huang (Inactive) [ 18/Jun/14 ]

Scott,

In the 1st graph, p (i.e. the ARC adaptation parameter) almost never changed, which was weird. Can you verify from your data whether p really never changed, or whether its changes were just too small to be seen on the graph?

Also, can you make sure you have this patch http://review.whamcloud.com/#/c/10237/ on the server? It was just landed and in my opinion it makes a lot of sense to apply that patch before trying to tune the ARC.

Comment by Scott Nolin [ 18/Jun/14 ]

Isaac,

Regarding 'p' in the first graph - you just can't see it due to the scale. See the 'arcstat-3.png' graph which shows 'p' on its own changing.

Regarding the patch (I see it's now also here - LU-5164) - no we don't have that patch applied. However, we also didn't have it applied with zfs-0.6.2/lustre-2.4.0 and with the new version stat performance has regressed. ZFS arc performance in general though seems much better.

Scott

Comment by Scott Nolin [ 18/Jun/14 ]

arcstat-MB.png shows meta-size, size, and arc-meta-limit in MB.

Comment by Scott Nolin [ 18/Jun/14 ]

We have upgraded a second filesystem with similar resources on the MDS/MDT, and see pretty much the same performance difference for stat in mdtest.

Scott

Comment by Gabriele Paciucci (Inactive) [ 18/Jun/14 ]

Do you have 32GB of RAM? If yes, could you set these parameters:

  1. cat /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=30687091200
    options zfs zfs_arc_meta_limit=25687091200

reboot your system, and collect data?
Could you also add the modinfo zfs output?
thanks
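
For example, after the reboot the values actually in effect can be confirmed before re-running the benchmark (a sketch; kstat names as quoted earlier in this ticket):

# module parameters as loaded
cat /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/zfs_arc_meta_limit
# limits the ARC is actually applying
grep -E '^(c_max|arc_meta_limit|arc_meta_max) ' /proc/spl/kstat/zfs/arcstats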

Comment by Scott Nolin [ 18/Jun/14 ]

We have 256GB of RAM

modinfo zfs output -

filename: /lib/modules/2.6.32-358.6.2.el6.x86_64/extra/zfs.ko
version: 0.6.3-1
license: CDDL
author: Sun Microsystems/Oracle, Lawrence Livermore National Laboratory
description: ZFS
srcversion: C29A443E3D2B93F605A540B
depends: spl,znvpair,zcommon,zunicode,zavl
vermagic: 2.6.32-358.6.2.el6.x86_64 SMP mod_unload modversions
parm: zvol_inhibit_dev:Do not create zvol device nodes (uint)
parm: zvol_major:Major number for zvol device (uint)
parm: zvol_threads:Number of threads for zvol device (uint)
parm: zvol_max_discard_blocks:Max number of blocks to discard (ulong)
parm: zio_injection_enabled:Enable fault injection (int)
parm: zio_bulk_flags:Additional flags to pass to bulk buffers (int)
parm: zio_delay_max:Max zio millisec delay before posting event (int)
parm: zio_requeue_io_start_cut_in_line:Prioritize requeued I/O (int)
parm: zfs_sync_pass_deferred_free:Defer frees starting in this pass (int)
parm: zfs_sync_pass_dont_compress:Don't compress starting in this pass (int)
parm: zfs_sync_pass_rewrite:Rewrite new bps starting in this pass (int)
parm: zil_replay_disable:Disable intent logging replay (int)
parm: zfs_nocacheflush:Disable cache flushes (int)
parm: zil_slog_limit:Max commit bytes to separate log device (ulong)
parm: zfs_read_chunk_size:Bytes to read per chunk (long)
parm: zfs_immediate_write_sz:Largest data block to write to zil (long)
parm: zfs_flags:Set additional debugging flags (int)
parm: zfs_recover:Set to attempt to recover from fatal errors (int)
parm: zfs_expire_snapshot:Seconds to expire .zfs/snapshot (int)
parm: zfs_vdev_aggregation_limit:Max vdev I/O aggregation size (int)
parm: zfs_vdev_read_gap_limit:Aggregate read I/O over gap (int)
parm: zfs_vdev_write_gap_limit:Aggregate write I/O over gap (int)
parm: zfs_vdev_max_active:Maximum number of active I/Os per vdev (int)
parm: zfs_vdev_async_write_active_max_dirty_percent:Async write concurrency max threshold (int)
parm: zfs_vdev_async_write_active_min_dirty_percent:Async write concurrency min threshold (int)
parm: zfs_vdev_async_read_max_active:Max active async read I/Os per vdev (int)
parm: zfs_vdev_async_read_min_active:Min active async read I/Os per vdev (int)
parm: zfs_vdev_async_write_max_active:Max active async write I/Os per vdev (int)
parm: zfs_vdev_async_write_min_active:Min active async write I/Os per vdev (int)
parm: zfs_vdev_scrub_max_active:Max active scrub I/Os per vdev (int)
parm: zfs_vdev_scrub_min_active:Min active scrub I/Os per vdev (int)
parm: zfs_vdev_sync_read_max_active:Max active sync read I/Os per vdev (int)
parm: zfs_vdev_sync_read_min_active:Min active sync read I/Os per vdev (int)
parm: zfs_vdev_sync_write_max_active:Max active sync write I/Os per vdev (int)
parm: zfs_vdev_sync_write_min_active:Min active sync write I/Os per vdev (int)
parm: zfs_vdev_mirror_switch_us:Switch mirrors every N usecs (int)
parm: zfs_vdev_scheduler:I/O scheduler (charp)
parm: zfs_vdev_cache_max:Inflate reads small than max (int)
parm: zfs_vdev_cache_size:Total size of the per-disk cache (int)
parm: zfs_vdev_cache_bshift:Shift size to inflate reads too (int)
parm: zfs_txg_timeout:Max seconds worth of delta per txg (int)
parm: zfs_read_history:Historic statistics for the last N reads (int)
parm: zfs_read_history_hits:Include cache hits in read history (int)
parm: zfs_txg_history:Historic statistics for the last N txgs (int)
parm: zfs_deadman_synctime_ms:Expiration time in milliseconds (ulong)
parm: zfs_deadman_enabled:Enable deadman timer (int)
parm: spa_asize_inflation:SPA size estimate multiplication factor (int)
parm: spa_config_path:SPA config file (/etc/zfs/zpool.cache) (charp)
parm: zfs_autoimport_disable:Disable pool import at module load (int)
parm: metaslab_debug_load:load all metaslabs during pool import (int)
parm: metaslab_debug_unload:prevent metaslabs from being unloaded (int)
parm: zfs_zevent_len_max:Max event queue length (int)
parm: zfs_zevent_cols:Max event column width (int)
parm: zfs_zevent_console:Log events to the console (int)
parm: zfs_top_maxinflight:Max I/Os per top-level (int)
parm: zfs_resilver_delay:Number of ticks to delay resilver (int)
parm: zfs_scrub_delay:Number of ticks to delay scrub (int)
parm: zfs_scan_idle:Idle window in clock ticks (int)
parm: zfs_scan_min_time_ms:Min millisecs to scrub per txg (int)
parm: zfs_free_min_time_ms:Min millisecs to free per txg (int)
parm: zfs_resilver_min_time_ms:Min millisecs to resilver per txg (int)
parm: zfs_no_scrub_io:Set to disable scrub I/O (int)
parm: zfs_no_scrub_prefetch:Set to disable scrub prefetching (int)
parm: zfs_dirty_data_max_percent:percent of ram can be dirty (int)
parm: zfs_dirty_data_max_max_percent:zfs_dirty_data_max upper bound as % of RAM (int)
parm: zfs_delay_min_dirty_percent:transaction delay threshold (int)
parm: zfs_dirty_data_max:determines the dirty space limit (ulong)
parm: zfs_dirty_data_max_max:zfs_dirty_data_max upper bound in bytes (ulong)
parm: zfs_dirty_data_sync:sync txg when this much dirty data (ulong)
parm: zfs_delay_scale:how quickly delay approaches infinity (ulong)
parm: zfs_prefetch_disable:Disable all ZFS prefetching (int)
parm: zfetch_max_streams:Max number of streams per zfetch (uint)
parm: zfetch_min_sec_reap:Min time before stream reclaim (uint)
parm: zfetch_block_cap:Max number of blocks to fetch at a time (uint)
parm: zfetch_array_rd_sz:Number of bytes in a array_read (ulong)
parm: zfs_pd_blks_max:Max number of blocks to prefetch (int)
parm: zfs_send_corrupt_data:Allow sending corrupt data (int)
parm: zfs_mdcomp_disable:Disable meta data compression (int)
parm: zfs_nopwrite_enabled:Enable NOP writes (int)
parm: zfs_dedup_prefetch:Enable prefetching dedup-ed blks (int)
parm: zfs_dbuf_state_index:Calculate arc header index (int)
parm: zfs_arc_min:Min arc size (ulong)
parm: zfs_arc_max:Max arc size (ulong)
parm: zfs_arc_meta_limit:Meta limit for arc size (ulong)
parm: zfs_arc_meta_prune:Bytes of meta data to prune (int)
parm: zfs_arc_grow_retry:Seconds before growing arc size (int)
parm: zfs_arc_p_aggressive_disable:disable aggressive arc_p grow (int)
parm: zfs_arc_p_dampener_disable:disable arc_p adapt dampener (int)
parm: zfs_arc_shrink_shift:log2(fraction of arc to reclaim) (int)
parm: zfs_disable_dup_eviction:disable duplicate buffer eviction (int)
parm: zfs_arc_memory_throttle_disable:disable memory throttle (int)
parm: zfs_arc_min_prefetch_lifespan:Min life of prefetch block (int)
parm: l2arc_write_max:Max write bytes per interval (ulong)
parm: l2arc_write_boost:Extra write bytes during device warmup (ulong)
parm: l2arc_headroom:Number of max device writes to precache (ulong)
parm: l2arc_headroom_boost:Compressed l2arc_headroom multiplier (ulong)
parm: l2arc_feed_secs:Seconds between L2ARC writing (ulong)
parm: l2arc_feed_min_ms:Min feed interval in milliseconds (ulong)
parm: l2arc_noprefetch:Skip caching prefetched buffers (int)
parm: l2arc_nocompress:Skip compressing L2ARC buffers (int)
parm: l2arc_feed_again:Turbo L2ARC warmup (int)
parm: l2arc_norw:No reads during writes (int)

Comment by Andrew Wagner [ 18/Jun/14 ]

Gabriele,

I'm working on this filesystem with Scott:

Speaking to the ARC cache settings, we are currently using:

options zfs zfs_arc_meta_limit=10000000000
options zfs zfs_arc_max=150000000000

However, we're nowhere near filling that up, so we're not seeing any excessive cache pressure right now.

Comment by Gabriele Paciucci (Inactive) [ 18/Jun/14 ]

Have you set these values?
By default arc_meta_limit should be 1/4 of the total RAM, so in your case 64GB, and zfs_arc_max should be 1/2 (128GB).

In the arcstat-MB graph it seems to be only 10GB.

Could you increase this value, and also zfs_arc_max?
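
For example, a rough sketch of computing those defaults from the installed RAM and writing them to the modprobe config (values and path are just illustrative):

# 1/2 of RAM for zfs_arc_max, 1/4 for zfs_arc_meta_limit (MemTotal is in kB)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "options zfs zfs_arc_max=$((mem_kb * 1024 / 2))"        >  /etc/modprobe.d/zfs.conf
echo "options zfs zfs_arc_meta_limit=$((mem_kb * 1024 / 4))" >> /etc/modprobe.d/zfs.conf
cat /etc/modprobe.d/zfs.conf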

Comment by Andrew Wagner [ 18/Jun/14 ]

Yes, we set these values after observations of different values on ZFS 0.6.2. The larger meta_limit helped us avoid running into ugly cache issues.

Either way, the cache is only using about 10GB right now as the filesystem has only been up for two days and is relatively quiet. We can't force it to use more of the cache without more activity.

Comment by Gabriele Paciucci (Inactive) [ 18/Jun/14 ]

Okay but yesterday Scott captured these values:
arc_meta_limit 10000000000
arc_meta_max 10000274368

and to me it looks like you are running out of ARC metadata memory during your mdtest.

Comment by Andrew Wagner [ 18/Jun/14 ]

Sorry about that, we were missing a 0 on the arc_meta_limit. We'll retest with the new values.

Comment by Scott Nolin [ 18/Jun/14 ]

I've completed an initial run with the zfs_arc_meta_limit and max set more appropriately to 100G and 150G.

The mdtest data doesn't look any better; it's actually a bit worse. Complicating things, the filesystem is now in use by other jobs - not heavy use, but it looks like a few hundred IOPS on various tasks (just watching jobstats in general). I'll post actual data if I can get another test done.

Regardless of that one, our other filesystem has more appropriate numbers to start with (it was left at default), and we see a very similar difference in stat performance there. So I don't really expect much...

Scott

Comment by Scott Nolin [ 18/Jun/14 ]

Here are results with more appropriate limits. I've included the graph with arc_meta_limit to show we're not exceeding it, and also a version with a better scale.

Notice that things certainly aren't better; they're a little worse.

These graphs all have "95G" in the title. I wish I could embed images within comments to make it flow better.

Comment by Prakash Surya (Inactive) [ 08/Jul/14 ]

I don't mean to barge in so late to the party, but I'm curious what the status of this issue is. It's a little hard to follow the comments and the attached graphs. What's the observed performance degradation, and what configuration changes/experiments have been tried (and what were the results)?

Comment by Scott Nolin [ 08/Jul/14 ]

The observed performance degradation is ~45% lower stat IOPS in mdtest (8,000 vs 14,400) after the zfs/lustre upgrade.

We adjusted the arc_meta_limit (as it was set poorly) but it made no difference. The easiest graph to look at is this one: https://jira.hpdd.intel.com/secure/attachment/15192/mdtest-zfs063-95G-arc_meta_limit.png

The graphs are all for one particular filesystem, but we have a second file system with similar hardware and the same software versions and saw similar degradation in stat performance for mdtest.

If anyone runs mdtest just prior to upgrading lustre/zfs I'd be interested in their results; I suspect they will be similar. Our software is from the zfsonlinux repo with no additional patches.

Scott

Comment by Prakash Surya (Inactive) [ 09/Jul/14 ]

Thanks Scott. That definitely doesn't sit well with me.

Can you post the command you used as a test? Do you have the exact mdtest command/options you used? How many nodes? If I can get some time, I might try to reproduce this and see if I can better understand what's going on here. It's definitely neither expected nor desired for the performance to drop like that; I want to get to the bottom of this.

Also, what's the Y axis label in the graph you linked to? I saw that earlier, but I can't make sense of it without labels. My initial interpretation was the Y axis is seconds, but that would mean lower is better, which doesn't agree with the claim of a performance decrease.

Comment by Prakash Surya (Inactive) [ 09/Jul/14 ]

Also, what's the Y axis label in the graph you linked to? I saw that earlier, but I can't make sense of it without labels. My initial interpretation was the Y axis is seconds, but that would mean lower is better, which doesn't agree with the claim of a performance decrease.

Actually, I think I got it now. The Y axis must be the rate of operations per second, which lines up with your claim of 14400 stat/s prior and 8000 stat/s now.

When you get a chance, please update us with the command used to generate the workload.

Comment by Scott Nolin [ 09/Jul/14 ]

The Y-axis is IOPS.

The command info:

mdtest-1.9.1 was launched with 64 total task(s) on 4 node(s)
Command line used: /home/scottn/benchmarks/mdtest -i 2 -F -n 4000 -d /arcdata/scottn/mdtest

64 tasks, 256000 files

Scott

Comment by Scott Nolin [ 09/Jul/14 ]

I would also add that while this absolute number from mdtest is worse, in use so far the upgrade has been an improvement. Performance doesn't seem to degrade so quickly with file creates, and things like interactive 'ls -l' are much better.

Scott

Comment by Prakash Surya (Inactive) [ 09/Jul/14 ]

I would also add that while this absolute number from mdtest is worse, in use so far the upgrade has been an improvement. Performance doesn't seem to degrade so quickly with file creates, and things like interactive 'ls -l' are much better.

Glad to hear it!

I'm still a bit puzzled regarding the stats, though. I'm going to try and reproduce this using our test cluster; stay tuned.

Comment by Prakash Surya (Inactive) [ 09/Jul/14 ]

Interesting.. I think I see similar reduced performance with stats as well.. Hm..

So here's the mdtest output with releases based on lustre 2.4.2 and zfs 0.6.3 on the servers:

hype355@root:srun -- mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
-- started at 07/09/2014 13:09:02 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 866.3 Mi   Used Inodes: 60.2%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   2046.438    838.703   1534.565    375.574
   File stat         :  65205.403  23577.494  57837.499  13089.055
   File removal      :   4780.471   4647.670   4719.076     45.088
   Tree creation     :    505.051     34.332    221.404    196.950
   Tree removal      :     12.423     10.049     11.123      0.763

-- finished at 07/09/2014 13:40:52 --

And here's the mdtest output with releases based on lustre 2.4.0 and zfs 0.6.2 on the servers:

hype355@root:srun -- mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
-- started at 07/09/2014 14:43:06 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-1
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 861.8 Mi   Used Inodes: 60.5%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   1627.029    810.017   1320.848    239.655
   File stat         :  99560.417  69839.184  88798.194   9632.641
   File removal      :   4352.713   3279.728   4029.607    413.213
   Tree creation     :    348.675     33.174    194.944    141.913
   Tree removal      :     15.176     10.103     12.088      1.386

-- finished at 07/09/2014 15:19:02 --

That's about a 34% decrease in the mean "File stat" performance with the lustre 2.4.2 and zfs 0.6.3 release (I'm assuming the number reported is operations per second). That's no good.

Comment by Prakash Surya (Inactive) [ 15/Jul/14 ]

Scott, can you try increasing the `lu_cache_nr` module option and re-running the test?

# zwicky-lcy-mds1 /root > cat /sys/module/obdclass/parameters/lu_cache_nr
256

Try increasing it to something much larger, maybe 1M. I'd try that myself, but our testing resource is busy with other work at the moment.
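
For example, one way to set it for the next module load (a sketch; the conf file name is just illustrative, and depending on the version a module reload or a Lustre remount may be needed before the new value takes effect):

echo "options obdclass lu_cache_nr=1048576" > /etc/modprobe.d/lustre-obdclass.conf
# verify once the module has picked it up
cat /sys/module/obdclass/parameters/lu_cache_nr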

Comment by Scott Nolin [ 16/Jul/14 ]

Prakash, we will give this a try soon.

Scott

Comment by Prakash Surya (Inactive) [ 16/Jul/14 ]

Scott, I was able to squeeze in a test run with lu_cache_nr=1048576 on the MDS and all OSS nodes in the filesystem. I didn't see any significant difference:

hype355@root:srun -- mdtest -v -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-5                                                                          
-- started at 07/16/2014 09:23:46 --

mdtest-1.8.3 was launched with 64 total task(s) on 64 nodes
Command line used: /opt/mdtest-1.8.3/bin/mdtest -v -i 8 -F -n 4000 -d /p/lcratery/surya1/LU-5212/mdtest-5
Path: /p/lcratery/surya1/LU-5212
FS: 1019.6 TiB   Used FS: 50.3%   Inodes: 834.8 Mi   Used Inodes: 62.5%

64 tasks, 256000 files

SUMMARY: (of 8 iterations)
   Operation                  Max        Min       Mean    Std Dev
   ---------                  ---        ---       ----    -------
   File creation     :   3060.802   2525.669   2719.003    161.410
   File stat         :  72310.501  32382.555  57016.755  11440.553
   File removal      :   4344.489   4043.991   4224.141     97.727
   Tree creation     :    377.644     32.784    147.864    126.958
   Tree removal      :     11.800      9.356     10.626      0.884

-- finished at 07/16/2014 09:45:06 --

Comment by Scott Nolin [ 16/Jul/14 ]

Prakash, thanks for letting me know. We won't bother running it then. Ours is a production cluster; we can typically run these tests since it's not heavily used all the time, but it's not easy.

Scott

Comment by Peter Jones [ 05/Apr/18 ]

I imagine the performance is quite different on more current versions of Lustre and ZFS.
