Details
Type: Bug
Resolution: Won't Fix
Priority: Major
Labels: None
Affects Version/s: Lustre 2.4.0, Lustre 2.1.5
Environment: Four OSS blocks, each with 2 servers and a NEC SNA460 (a.k.a. NetApp E5560) RAID serving 8 OSTs. CentOS 6.3. Servers run Lustre 2.1.5. Tested with clients running Lustre 1.8.8, 2.1.5, and 2.3.63.
Severity: 3
Rank: 7747
Description
Single-stream write to a striped file performs poorly on Lustre 2.x clients. Discussion started on the hpdd-discuss mailing list (Subject: striped write performance in lustre 2.X?).
With Lustre 1.8.* (servers and clients) it was rather easy to exceed 1 GB/s with a file striped over 4-8 OSTs. With Lustre 2.1.5 servers, the Lustre 1.8.8 client is again the fastest option (though not as fast as it used to be). Comparing a Lustre 1.8.8 client run against a 2.3.63-ish client run that includes some tunings:
1.8.8 client:
# dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gbx bs=4M count=$((256/4*1000))
268435456000 bytes (268 GB) copied, 349.921 s, 767 MB/s
2.3.63 client:
# dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gby bs=4M count=$((256/4*1000))
268435456000 bytes (268 GB) copied, 689.333 s, 389 MB/s
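For reference, the layout and measurement steps can be reproduced with something like the sketch below. This is an assumed reconstruction, not the exact commands from these runs: the pool name osb4 and the 4-stripe / 4 MiB layout come from the lfs getstripe output further down, and clearing rpc_stats by writing to it is an assumption about how the histograms were reset between runs.

# Assumed reproduction sketch (on clients older than ~2.2 the stripe-size flag is -s, not -S)
lfs setstripe -p osb4 -c 4 -S 4M /mnt/lnec/pool4/stripe_4_4096    # 4 stripes of 4 MiB, pool osb4
lctl set_param osc.*.rpc_stats=clear                              # reset per-OSC RPC histograms (assumed)
dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/testfile bs=4M count=$((256/4*1000))
lctl get_param osc.*.rpc_stats                                    # dump the histograms after the run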
1.8.8 client rpc_stats (part):
(typical per-OST "RPCs in flight" stats for the full run)
                        read                      write
rpcs in flight   rpcs   %  cum % |   rpcs   %  cum %
0:                  0   0     0  |  11594  18    18
1:                  0   0     0  |  13130  20    38
2:                  0   0     0  |  16592  25    64
3:                  0   0     0  |  16543  25    90
4:                  0   0     0  |   3762   5    96
5:                  0   0     0  |   1598   2    98
6:                  0   0     0  |    338   0    99
7:                  0   0     0  |    401   0    99
8:                  0   0     0  |     45   0    99
9:                  0   0     0  |      1   0   100

                        read                      write
offset           rpcs   %  cum % |   rpcs   %  cum %
0:                  0   0     0  |  64000  99    99
1:                  0   0     0  |      0   0    99
2:                  0   0     0  |      0   0    99
4:                  0   0     0  |      0   0    99
8:                  0   0     0  |      0   0    99
16:                 0   0     0  |      0   0    99
32:                 0   0     0  |      0   0    99
64:                 0   0     0  |      0   0    99
128:                0   0     0  |      4   0   100

1.8.8 client perf top:
 2151.00  9.3% ll_ra_read_init            [lustre]
 1781.00  7.7% copy_user_generic_string   [kernel.kallsyms]
 1734.00  7.5% _spin_lock                 [kernel.kallsyms]
 1300.00  5.6% lov_putref                 [lov]
  781.00  3.4% radix_tree_tag_clear       [kernel.kallsyms]
  720.00  3.1% _spin_lock_irqsave         [kernel.kallsyms]
  667.00  2.9% intel_idle                 [kernel.kallsyms]
  607.00  2.6% osc_destroy                [osc]
  554.00  2.4% __clear_user               [kernel.kallsyms]
  551.00  2.4% refresh_entry              [lvfs]
  451.00  2.0% kiblnd_ni_fini_pools       [ko2iblnd]
  399.00  1.7% radix_tree_tag_set         [kernel.kallsyms]
  356.00  1.5% __percpu_counter_add       [kernel.kallsyms]
  352.00  1.5% lov_queue_group_io         [lov]
  340.00  1.5% mark_page_accessed         [kernel.kallsyms]
  338.00  1.5% _spin_lock_irq             [kernel.kallsyms]
  307.00  1.3% test_clear_page_writeback  [kernel.kallsyms]
  306.00  1.3% lov_stripe_unlock          [lov]
  302.00  1.3% _spin_unlock_irqrestore    [kernel.kallsyms]
  280.00  1.2% osc_update_grant           [osc]
  272.00  1.2% __dec_zone_state           [kernel.kallsyms]
  221.00  1.0% lop_makes_rpc              [osc]
2.3.63 client rpc_stats:
(typical per-OST "RPCs in flight" stats from rpc_stats)
                        read                      write
rpcs in flight   rpcs   %  cum % |   rpcs   %  cum %
0:                  0   0     0  |      0   0     0
1:                  0   0     0  |  16030  25    25
2:                  0   0     0  |  16516  25    50
3:                  0   0     0  |  23525  36    87
4:                  0   0     0  |   7909  12    99
5:                  0   0     0  |      7   0    99
6:                  0   0     0  |      5   0    99
7:                  0   0     0  |      4   0    99
8:                  0   0     0  |      4   0   100

                        read                      write
offset           rpcs   %  cum % |   rpcs   %  cum %
0:                  0   0     0  |      1   0     0
1:                  0   0     0  |      0   0     0
2:                  0   0     0  |      0   0     0
4:                  0   0     0  |      0   0     0
8:                  0   0     0  |      0   0     0
16:                 0   0     0  |      0   0     0
32:                 0   0     0  |      0   0     0
64:                 0   0     0  |      0   0     0
128:                0   0     0  |      0   0     0
256:                0   0     0  |      1   0     0
512:                0   0     0  |      2   0     0
1024:               0   0     0  |      4   0     0
2048:               0   0     0  |      8   0     0
4096:               0   0     0  |     16   0     0
8192:               0   0     0  |     32   0     0
16384:              0   0     0  |     64   0     0
32768:              0   0     0  |    128   0     0
65536:              0   0     0  |    256   0     0
131072:             0   0     0  |    512   0     1
262144:             0   0     0  |   1024   1     3
524288:             0   0     0  |   2048   3     6
1048576:            0   0     0  |   4096   6    12
2097152:            0   0     0  |   8192  12    25
4194304:            0   0     0  |  16384  25    51
8388608:            0   0     0  |  31232  48   100

2.3.63 client perf top (up to 1%):
  5.03%  [kernel]    [k] copy_user_generic_string
  3.79%  [obdclass]  [k] key_fini
  3.78%  [kernel]    [k] _spin_lock
  2.43%  [obdclass]  [k] keys_fill
  2.13%  [kernel]    [k] kmem_cache_alloc
  1.99%  [kernel]    [k] __clear_user
  1.92%  [kernel]    [k] kmem_cache_free
  1.78%  [lvfs]      [k] lprocfs_counter_add
  1.48%  [kernel]    [k] kfree
  1.34%  [kernel]    [k] __kmalloc
  1.33%  [kernel]    [k] radix_tree_delete
  1.31%  [kernel]    [k] _spin_lock_irqsave
  1.28%  [obdclass]  [k] cl_page_put
  1.15%  [kernel]    [k] radix_tree_insert
  1.13%  [lvfs]      [k] lprocfs_counter_sub
  1.04%  [osc]       [k] osc_lru_del
  1.02%  [obdclass]  [k] cl_page_find0
  1.01%  [obdclass]  [k] lu_context_key_get
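The profiles above were taken interactively with perf top while the dd was running. An equivalent non-interactive capture (an assumed invocation for anyone reproducing this, not a command from the original runs) would be:

perf record -a -g -- sleep 30    # sample all CPUs with call graphs for 30 s mid-run
perf report --sort dso,symbol    # then rank the hottest symbols per module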
Upon request from Jinshan I ran 'collectl -scml' during the dd:
#### 2.3.63 client:
[root@sabi22 ~]# collectl -scml -i 15
waiting for 15 second sample...
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead Reads KBWrite Writes
   0   0    78    192  60G 112K   1G  43M 170M  80M      0     0       0      0
   2   2  2578  10780  53G 112K   7G   5G   1G  85M      0     0  392124     96
   5   5  4639  20295  39G 112K  17G  16G   4G  85M      0     0  741103    181
   4   4  4000  16842  31G 112K  24G  22G   6G  85M      0     0  609758    149
   4   4  3046  11499  31G 112K  24G  22G   6G  85M      0     0  411785    101
   4   4  3060  11592  31G 112K  24G  22G   6G  85M      0     0  412058    101
   4   4  3096  11445  31G 112K  24G  23G   6G  85M      0     0  405231     99
   4   4  3080  11440  31G 112K  24G  23G   6G  85M      0     0  404958     99
   4   4  3052  11386  30G 112K  24G  23G   6G  85M      0     0  405777     99
   4   4  3066  11477  30G 112K  24G  23G   6G  85M      0     0  407689    100
   4   4  3044  11552  30G 112K  25G  23G   6G  85M      0     0  412058    101
   4   4  3058  11382  30G 112K  25G  23G   6G  85M      0     0  403319     98
   4   4  3095  11412  30G 112K  25G  23G   6G  85M      0     0  404958     99
   4   4  3067  11370  30G 112K  25G  24G   6G  85M      0     0  403593     99
   4   4  3083  11478  29G 112K  25G  24G   6G  85M      0     0  407689    100
   4   4  3107  11374  29G 112K  25G  24G   6G  85M      0     0  401135     98
   4   4  3094  11280  29G 112K  25G  24G   6G  85M      0     0  396766     97
   4   4  3078  11282  29G 112K  25G  24G   6G  85M      0     0  398404     97
   4   4  3098  11241  29G 112K  26G  24G   6G  85M      0     0  396493     97
   4   4  3067  11238  28G 112K  26G  24G   6G  85M      0     0  393762     96
   4   4  3080  11513  28G 112K  26G  24G   6G  85M      0     0  395947     97
   4   4  3068  11327  28G 112K  26G  24G   6G  85M      0     0  394854     96
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead Reads KBWrite Writes
   4   4  3069  11444  28G 112K  26G  25G   6G  85M      0     0  396766     97
   4   4  3060  11337  28G 112K  26G  25G   7G  85M      0     0  392397     96
   4   4  3050  11220  28G 112K  26G  25G   7G  85M      0     0  388301     95
   4   4  3076  11355  28G 112K  26G  25G   7G  85M      0     0  394854     96
   4   4  3096  11359  27G 112K  27G  25G   7G  85M      0     0  394308     96
   4   4  3049  11316  27G 112K  27G  25G   7G  85M      0     0  394854     96
   4   4  3060  11325  27G 112K  27G  25G   7G  85M      0     0  396493     97
   4   4  3058  11381  27G 112K  27G  25G   7G  85M      0     0  396220     97
   4   4  3071  11343  27G 112K  27G  26G   7G  85M      0     0  395127     96
   4   4  3015  11213  27G 112K  27G  26G   7G  85M      0     0  390485     95
   4   4  3046  11328  26G 112K  27G  26G   7G  85M      0     0  394308     96
   4   4  3069  11454  26G 112K  27G  26G   7G  85M      0     0  397858     97
   4   4  3058  11524  26G 112K  28G  26G   7G  85M      0     0  399497     98
   3   3  3033  11427  26G 112K  28G  26G   7G  85M      0     0  397858     97
   3   3  3048  11500  26G 112K  28G  26G   7G  85M      0     0  402227     98
   4   4  3056  11291  26G 112K  28G  26G   7G  85M      0     0  395127     96
   4   4  3067  11223  25G 112K  28G  27G   7G  85M      0     0  394035     96
   3   3  3055  11366  25G 112K  28G  27G   7G  85M      0     0  398404     97
   4   4  3036  11337  25G 112K  28G  27G   7G  85M      0     0  397585     97
   4   4  3057  11307  25G 112K  28G  27G   7G  85M      0     0  395947     97
   4   4  3080  11241  25G 112K  29G  27G   7G  85M      0     0  393489     96
   1   1  1392   5003  25G 112K  29G  27G   7G  80M      0     0  170667     42
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead Reads KBWrite Writes
   0   0   101    216  25G 112K  29G  27G   7G  80M      0     0       0      0
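As a sanity check on the numbers: the steady-state KBWrite column sits around 395000 KB/s, i.e. roughly 385-405 MB/s depending on the unit convention, which is consistent with the ~389 MB/s that dd reported for this client.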
The dd doesn't seem to fill the client's memory...
Further details:
[root@sabi22 ~]# lctl get_param llite.*.max_cached_mb
llite.lnec-ffff8808348d2c00.max_cached_mb=
users: 32
max_cached_mb: 48397
used_mb: 24610
unused_mb: 23787
reclaim_count: 0
[root@sabi22 ~]# lctl get_param osc.*.max_dirty_mb
osc.lnec-OST0000-osc-ffff8808348d2c00.max_dirty_mb=32
[...]
osc.lnec-OST0018-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST0019-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001a-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001b-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001c-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001d-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001e-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001f-osc-ffff8808348d2c00.max_dirty_mb=32
[root@sabi22 EF]# lfs getstripe /mnt/lnec/pool4/stripe_4_4096/test256a
/mnt/lnec/pool4/stripe_4_4096/test256a
lmm_stripe_count:   4
lmm_stripe_size:    4194304
lmm_layout_gen:     0
lmm_stripe_offset:  27
lmm_pool:           osb4
        obdidx      objid     objid   group
            27       6792    0x1a88       0
            31       5891    0x1703       0
            24       6983    0x1b47       0
            28       5251    0x1483       0
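Given that max_dirty_mb sits at its 32 MB default on every OSC and the in-flight histograms above rarely exceed 3-4 write RPCs per OST, one obvious follow-up experiment is to raise the per-OSC write-back limits and re-run the dd. The values below are illustrative assumptions, not settings taken from this ticket (the "some tunings" mentioned above may already include changes like these):

lctl set_param osc.*.max_dirty_mb=256          # allow more dirty data per OSC (32 above)
lctl set_param osc.*.max_rpcs_in_flight=32     # allow more concurrent RPCs per OSC (default 8)
lctl get_param osc.*.max_dirty_mb osc.*.max_rpcs_in_flight    # verify the new settings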