Details
- Type: Bug
- Resolution: Won't Fix
- Priority: Major
- Fix Version/s: None
- Affects Version/s: Lustre 2.4.0, Lustre 2.1.5
- Environment: Four OSS blocks with 2 servers and a NEC SNA460 (a.k.a. NetApp E5560) RAID with 8 OSTs each. OS is CentOS 6.3. Servers run Lustre 2.1.5; tested with clients running Lustre 1.8.8, 2.1.5, and 2.3.63.
- Severity: 3
- Rank: 7747
Description
Single-stream writes to a striped file perform poorly on Lustre 2.* clients. The discussion started on the hpdd-discuss mailing list (subject: "striped write performance in lustre 2.X?").
With Lustre 1.8.* (servers and clients) it was rather easy to exceed 1 GB/s with a 4-8 striped file. With Lustre 2.1.5 servers, the Lustre 1.8.8 client is again the fastest option (though not as fast as it used to be). Comparing a Lustre 1.8.8 client run with a 2.3.63-ish client, including some tunings:
1.8.8 client:

# dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gbx bs=4M count=$((256/4*1000))
268435456000 bytes (268 GB) copied, 349.921 s, 767 MB/s

2.3.63 client:

# dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gby bs=4M count=$((256/4*1000))
268435456000 bytes (268 GB) copied, 689.333 s, 389 MB/s
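For anyone reproducing this, a sketch of the measurement procedure (the mount point and file name are the ones used on this system; clearing rpc_stats before the dd is an assumption on my part so that the histograms below cover only the measured run — writing 0 to rpc_stats resets the counters):

```shell
# Reset the per-OSC RPC histograms so rpc_stats reflects only this run
lctl set_param osc.*.rpc_stats=0

# Single-stream 256 GB write in 4 MB blocks: 256/4*1000 = 64000 blocks
dd if=/dev/zero of=/mnt/lnec/pool4/stripe_4_4096/test256gbx bs=4M count=$((256/4*1000))

# Dump the "rpcs in flight" and offset histograms for the run
lctl get_param osc.*.rpc_stats
```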
1.8.8 client rpc_stats (part):
(typical OST rpcs in flight for full run)

                        read                     write
rpcs in flight    rpcs   %  cum % |   rpcs   %  cum %
0:                   0   0    0   |  11594  18   18
1:                   0   0    0   |  13130  20   38
2:                   0   0    0   |  16592  25   64
3:                   0   0    0   |  16543  25   90
4:                   0   0    0   |   3762   5   96
5:                   0   0    0   |   1598   2   98
6:                   0   0    0   |    338   0   99
7:                   0   0    0   |    401   0   99
8:                   0   0    0   |     45   0   99
9:                   0   0    0   |      1   0  100

                        read                     write
offset            rpcs   %  cum % |   rpcs   %  cum %
0:                   0   0    0   |  64000  99   99
1:                   0   0    0   |      0   0   99
2:                   0   0    0   |      0   0   99
4:                   0   0    0   |      0   0   99
8:                   0   0    0   |      0   0   99
16:                  0   0    0   |      0   0   99
32:                  0   0    0   |      0   0   99
64:                  0   0    0   |      0   0   99
128:                 0   0    0   |      4   0  100

1.8.8 client perf top:
2151.00  9.3%  ll_ra_read_init            [lustre]
1781.00  7.7%  copy_user_generic_string   [kernel.kallsyms]
1734.00  7.5%  _spin_lock                 [kernel.kallsyms]
1300.00  5.6%  lov_putref                 [lov]
 781.00  3.4%  radix_tree_tag_clear       [kernel.kallsyms]
 720.00  3.1%  _spin_lock_irqsave         [kernel.kallsyms]
 667.00  2.9%  intel_idle                 [kernel.kallsyms]
 607.00  2.6%  osc_destroy                [osc]
 554.00  2.4%  __clear_user               [kernel.kallsyms]
 551.00  2.4%  refresh_entry              [lvfs]
 451.00  2.0%  kiblnd_ni_fini_pools       [ko2iblnd]
 399.00  1.7%  radix_tree_tag_set         [kernel.kallsyms]
 356.00  1.5%  __percpu_counter_add       [kernel.kallsyms]
 352.00  1.5%  lov_queue_group_io         [lov]
 340.00  1.5%  mark_page_accessed         [kernel.kallsyms]
 338.00  1.5%  _spin_lock_irq             [kernel.kallsyms]
 307.00  1.3%  test_clear_page_writeback  [kernel.kallsyms]
 306.00  1.3%  lov_stripe_unlock          [lov]
 302.00  1.3%  _spin_unlock_irqrestore    [kernel.kallsyms]
 280.00  1.2%  osc_update_grant           [osc]
 272.00  1.2%  __dec_zone_state           [kernel.kallsyms]
 221.00  1.0%  lop_makes_rpc              [osc]
2.3.63 client rpc_stats:
(typical OST rpcs in flight stats from rpc_stats)

                        read                     write
rpcs in flight    rpcs   %  cum % |   rpcs   %  cum %
0:                   0   0    0   |      0   0    0
1:                   0   0    0   |  16030  25   25
2:                   0   0    0   |  16516  25   50
3:                   0   0    0   |  23525  36   87
4:                   0   0    0   |   7909  12   99
5:                   0   0    0   |      7   0   99
6:                   0   0    0   |      5   0   99
7:                   0   0    0   |      4   0   99
8:                   0   0    0   |      4   0  100

                        read                     write
offset            rpcs   %  cum % |   rpcs   %  cum %
0:                   0   0    0   |      1   0    0
1:                   0   0    0   |      0   0    0
2:                   0   0    0   |      0   0    0
4:                   0   0    0   |      0   0    0
8:                   0   0    0   |      0   0    0
16:                  0   0    0   |      0   0    0
32:                  0   0    0   |      0   0    0
64:                  0   0    0   |      0   0    0
128:                 0   0    0   |      0   0    0
256:                 0   0    0   |      1   0    0
512:                 0   0    0   |      2   0    0
1024:                0   0    0   |      4   0    0
2048:                0   0    0   |      8   0    0
4096:                0   0    0   |     16   0    0
8192:                0   0    0   |     32   0    0
16384:               0   0    0   |     64   0    0
32768:               0   0    0   |    128   0    0
65536:               0   0    0   |    256   0    0
131072:              0   0    0   |    512   0    1
262144:              0   0    0   |   1024   1    3
524288:              0   0    0   |   2048   3    6
1048576:             0   0    0   |   4096   6   12
2097152:             0   0    0   |   8192  12   25
4194304:             0   0    0   |  16384  25   51
8388608:             0   0    0   |  31232  48  100

2.3.63 client perf top (down to 1%):
5.03%  [obdclass? no]  (see below)
5.03%  [kernel]    [k] copy_user_generic_string
3.79%  [obdclass]  [k] key_fini
3.78%  [kernel]    [k] _spin_lock
2.43%  [obdclass]  [k] keys_fill
2.13%  [kernel]    [k] kmem_cache_alloc
1.99%  [kernel]    [k] __clear_user
1.92%  [kernel]    [k] kmem_cache_free
1.78%  [lvfs]      [k] lprocfs_counter_add
1.48%  [kernel]    [k] kfree
1.34%  [kernel]    [k] __kmalloc
1.33%  [kernel]    [k] radix_tree_delete
1.31%  [kernel]    [k] _spin_lock_irqsave
1.28%  [obdclass]  [k] cl_page_put
1.15%  [kernel]    [k] radix_tree_insert
1.13%  [lvfs]      [k] lprocfs_counter_sub
1.04%  [osc]       [k] osc_lru_del
1.02%  [obdclass]  [k] cl_page_find0
1.01%  [obdclass]  [k] lu_context_key_get
Upon request from Jinshan, I ran 'collectl -scml' during the dd:
#### 2.3.63 client:
[root@sabi22 ~]# collectl -scml -i 15
waiting for 15 second sample...
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead Reads KBWrite Writes
   0   0    78    192  60G 112K   1G  43M 170M  80M       0     0       0      0
   2   2  2578  10780  53G 112K   7G   5G   1G  85M       0     0  392124     96
   5   5  4639  20295  39G 112K  17G  16G   4G  85M       0     0  741103    181
   4   4  4000  16842  31G 112K  24G  22G   6G  85M       0     0  609758    149
   4   4  3046  11499  31G 112K  24G  22G   6G  85M       0     0  411785    101
   4   4  3060  11592  31G 112K  24G  22G   6G  85M       0     0  412058    101
   4   4  3096  11445  31G 112K  24G  23G   6G  85M       0     0  405231     99
   4   4  3080  11440  31G 112K  24G  23G   6G  85M       0     0  404958     99
   4   4  3052  11386  30G 112K  24G  23G   6G  85M       0     0  405777     99
   4   4  3066  11477  30G 112K  24G  23G   6G  85M       0     0  407689    100
   4   4  3044  11552  30G 112K  25G  23G   6G  85M       0     0  412058    101
   4   4  3058  11382  30G 112K  25G  23G   6G  85M       0     0  403319     98
   4   4  3095  11412  30G 112K  25G  23G   6G  85M       0     0  404958     99
   4   4  3067  11370  30G 112K  25G  24G   6G  85M       0     0  403593     99
   4   4  3083  11478  29G 112K  25G  24G   6G  85M       0     0  407689    100
   4   4  3107  11374  29G 112K  25G  24G   6G  85M       0     0  401135     98
   4   4  3094  11280  29G 112K  25G  24G   6G  85M       0     0  396766     97
   4   4  3078  11282  29G 112K  25G  24G   6G  85M       0     0  398404     97
   4   4  3098  11241  29G 112K  26G  24G   6G  85M       0     0  396493     97
   4   4  3067  11238  28G 112K  26G  24G   6G  85M       0     0  393762     96
   4   4  3080  11513  28G 112K  26G  24G   6G  85M       0     0  395947     97
   4   4  3068  11327  28G 112K  26G  24G   6G  85M       0     0  394854     96
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead Reads KBWrite Writes
   4   4  3069  11444  28G 112K  26G  25G   6G  85M       0     0  396766     97
   4   4  3060  11337  28G 112K  26G  25G   7G  85M       0     0  392397     96
   4   4  3050  11220  28G 112K  26G  25G   7G  85M       0     0  388301     95
   4   4  3076  11355  28G 112K  26G  25G   7G  85M       0     0  394854     96
   4   4  3096  11359  27G 112K  27G  25G   7G  85M       0     0  394308     96
   4   4  3049  11316  27G 112K  27G  25G   7G  85M       0     0  394854     96
   4   4  3060  11325  27G 112K  27G  25G   7G  85M       0     0  396493     97
   4   4  3058  11381  27G 112K  27G  25G   7G  85M       0     0  396220     97
   4   4  3071  11343  27G 112K  27G  26G   7G  85M       0     0  395127     96
   4   4  3015  11213  27G 112K  27G  26G   7G  85M       0     0  390485     95
   4   4  3046  11328  26G 112K  27G  26G   7G  85M       0     0  394308     96
   4   4  3069  11454  26G 112K  27G  26G   7G  85M       0     0  397858     97
   4   4  3058  11524  26G 112K  28G  26G   7G  85M       0     0  399497     98
   3   3  3033  11427  26G 112K  28G  26G   7G  85M       0     0  397858     97
   3   3  3048  11500  26G 112K  28G  26G   7G  85M       0     0  402227     98
   4   4  3056  11291  26G 112K  28G  26G   7G  85M       0     0  395127     96
   4   4  3067  11223  25G 112K  28G  27G   7G  85M       0     0  394035     96
   3   3  3055  11366  25G 112K  28G  27G   7G  85M       0     0  398404     97
   4   4  3036  11337  25G 112K  28G  27G   7G  85M       0     0  397585     97
   4   4  3057  11307  25G 112K  28G  27G   7G  85M       0     0  395947     97
   4   4  3080  11241  25G 112K  29G  27G   7G  85M       0     0  393489     96
   1   1  1392   5003  25G 112K  29G  27G   7G  80M       0     0  170667     42
#<----CPU[HYPER]-----><-----------Memory-----------><--------Lustre Client-------->
#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map  KBRead Reads KBWrite Writes
   0   0   101    216  25G 112K  29G  27G   7G  80M       0     0       0      0
Doesn't seem to fill the memory...
Further details:
[root@sabi22 ~]# lctl get_param llite.*.max_cached_mb
llite.lnec-ffff8808348d2c00.max_cached_mb=
users: 32
max_cached_mb: 48397
used_mb: 24610
unused_mb: 23787
reclaim_count: 0

[root@sabi22 ~]# lctl get_param osc.*.max_dirty_mb
osc.lnec-OST0000-osc-ffff8808348d2c00.max_dirty_mb=32
[...]
osc.lnec-OST0018-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST0019-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001a-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001b-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001c-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001d-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001e-osc-ffff8808348d2c00.max_dirty_mb=32
osc.lnec-OST001f-osc-ffff8808348d2c00.max_dirty_mb=32

[root@sabi22 EF]# lfs getstripe /mnt/lnec/pool4/stripe_4_4096/test256a
/mnt/lnec/pool4/stripe_4_4096/test256a
lmm_stripe_count:   4
lmm_stripe_size:    4194304
lmm_layout_gen:     0
lmm_stripe_offset:  27
lmm_pool:           osb4
        obdidx           objid          objid            group
            27            6792          0x1a88               0
            31            5891          0x1703               0
            24            6983          0x1b47               0
            28            5251          0x1483               0
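Worth noting: with max_dirty_mb=32 per OSC and a 4-way stripe, at most roughly 128 MB of dirty data can be outstanding for this file, which by itself limits how deep the write pipeline can get. A hypothetical tuning check along those lines (values are illustrative, not a recommendation; both parameters exist on 2.x clients):

```shell
# Raise the per-OSC dirty-page cap and the number of concurrent
# write RPCs per OST, then repeat the dd benchmark to compare.
lctl set_param osc.*.max_dirty_mb=256
lctl set_param osc.*.max_rpcs_in_flight=16
```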
> Looks like we will not make any further progress on this issue.
That's a shame. I don't think that "single-stream write to a striped file is slow with 2.x clients" (compared to 1.8.x) has been fixed yet in 2.6.0. Or is it simply that other tickets now track that issue?