Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: Lustre 2.1.3
Environment:
lustre 2.1.3
kernel 2.6.32-220
ofed 1.5.4
bullxlinux 6.2 (based on redhat 6.2)
Severity: 3
Rank (Obsolete): 5822
Description
One of our customers (TGCC, Lustre 2.1.3) is facing a performance issue when reading the same file from many tasks on one client at the same time: performance drops as the number of tasks increases.
I have reproduced the issue and obtained these results:
tasks   Mean (MB/s)   Total (MB/s)
dd  1   339           339
dd  2   107           214
dd  4    78           312
dd  8    64           512
dd 16    26           428
Applying the patch http://review.whamcloud.com/#change,3627 on lustre 2.1.3 gives no real improvement.
tasks   Mean (MB/s)   Total (MB/s)
dd  1   542           542
dd  2   118           236
dd  4    74           298
dd  8    73           591
dd 16    52           833
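For reference, below is a minimal sketch of the kind of multi-reader test described above: N processes read the same file concurrently on a single client and report per-task throughput. This is a hypothetical reproducer, not the exact commands run at TGCC; the file path, read size and launch method are placeholders (the original tests used dd, apparently driven by mpirun).

/* Hypothetical reproducer sketch: fork N readers of one shared file on a
 * single client and report per-task throughput, similar to running N
 * concurrent dd processes. Path and block size are placeholders. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

#define BUFSZ (1 << 20)                 /* 1 MiB reads, like dd bs=1M */

static double now(void)
{
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/lustre/shared_file"; /* placeholder */
        int ntasks = argc > 2 ? atoi(argv[2]) : 4;

        for (int i = 0; i < ntasks; i++) {
                if (fork() == 0) {
                        char *buf = malloc(BUFSZ);
                        int fd = open(path, O_RDONLY);
                        long long total = 0;
                        double t0 = now();
                        ssize_t n;

                        if (fd < 0 || buf == NULL)
                                exit(1);
                        while ((n = read(fd, buf, BUFSZ)) > 0)
                                total += n;
                        printf("task %d: %.0f MB/s\n", i,
                               (double)total / (1 << 20) / (now() - t0));
                        exit(0);
                }
        }
        while (wait(NULL) > 0)          /* wait for all readers */
                ;
        return 0;
}

Compiled with something like "gcc -O2 -o readers readers.c" and run as "./readers /path/to/file 16", this mirrors the dd runs in the tables above (names are only examples).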
Profiling shows contention in the CLIO page routines.
(I suppose the libpython entry comes from the mpirun synchronization of the 16 tasks.)
samples  %        linenr info                app name             symbol name
1149893  34.8489  (no location information)  libpython2.6.so.1.0  /usr/lib64/libpython2.6.so.1.0
338676   10.2640  dec_and_lock.c:21          vmlinux              _atomic_dec_and_lock
335241   10.1599  cl_page.c:404              obdclass.ko          cl_page_find0
293448    8.8933  cl_page.c:729              obdclass.ko          cl_vmpage_page
235856    7.1479  filemap.c:963              vmlinux              grab_cache_page_nowait
142686    4.3243  cl_page.c:661              obdclass.ko          cl_page_put
80065     2.4265  filemap.c:667              vmlinux              find_get_page
46623     1.4130  copy_user_64.S:240         vmlinux              copy_user_generic_string
34127     1.0343  (no location information)  libpthread-2.12.so   pthread_rwlock_rdlock
33685     1.0209  (no location information)  libc-2.12.so         getenv
31725     0.9615  (no location information)  libpthread-2.12.so   pthread_rwlock_unlock
30492     0.9241  swap.c:288                 vmlinux              activate_page
28528     0.8646  (no location information)  libc-2.12.so         __dcigettext
23703     0.7183  (no location information)  libc-2.12.so         __strlen_sse42
22345     0.6772  vvp_page.c:120             lustre.ko            vvp_page_unassume
20646     0.6257  swap.c:183                 vmlinux              put_page
18525     0.5614  intel_idle.c:215           vmlinux              intel_idle
12774     0.3871  open.c:961                 vmlinux              sys_close
12702     0.3849  filemap.c:579              vmlinux              unlock_page
12685     0.3844  (no location information)  libc-2.12.so         __memcpy_ssse3
12134     0.3677  cl_page.c:874              obdclass.ko          cl_page_invoid
11734     0.3556  (no location information)  libpthread-2.12.so   __close_nocancel
10573     0.3204  cl_page.c:884              obdclass.ko          cl_page_owner_clear
10510     0.3185  (no location information)  libc-2.12.so         strerror_r
9783      0.2965  cl_page.c:1068             obdclass.ko          cl_page_unassume
9472      0.2871  (no location information)  oprofiled            /usr/bin/oprofiled
9438      0.2860  (no location information)  libc-2.12.so         __strncmp_sse2
8826      0.2675  rw.c:724                   lustre.ko            ll_readahead
8769      0.2658  filemap.c:527              vmlinux              page_waitqueue
8095      0.2453  entry_64.S:470             vmlinux              system_call_after_swapgs
7756      0.2351  cl_page.c:561              obdclass.ko          cl_page_state_set0
7615      0.2308  (no location information)  libc-2.12.so         __stpcpy_sse2
7366      0.2232  entry_64.S:462             vmlinux              system_call
7097      0.2151  sched.c:4430               vmlinux              find_busiest_group
6898      0.2091  (no location information)  libperl.so           /usr/lib64/perl5/CORE/libperl.so
6887      0.2087  (no location information)  libc-2.12.so         __ctype_b_loc
6780      0.2055  wait.c:251                 vmlinux              __wake_up_bit
6631      0.2010  cl_page.c:139              obdclass.ko          cl_page_at_trusted
6588      0.1997  (no location information)  libc-2.12.so         __strlen_sse2
6336      0.1920  lov_io.c:239               lov.ko               lov_page_stripe
5289      0.1603  lov_io.c:208               lov.ko               lov_sub_get
5263      0.1595  ring_buffer.c:2834         vmlinux              rb_get_reader_page
4595      0.1393  lvfs_lib.c:76              lvfs.ko              lprocfs_counter_add
4373      0.1325  (no location information)  libc-2.12.so         strerror
4347      0.1317  lov_io.c:252               lov.ko               lov_page_subio
4337      0.1314  cl_page.c:1036             obdclass.ko          cl_page_assume
4310      0.1306  (no location information)  libc-2.12.so         __mempcpy_sse2
3903      0.1183  lov_page.c:92              lov.ko               lov_page_own
3866      0.1172  mutex.c:409                vmlinux              __mutex_lock_slowpath
3841      0.1164  lcommon_cl.c:1097          lustre.ko            cl2vm_page
3698      0.1121  radix-tree.c:414           vmlinux              radix_tree_lookup_slot
3506      0.1063  vvp_page.c:109             lustre.ko            vvp_page_assume
3491      0.1058  lu_object.c:1115           obdclass.ko          lu_object_locate
3490      0.1058  (no location information)  ipmi_si.ko           port_inb
3457      0.1048  ring_buffer.c:3221         vmlinux              ring_buffer_consume
3431      0.1040  cl_page.c:898              obdclass.ko          cl_page_owner_set
...
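For context: the hottest Lustre symbols above (cl_page_find0, cl_vmpage_page, cl_page_put, together with _atomic_dec_and_lock) are all on the per-page lookup and refcounting path, and the linked ticket LU-1666 attributes this to contention on hdr->coh_page_guard. The following is a purely illustrative userspace sketch (not Lustre code; all names are made up) of the pattern such a profile suggests: every page lookup takes one shared lock, so adding readers mostly adds lock contention rather than bandwidth.

/* Illustrative only, not Lustre code: each "page lookup" takes one shared
 * lock, mimicking the kind of guard contention the profile above suggests.
 * Per-task lookup throughput drops as threads are added because the lock
 * serializes the lookups. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

#define NPAGES  4096
#define LOOKUPS 2000000L

static pthread_mutex_t page_guard = PTHREAD_MUTEX_INITIALIZER;
static void *page_index[NPAGES];        /* stands in for the page index */
static void * volatile sink;            /* keeps the lookup from being optimized out */

static void *reader(void *arg)
{
        (void)arg;
        for (long i = 0; i < LOOKUPS; i++) {
                pthread_mutex_lock(&page_guard);        /* shared guard */
                sink = page_index[i % NPAGES];          /* the "lookup" */
                pthread_mutex_unlock(&page_guard);
        }
        return NULL;
}

int main(int argc, char **argv)
{
        int ntasks = argc > 1 ? atoi(argv[1]) : 16;
        pthread_t tid[128];
        struct timeval t0, t1;

        if (ntasks > 128)
                ntasks = 128;
        gettimeofday(&t0, NULL);
        for (int i = 0; i < ntasks; i++)
                pthread_create(&tid[i], NULL, reader, NULL);
        for (int i = 0; i < ntasks; i++)
                pthread_join(tid[i], NULL);
        gettimeofday(&t1, NULL);

        double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d tasks: %.0f lookups/s per task\n", ntasks, LOOKUPS / sec);
        return 0;
}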
I would have expected per-task performance to stay at the level of the single-reader case (or only slightly below it); this is what I observe on ext4, for instance.
I can easily reproduce the issue and can provide additional information (lustre traces, system traces, system dump, profiling, ...) if needed.
Please note that this issue is serious for this customer: its cluster has Lustre client nodes with many cores (up to 128), and the slowdown dramatically impacts application run times.
Attachments
Issue Links
- is related to: LU-1666 hdr->coh_page_guard contention (single shared file performance) (Resolved)