Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Affects Versions: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
- Environment: Chaos 5, single client
- Severity: 3
- Rank: 4056
Description
Shared-file writes from many threads on a single client perform significantly worse with the CLIO code.
However, unlike issue LU-1666, the root cause here appears to be lli->lli_write_mutex, which is taken in the common write path (see ll_file_io_generic()). This lock effectively serializes all writes to the file and drastically limits the rate at which we can dirty pages. It's so bad that, according to /proc/meminfo, we're unable to dirty pages in the cache faster than they are written to the server; that is, the total number of dirty pages on the client rarely exceeds a few megabytes.
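To make the serialization concrete, the following is a minimal userspace sketch (pthreads, not Lustre code; fake_dirty_pages() is an invented stand-in for the work done under the mutex) of the pattern described above: every writer thread must take one shared per-file mutex around its buffered-write step, so 128 threads make roughly single-threaded progress no matter how fast the rest of the stack is.

/*
 * Userspace analogue of the serialization described above: every writer
 * thread takes one shared mutex before it can "dirty" its buffer, so
 * adding threads does not add throughput.  Illustration only; this is
 * not Lustre code.
 */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NTHREADS   128          /* matches numjobs=128 in ssf.fio */
#define WRITES     1000
#define CHUNK      (1 << 20)    /* 1 MiB, matches blocksize=1m */

static pthread_mutex_t write_mutex = PTHREAD_MUTEX_INITIALIZER;
static char page_cache[CHUNK];

static void fake_dirty_pages(void)
{
        /* Stand-in for dirtying pages in the buffered-write path. */
        memset(page_cache, 0xab, sizeof(page_cache));
}

static void *writer(void *arg)
{
        (void)arg;
        for (int i = 0; i < WRITES; i++) {
                /* One lock per write call, like the shared per-file
                 * mutex: only one thread makes progress at a time. */
                pthread_mutex_lock(&write_mutex);
                fake_dirty_pages();
                pthread_mutex_unlock(&write_mutex);
        }
        return NULL;
}

int main(void)
{
        pthread_t tid[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
                pthread_create(&tid[i], NULL, writer, NULL);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(tid[i], NULL);
        puts("done");
        return 0;
}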
Once again, this issue can easily be reproduced with the following fio script run on a single client:
fio ssf.fio
; ssf.fio, single shared file
[global]
directory=/p/lstest/behlendo/fio/
filename=ssf
size=256g
blocksize=1m
direct=0
ioengine=sync
numjobs=128
group_reporting=1
fallocate=none
runtime=60
stonewall

[write]
rw=randwrite:256
rw_sequencer=sequential
fsync_on_close=1

[read]
rw=randread:256
rw_sequencer=sequential
The following is a sampling of stacks from the system showing all of the I/O threads contending for this mutex. In addition, we can clearly see process 33296 holding the mutex while potentially blocking briefly in osc_enter_cache(). It's still unclear to me whether we can improve this simply by drastically speeding up cl_io_loop(), or whether we must reintroduce some form of range locking as was done in 1.8 (a sketch of that idea follows the stack traces below).
fio           S 0000000000000008     0 33296  33207 0x00000000
 ffff880522555548 0000000000000082 ffff880522555510 ffff88052255550c
 ffff8807ff75a8c8 ffff88083fe82000 ffff880044635fc0 00000000000007ff
 ffff8807c6633af8 ffff880522555fd8 000000000000f4e8 ffff8807c6633af8
Call Trace:
 [<ffffffffa033467e>] cfs_waitq_wait+0xe/0x10 [libcfs]
 [<ffffffffa0a3b046>] osc_enter_cache+0x676/0x9c0 [osc]
 [<ffffffffa0aa3433>] ? lov_merge_lvb_kms+0x113/0x260 [lov]
 [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
 [<ffffffffa0aafc4e>] ? lov_attr_get+0x1e/0x60 [lov]
 [<ffffffffa0a41e63>] osc_queue_async_io+0xe53/0x1430 [osc]
 [<ffffffff81274a1c>] ? put_dec+0x10c/0x110
 [<ffffffff81274d0e>] ? number+0x2ee/0x320
 [<ffffffffa0a29f3e>] ? osc_page_init+0x29e/0x5b0 [osc]
 [<ffffffff81277206>] ? vsnprintf+0x2b6/0x5f0
 [<ffffffffa0a28e24>] osc_page_cache_add+0x94/0x1a0 [osc]
 [<ffffffffa06c45bf>] cl_page_cache_add+0x7f/0x1e0 [obdclass]
 [<ffffffffa0ab2edf>] lov_page_cache_add+0x8f/0x190 [lov]
 [<ffffffffa06c45bf>] cl_page_cache_add+0x7f/0x1e0 [obdclass]
 [<ffffffff8112672d>] ? __set_page_dirty_nobuffers+0xdd/0x160
 [<ffffffffa0b78c43>] vvp_io_commit_write+0x313/0x4e0 [lustre]
 [<ffffffffa06d0d8d>] cl_io_commit_write+0xbd/0x1b0 [obdclass]
 [<ffffffffa0b5465b>] ll_commit_write+0xfb/0x320 [lustre]
 [<ffffffffa0b68660>] ll_write_end+0x30/0x60 [lustre]
 [<ffffffff81111684>] generic_file_buffered_write+0x174/0x2a0
 [<ffffffff810708d7>] ? current_fs_time+0x27/0x30
 [<ffffffff81112f70>] __generic_file_aio_write+0x250/0x480
 [<ffffffffa06c14c5>] ? cl_env_info+0x15/0x20 [obdclass]
 [<ffffffff8111320f>] generic_file_aio_write+0x6f/0xe0
 [<ffffffffa0b793a0>] vvp_io_write_start+0xb0/0x1e0 [lustre]
 [<ffffffffa06ce392>] cl_io_start+0x72/0xf0 [obdclass]
 [<ffffffffa06d1ca4>] cl_io_loop+0xb4/0x160 [obdclass]
 [<ffffffffa0b33b0e>] ll_file_io_generic+0x3be/0x4f0 [lustre]
 [<ffffffffa0b33d8c>] ll_file_aio_write+0x14c/0x290 [lustre]
 [<ffffffffa0b341e3>] ll_file_write+0x173/0x270 [lustre]
 [<ffffffff81177a98>] vfs_write+0xb8/0x1a0
 [<ffffffff811784a1>] sys_write+0x51/0x90
 [<ffffffff814f21be>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

fio           S 0000000000000001     0 33297  33207 0x00000000
 ffff8807d4b91d18 0000000000000086 ffff8807d4b91c98 ffffffffa0aba783
 ffff880c5cca92f0 ffff880f9a8c7428 ffffffff8100bc0e ffff8807d4b91d18
 ffff8807c6633098 ffff8807d4b91fd8 000000000000f4e8 ffff8807c6633098
Call Trace:
 [<ffffffffa0aba783>] ? lov_io_init_raid0+0x383/0x780 [lov]
 [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8104d93b>] ? mutex_spin_on_owner+0x9b/0xc0
 [<ffffffff814f04bb>] __mutex_lock_interruptible_slowpath+0x14b/0x1b0
 [<ffffffff814f0568>] mutex_lock_interruptible+0x48/0x50
 [<ffffffffa0b33aff>] ll_file_io_generic+0x3af/0x4f0 [lustre]
 [<ffffffffa0b33d8c>] ll_file_aio_write+0x14c/0x290 [lustre]
 [<ffffffffa0b341e3>] ll_file_write+0x173/0x270 [lustre]
 [<ffffffff81177a98>] vfs_write+0xb8/0x1a0
 [<ffffffff811784a1>] sys_write+0x51/0x90
 [<ffffffff814f21be>] ? do_device_not_available+0xe/0x10
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

...many threads like 33297...
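For reference, the range-locking idea mentioned above could look roughly like the following userspace sketch. This is only an illustration of the concept (writers serialize only when their byte ranges overlap), using a simple list-based implementation; range_lock()/range_unlock() and struct range are invented names here, not the 1.8 code or any current Lustre interface.

/*
 * Userspace sketch of byte-range locking: writers whose ranges do not
 * overlap proceed in parallel; only overlapping writers serialize.
 * Illustration of the idea only -- not Lustre code.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

struct range {
        off_t         start;
        off_t         end;              /* inclusive */
        struct range *next;
};

static pthread_mutex_t rl_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  rl_cond  = PTHREAD_COND_INITIALIZER;
static struct range   *rl_held;         /* list of currently held ranges */

static bool overlaps(off_t start, off_t end)
{
        for (struct range *r = rl_held; r != NULL; r = r->next)
                if (start <= r->end && r->start <= end)
                        return true;
        return false;
}

/* Block until [start, end] overlaps no held range, then record it. */
static struct range *range_lock(off_t start, off_t end)
{
        struct range *r = malloc(sizeof(*r));

        r->start = start;
        r->end = end;
        pthread_mutex_lock(&rl_mutex);
        while (overlaps(start, end))
                pthread_cond_wait(&rl_cond, &rl_mutex);
        r->next = rl_held;
        rl_held = r;
        pthread_mutex_unlock(&rl_mutex);
        return r;
}

static void range_unlock(struct range *r)
{
        pthread_mutex_lock(&rl_mutex);
        for (struct range **p = &rl_held; *p != NULL; p = &(*p)->next) {
                if (*p == r) {
                        *p = r->next;
                        break;
                }
        }
        pthread_cond_broadcast(&rl_cond);
        pthread_mutex_unlock(&rl_mutex);
        free(r);
}

int main(void)
{
        /* Two non-overlapping 1 MiB writers hold their locks
         * concurrently; an overlapping third writer would wait. */
        struct range *a = range_lock(0, 1048575);
        struct range *b = range_lock(1048576, 2097151);

        range_unlock(a);
        range_unlock(b);
        puts("non-overlapping ranges locked concurrently");
        return 0;
}

In a real implementation the overlap search would use an interval tree rather than a linear list, but the point is the same: non-overlapping writers no longer contend on a single per-file lock.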
Finally, it's my understanding that this lock is here solely to enforce write atomicity between threads on the client. If that is in fact the case, we may do a little experimenting with the group lock to see whether it's a reasonable short-term workaround for some of our codes (see the sketch below).
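For that experiment, client usage would look roughly like the sketch below, assuming the LL_IOC_GROUP_LOCK / LL_IOC_GROUP_UNLOCK ioctls exported through lustre_user.h (the header location depends on the Lustre release, and GROUP_ID is an arbitrary application-chosen value).

/*
 * Rough sketch of taking a Lustre group lock around shared-file writes.
 * Assumes <lustre/lustre_user.h> provides LL_IOC_GROUP_LOCK and
 * LL_IOC_GROUP_UNLOCK; GROUP_ID is any non-zero id that all cooperating
 * processes agree on.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

#include <lustre/lustre_user.h>

#define GROUP_ID 4711

static char buf[1 << 20];

int main(int argc, char **argv)
{
        int fd, gid = GROUP_ID;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file on lustre>\n", argv[0]);
                return 1;
        }

        fd = open(argv[1], O_WRONLY | O_CREAT, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* All cooperating processes pass the same gid and are granted
         * a shared group extent lock on the file. */
        if (ioctl(fd, LL_IOC_GROUP_LOCK, gid) < 0) {
                perror("LL_IOC_GROUP_LOCK");
                return 1;
        }

        if (write(fd, buf, sizeof(buf)) < 0)
                perror("write");

        if (ioctl(fd, LL_IOC_GROUP_UNLOCK, gid) < 0)
                perror("LL_IOC_GROUP_UNLOCK");

        close(fd);
        return 0;
}

Whether this actually sidesteps the lli_write_mutex contention is exactly what the experiment above is meant to determine.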
Attachments
Issue Links
- is blocked by
  - LU-3261 Threads stuck in osc_enter_cache when testing LU-1669 (Resolved)
- is related to
  - LU-5700 async IO LBUG obj->cob_transient_pages == 0 (Resolved)
  - LU-8025 ll_direct_io code introduced in LU-6260 is based on 2.4/2.5, ignores LU-1669 (Resolved)
  - LU-5360 Test timeout sanity-hsm test_13 (Resolved)
  - LU-11085 Replace Lustre interval tree with kernel one (Resolved)