Lustre / LU-1669

lli->lli_write_mutex (single shared file performance)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: Lustre 2.7.0
    • Affects Version/s: Lustre 2.4.0, Lustre 2.5.0, Lustre 2.6.0
    • Environment: Chaos 5, single client
    • Severity: 3
    • 4056

    Description

      Shared file writes on a single client from many threads perform significantly worse with the CLIO code.

      However, unlike LU-1666, the root cause for this appears to be the lli->lli_write_mutex, which is taken in the common write path (see ll_file_io_generic). This lock effectively serializes all writes to the file and drastically limits the rate at which we can dirty pages. It's so bad that, according to /proc/meminfo, we're unable to dirty pages in the cache faster than they are written to the server; that is, the total number of dirty pages on the client rarely exceeds a few megabytes.
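
      For reference, the shape of the write path makes the serialization clear. The following is only a simplified paraphrase of ll_file_io_generic(), with the argument list simplified and error handling trimmed; it is a sketch, not the exact source:

      /* Simplified paraphrase of the ll_file_io_generic() write path.
       * Arguments and error handling are trimmed; treat this as a sketch. */
      static ssize_t ll_file_io_generic(const struct lu_env *env,
                                        struct cl_io *io, struct file *file,
                                        enum cl_io_type iot, loff_t *ppos,
                                        size_t count)
      {
              struct ll_inode_info *lli = ll_i2info(file->f_dentry->d_inode);
              int write_mutex_locked = 0;
              ssize_t result;

              if (cl_io_rw_init(env, io, iot, *ppos, count) == 0) {
                      /* One mutex per inode: every buffered write to this
                       * file, regardless of offset, serializes here.  (It
                       * appears to be skipped only when the fd holds a group
                       * lock, i.e. LL_FILE_GROUP_LOCKED is set.) */
                      if (iot == CIT_WRITE) {
                              if (mutex_lock_interruptible(&lli->lli_write_mutex))
                                      return -ERESTARTSYS;
                              write_mutex_locked = 1;
                      }

                      /* The entire I/O loop (dirtying pages and, as the
                       * stacks below show, waiting for grant in
                       * osc_enter_cache()) runs with the mutex held. */
                      result = cl_io_loop(env, io);

                      if (write_mutex_locked)
                              mutex_unlock(&lli->lli_write_mutex);
              } else {
                      result = io->ci_result;
              }
              cl_io_fini(env, io);
              return result;
      }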

      Once again, this issue can easily be reproduced with the following fio script run on a single client.

      fio ssf.fio
      
      ; ssf.fio, single shared file
      [global]
      directory=/p/lstest/behlendo/fio/
      filename=ssf
      size=256g
      blocksize=1m
      direct=0
      ioengine=sync
      numjobs=128
      group_reporting=1
      fallocate=none
      runtime=60
      stonewall
      
      [write]
      rw=randwrite:256
      rw_sequencer=sequential
      fsync_on_close=1
      
      [read]
      rw=randread:256
      rw_sequencer=sequential
      

      The following is a sampling of stacks from the system showing all the I/O threads contending for this mutex. In addition, we can clearly see process 33296 holding the mutex while potentially blocking briefly in osc_enter_cache(). It's still unclear to me whether we can simply drastically speed up cl_io_loop() to improve this, or whether we must reintroduce some range locking as was done in 1.8 (see the sketch after the stack traces below).

      fio           S 0000000000000008     0 33296  33207 0x00000000
       ffff880522555548 0000000000000082 ffff880522555510 ffff88052255550c
       ffff8807ff75a8c8 ffff88083fe82000 ffff880044635fc0 00000000000007ff
       ffff8807c6633af8 ffff880522555fd8 000000000000f4e8 ffff8807c6633af8
      Call Trace:
       [<ffffffffa033467e>] cfs_waitq_wait+0xe/0x10 [libcfs]
       [<ffffffffa0a3b046>] osc_enter_cache+0x676/0x9c0 [osc]
       [<ffffffffa0aa3433>] ? lov_merge_lvb_kms+0x113/0x260 [lov]
       [<ffffffff8105ea30>] ? default_wake_function+0x0/0x20
       [<ffffffffa0aafc4e>] ? lov_attr_get+0x1e/0x60 [lov]
       [<ffffffffa0a41e63>] osc_queue_async_io+0xe53/0x1430 [osc]
       [<ffffffff81274a1c>] ? put_dec+0x10c/0x110
       [<ffffffff81274d0e>] ? number+0x2ee/0x320
       [<ffffffffa0a29f3e>] ? osc_page_init+0x29e/0x5b0 [osc]
       [<ffffffff81277206>] ? vsnprintf+0x2b6/0x5f0
       [<ffffffffa0a28e24>] osc_page_cache_add+0x94/0x1a0 [osc]
       [<ffffffffa06c45bf>] cl_page_cache_add+0x7f/0x1e0 [obdclass]
       [<ffffffffa0ab2edf>] lov_page_cache_add+0x8f/0x190 [lov]
       [<ffffffffa06c45bf>] cl_page_cache_add+0x7f/0x1e0 [obdclass]
       [<ffffffff8112672d>] ? __set_page_dirty_nobuffers+0xdd/0x160
       [<ffffffffa0b78c43>] vvp_io_commit_write+0x313/0x4e0 [lustre]
       [<ffffffffa06d0d8d>] cl_io_commit_write+0xbd/0x1b0 [obdclass]
       [<ffffffffa0b5465b>] ll_commit_write+0xfb/0x320 [lustre]
       [<ffffffffa0b68660>] ll_write_end+0x30/0x60 [lustre]
       [<ffffffff81111684>] generic_file_buffered_write+0x174/0x2a0
       [<ffffffff810708d7>] ? current_fs_time+0x27/0x30
       [<ffffffff81112f70>] __generic_file_aio_write+0x250/0x480
       [<ffffffffa06c14c5>] ? cl_env_info+0x15/0x20 [obdclass]
       [<ffffffff8111320f>] generic_file_aio_write+0x6f/0xe0
       [<ffffffffa0b793a0>] vvp_io_write_start+0xb0/0x1e0 [lustre]
       [<ffffffffa06ce392>] cl_io_start+0x72/0xf0 [obdclass]
       [<ffffffffa06d1ca4>] cl_io_loop+0xb4/0x160 [obdclass]
       [<ffffffffa0b33b0e>] ll_file_io_generic+0x3be/0x4f0 [lustre]
       [<ffffffffa0b33d8c>] ll_file_aio_write+0x14c/0x290 [lustre]
       [<ffffffffa0b341e3>] ll_file_write+0x173/0x270 [lustre]
       [<ffffffff81177a98>] vfs_write+0xb8/0x1a0
       [<ffffffff811784a1>] sys_write+0x51/0x90
       [<ffffffff814f21be>] ? do_device_not_available+0xe/0x10
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      
      fio           S 0000000000000001     0 33297  33207 0x00000000
       ffff8807d4b91d18 0000000000000086 ffff8807d4b91c98 ffffffffa0aba783
       ffff880c5cca92f0 ffff880f9a8c7428 ffffffff8100bc0e ffff8807d4b91d18
       ffff8807c6633098 ffff8807d4b91fd8 000000000000f4e8 ffff8807c6633098
      Call Trace:
       [<ffffffffa0aba783>] ? lov_io_init_raid0+0x383/0x780 [lov]
       [<ffffffff8100bc0e>] ? apic_timer_interrupt+0xe/0x20
       [<ffffffff8104d93b>] ? mutex_spin_on_owner+0x9b/0xc0
       [<ffffffff814f04bb>] __mutex_lock_interruptible_slowpath+0x14b/0x1b0
       [<ffffffff814f0568>] mutex_lock_interruptible+0x48/0x50
       [<ffffffffa0b33aff>] ll_file_io_generic+0x3af/0x4f0 [lustre]
       [<ffffffffa0b33d8c>] ll_file_aio_write+0x14c/0x290 [lustre]
       [<ffffffffa0b341e3>] ll_file_write+0x173/0x270 [lustre]
       [<ffffffff81177a98>] vfs_write+0xb8/0x1a0
       [<ffffffff811784a1>] sys_write+0x51/0x90
       [<ffffffff814f21be>] ? do_device_not_available+0xe/0x10
       [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
      ...many threads like 33297...
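
      For comparison, range locking in the spirit of the 1.8 client would let each writer lock only the byte range it touches, so non-overlapping writes could dirty pages in parallel. The sketch below is a minimal user-space illustration of the idea; the names and the simple list-based implementation are illustrative only and are not the 1.8 code or any proposed patch:

      /* Minimal user-space sketch of byte-range locking; illustrative only,
       * not the 1.8 client code or a proposed kernel patch. */
      #include <pthread.h>
      #include <stdlib.h>

      struct range_lock {
              unsigned long long  rl_start;  /* first byte covered */
              unsigned long long  rl_end;    /* last byte covered */
              struct range_lock  *rl_next;   /* next currently-held range */
      };

      struct range_lock_tree {
              pthread_mutex_t     rlt_lock;  /* protects rlt_held */
              pthread_cond_t      rlt_cond;  /* signalled on range release */
              struct range_lock  *rlt_held;  /* ranges currently held */
      };

      #define RANGE_LOCK_TREE_INIT \
              { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL }

      static int ranges_overlap(const struct range_lock *a,
                                unsigned long long start,
                                unsigned long long end)
      {
              return a->rl_start <= end && start <= a->rl_end;
      }

      /* Block until no held range overlaps [start, end], then record ours.
       * Writers to non-overlapping ranges never wait on each other. */
      struct range_lock *range_lock(struct range_lock_tree *t,
                                    unsigned long long start,
                                    unsigned long long end)
      {
              struct range_lock *rl = malloc(sizeof(*rl));
              struct range_lock *it;

              rl->rl_start = start;
              rl->rl_end = end;

              pthread_mutex_lock(&t->rlt_lock);
      restart:
              for (it = t->rlt_held; it != NULL; it = it->rl_next) {
                      if (ranges_overlap(it, start, end)) {
                              pthread_cond_wait(&t->rlt_cond, &t->rlt_lock);
                              goto restart;
                      }
              }
              rl->rl_next = t->rlt_held;
              t->rlt_held = rl;
              pthread_mutex_unlock(&t->rlt_lock);
              return rl;
      }

      /* Drop a held range and wake any writers that may now fit. */
      void range_unlock(struct range_lock_tree *t, struct range_lock *rl)
      {
              struct range_lock **pp;

              pthread_mutex_lock(&t->rlt_lock);
              for (pp = &t->rlt_held; *pp != NULL; pp = &(*pp)->rl_next) {
                      if (*pp == rl) {
                              *pp = rl->rl_next;
                              break;
                      }
              }
              pthread_cond_broadcast(&t->rlt_cond);
              pthread_mutex_unlock(&t->rlt_lock);
              free(rl);
      }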
      

      Finally, it's my understanding that this lock is solely here to enforce atomicity between threads on the client. If that is in fact the case, we may do a little experimenting with the group lock to see if it's a reasonable short-term workaround for some of our codes.
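
      For that experiment, the application side would look roughly like the sketch below, which brackets the shared-file writes with the LL_IOC_GROUP_LOCK / LL_IOC_GROUP_UNLOCK ioctls from lustre_user.h. The group id, file path, and header location are only examples; all openers sharing the file would need to agree on the same id.

      /* Hedged sketch of the group-lock experiment.  The ioctls are the ones
       * exported in <lustre/lustre_user.h>; the group id and path below are
       * arbitrary examples. */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/ioctl.h>
      #include <unistd.h>
      #include <lustre/lustre_user.h>  /* LL_IOC_GROUP_LOCK, LL_IOC_GROUP_UNLOCK */

      #define SSF_GROUP_GID 4711       /* arbitrary non-zero id shared by all
                                        * processes/threads writing the file */

      int main(void)
      {
              int fd = open("/p/lstest/behlendo/fio/ssf", O_WRONLY);

              if (fd < 0) {
                      perror("open");
                      return 1;
              }

              /* Everyone using the same group id shares the lock; with it
               * held, write atomicity between threads becomes the
               * application's problem rather than the client's. */
              if (ioctl(fd, LL_IOC_GROUP_LOCK, SSF_GROUP_GID) < 0) {
                      perror("LL_IOC_GROUP_LOCK");
                      return 1;
              }

              /* ... issue the many-threaded writes to fd here ... */

              if (ioctl(fd, LL_IOC_GROUP_UNLOCK, SSF_GROUP_GID) < 0)
                      perror("LL_IOC_GROUP_UNLOCK");

              close(fd);
              return 0;
      }

      If the LL_FILE_GROUP_LOCKED short-circuit noted in the earlier sketch holds, writes issued with the group lock held would bypass lli_write_mutex entirely.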

            People

              Assignee: Lai Siyao (laisiyao)
              Reporter: Brian Behlendorf (behlendorf)
              Votes: 0
              Watchers: 22
