Lustre / LU-7861

MDS Contention during unlinks due to llog spinlock

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version: Lustre 2.9.0
    • Affects Version: Lustre 2.5.5
    • Labels: None
    • Environment: 2.5.5-g1241c21-CHANGED-2.6.32-573.12.1.el6.atlas.x86_64
    • Severity: 3

    Description

      We see intermittent periods of interactive "slowness" on our production systems. Looking at the MDS, load can go quite high. Running perf, we see that osp_sync_add_rec is spending a lot of time in a spin_lock. I believe this is serially adding the unlink records to the llog so that the OST objects will be removed.

      -   29.60%    29.60%  [kernel]                   [k] _spin_lock
         - _spin_lock
            - 77.51% osp_sync_add_rec
                 osp_sync_add
            + 7.79% task_rq_lock
            + 6.68% try_to_wake_up
            + 1.32% osp_statfs
            + 1.26% kmem_cache_free
            + 1.12% cfs_percpt_lock
      

      I used jobstats to confirm that we had at least two jobs doing a significant number of unlinks at the time. When multiple MDS threads attempt unlinks, they serialize on this lock, spinning and blocking their CPUs in the meantime.

      I believe the following code is responsible:

      osp_sync.c:421
                      spin_lock(&d->opd_syn_lock);
                      d->opd_syn_changes++;
                      spin_unlock(&d->opd_syn_lock);
      

      How can we improve this situation?

      • Is the spin_lock here just to protect opd_syn_changes (so it could be changed to an atomic) or does it enforce additional synchronization? Would a mutex be appropriate here, or would the context switches kill us in a different way?
      • Does it make sense to support multiple llogs per device and hash objects to the different llogs so they can be appended to in parallel? Are there assumptions of ordering for llogs?
      • Something else?

          People

            Assignee: bzzz (Alex Zhuravlev)
            Reporter: ezell (Matt Ezell)