LU-4912: ldlm_cancel_locks_for_export() causes IO during umount

Details

    • Type: Question/Request
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.4.3

    Description

      After observing substantial read I/O on our systems during an OST umount, I took a look at exactly what was causing it. The root cause turns out to be that ldlm_cancel_locks_for_export() calls ofd_lvbo_update() to update the LVB from disk for every lock as it is canceled. When there are millions of locks on the server, this translates into a huge amount of IO.

      After reading through the code, it's not at all clear to me why this is done. How could the LVB be out of date? Why is this update required before the lock can be canceled? A representative stack from the umount is shown below.

      [<ffffffffa031c9dc>] cv_wait_common+0x8c/0x100 [spl]
      [<ffffffffa031ca68>] __cv_wait_io+0x18/0x20 [spl]
      [<ffffffffa046353b>] zio_wait+0xfb/0x1b0 [zfs]
      [<ffffffffa03d16bd>] dbuf_read+0x3fd/0x740 [zfs]
      [<ffffffffa03d1b89>] __dbuf_hold_impl+0x189/0x480 [zfs]
      [<ffffffffa03d1f06>] dbuf_hold_impl+0x86/0xc0 [zfs]
      [<ffffffffa03d2f80>] dbuf_hold+0x20/0x30 [zfs]
      [<ffffffffa03d9767>] dmu_buf_hold+0x97/0x1d0 [zfs]
      [<ffffffffa042de8f>] zap_get_leaf_byblk+0x4f/0x2a0 [zfs]
      [<ffffffffa042e14a>] zap_deref_leaf+0x6a/0x80 [zfs]
      [<ffffffffa042e510>] fzap_lookup+0x60/0x120 [zfs]
      [<ffffffffa0433f11>] zap_lookup_norm+0xe1/0x190 [zfs]
      [<ffffffffa0434053>] zap_lookup+0x33/0x40 [zfs]
      [<ffffffffa0cf0710>] osd_fid_lookup+0xb0/0x2e0 [osd_zfs]
      [<ffffffffa0cea311>] osd_object_init+0x1a1/0x6d0 [osd_zfs]
      [<ffffffffa06efc9d>] lu_object_alloc+0xcd/0x300 [obdclass]
      [<ffffffffa06f0805>] lu_object_find_at+0x205/0x360 [obdclass]
      [<ffffffffa06f0976>] lu_object_find+0x16/0x20 [obdclass]
      [<ffffffffa0d80575>] ofd_object_find+0x35/0xf0 [ofd]
      [<ffffffffa0d90486>] ofd_lvbo_update+0x366/0xdac [ofd]
      [<ffffffffa0831828>] ldlm_cancel_locks_for_export_cb+0x88/0x200 [ptlrpc]
      [<ffffffffa059178f>] cfs_hash_for_each_relax+0x17f/0x360 [libcfs]
      [<ffffffffa0592fde>] cfs_hash_for_each_empty+0xfe/0x1e0 [libcfs]
      [<ffffffffa082c05f>] ldlm_cancel_locks_for_export+0x2f/0x40 [ptlrpc]
      [<ffffffffa083b804>] server_disconnect_export+0x64/0x1a0 [ptlrpc]
      [<ffffffffa0d717fa>] ofd_obd_disconnect+0x6a/0x1f0 [ofd]
      [<ffffffffa06b5d77>] class_disconnect_export_list+0x337/0x660 [obdclass]
      [<ffffffffa06b6496>] class_disconnect_exports+0x116/0x2f0 [obdclass]
      [<ffffffffa06de9cf>] class_cleanup+0x16f/0xda0 [obdclass]
      [<ffffffffa06e06bc>] class_process_config+0x10bc/0x1c80 [obdclass]
      [<ffffffffa06e13f9>] class_manual_cleanup+0x179/0x6f0 [obdclass]
      [<ffffffffa071615c>] server_put_super+0x5bc/0xf00 [obdclass]
      [<ffffffff8118461b>] generic_shutdown_super+0x5b/0xe0
      [<ffffffff81184706>] kill_anon_super+0x16/0x60
      [<ffffffffa06e3256>] lustre_kill_super+0x36/0x60 [obdclass]
      [<ffffffff81184ea7>] deactivate_super+0x57/0x80
      [<ffffffff811a2d2f>] mntput_no_expire+0xbf/0x110
      [<ffffffff811a379b>] sys_umount+0x7b/0x3a0
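
      To make the scale concrete, here is a minimal, self-contained userspace model of the pattern the stack above shows. It is purely illustrative (the names and structure are not the actual Lustre code, and the "disk read" is only counted, not performed): every lock cancelled on behalf of the export triggers its own synchronous metadata read, so millions of locks mean millions of reads at umount time.

      #include <stdio.h>

      /* stand-in for a DLM lock; only the fields needed for the model */
      struct lock {
              int resource_id;
      };

      static long disk_reads;

      /* stands in for ofd_lvbo_update(): one synchronous read per call
       * (zap_lookup()/dbuf_read() in the real stack trace) */
      static void lvb_refresh_from_disk(int resource_id)
      {
              (void)resource_id;
              disk_reads++;
      }

      /* stands in for ldlm_cancel_locks_for_export_cb(): refresh, then cancel */
      static void cancel_lock(struct lock *lck)
      {
              lvb_refresh_from_disk(lck->resource_id);
              /* ... the lock itself is cancelled here ... */
      }

      int main(void)
      {
              enum { NLOCKS = 2 * 1000 * 1000 };      /* "millions of locks" */
              static struct lock locks[NLOCKS];

              for (int i = 0; i < NLOCKS; i++)
                      cancel_lock(&locks[i]);

              printf("%ld synchronous reads issued during umount\n", disk_reads);
              return 0;
      }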
      


        Activity

          pjones Peter Jones made changes -
          Fix Version/s Original: Lustre 2.13.0 [ 14290 ]
          adilger Andreas Dilger made changes -
          Fix Version/s New: Lustre 2.13.0 [ 14290 ]
          Fix Version/s Original: Lustre 2.12.0 [ 13495 ]
          tappro Mikhail Pershin made changes -
          Fix Version/s New: Lustre 2.12.0 [ 13495 ]
          Fix Version/s Original: Lustre 2.11.0 [ 13091 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.11.0 [ 13091 ]
          Fix Version/s Original: Lustre 2.10.0 [ 12204 ]
          pjones Peter Jones made changes -
          Fix Version/s New: Lustre 2.10.0 [ 12204 ]
          Fix Version/s Original: Lustre 2.9.0 [ 11891 ]
          pjones Peter Jones made changes -
          End date New: 23/Dec/15
          Start date New: 15/Apr/14
          adilger Andreas Dilger made changes -
          Fix Version/s New: Lustre 2.9.0 [ 11891 ]
          adilger Andreas Dilger made changes -
          Assignee Original: Oleg Drokin [ green ] New: Mikhail Pershin [ tappro ]
          morrone Christopher Morrone (Inactive) made changes -
          Labels New: llnl

          adilger Andreas Dilger added a comment -

          What might make sense, in the case of lock cancellation due to client eviction, is to mark the LVB stale in the resource, so that it is refreshed from disk only if the resource is used again. That would avoid the need to update the LVB repeatedly while cancelling many locks, and avoids any work if the resource is never used again.
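
          As a rough, self-contained sketch of that idea (illustrative names only, not the actual LDLM data structures or API): cancellation merely flags the cached LVB as stale, and the next user of the resource pays for a single refresh from disk, no matter how many locks were cancelled.

          #include <stdbool.h>
          #include <stdio.h>

          /* stand-in for an LDLM resource with its cached LVB */
          struct resource {
                  long lvb_size;          /* cached LVB contents (e.g. file size) */
                  bool lvb_stale;         /* set on eviction, cleared on refresh  */
          };

          static long disk_reads;

          /* the expensive part we want to defer */
          static long read_size_from_disk(void)
          {
                  disk_reads++;
                  return 4096;
          }

          /* cancellation path: no IO, just mark the cached LVB untrustworthy */
          static void cancel_lock(struct resource *res)
          {
                  res->lvb_stale = true;
          }

          /* next use of the resource (e.g. a glimpse): refresh only if needed */
          static long resource_get_size(struct resource *res)
          {
                  if (res->lvb_stale) {
                          res->lvb_size = read_size_from_disk();
                          res->lvb_stale = false;
                  }
                  return res->lvb_size;
          }

          int main(void)
          {
                  struct resource res = { .lvb_size = 1024, .lvb_stale = false };
                  long size;

                  /* many locks on the same resource are cancelled during eviction */
                  for (int i = 0; i < 1000; i++)
                          cancel_lock(&res);

                  size = resource_get_size(&res);
                  printf("size=%ld after %ld disk reads\n", size, disk_reads);
                  return 0;
          }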

          I also notice the comment in ldlm_glimpse_ast() implies that "filter_intent_policy()" is handling this, but the new ofd_intent_policy() uses ldlm_glimpse_locks() which does not appear to call ldlm_res_lvbo_update(res, NULL, 1) if the glimpse fails.

          /**
           * ->l_glimpse_ast() for DLM extent locks acquired on the server-side. See
           * comment in filter_intent_policy() on why you may need this.
           */
          int ldlm_glimpse_ast(struct ldlm_lock *lock, void *reqp)
          {
                  /*
                   * Returning -ELDLM_NO_LOCK_DATA actually works, but the reason for
                   * that is rather subtle: with OST-side locking, it may so happen that
                   * _all_ extent locks are held by the OST. If client wants to obtain
                   * current file size it calls ll{,u}_glimpse_size(), and (as locks are
                   * on the server), dummy glimpse callback fires and does
                   * nothing. Client still receives correct file size due to the
                   * following fragment in filter_intent_policy():
                   *
                   * rc = l->l_glimpse_ast(l, NULL); // this will update the LVB
                   * if (rc != 0 && res->lr_namespace->ns_lvbo &&
                   *     res->lr_namespace->ns_lvbo->lvbo_update) {
                   *         res->lr_namespace->ns_lvbo->lvbo_update(res, NULL, 0, 1);
                   * }
                   *
                   * that is, after glimpse_ast() fails, filter_lvbo_update() runs, and
                   * returns correct file size to the client.
                   */
                  return -ELDLM_NO_LOCK_DATA;
          }
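
          For comparison, here is the shape of the fallback that appears to be missing from the new glimpse path, modelled on the filter_intent_policy() fragment quoted above. This is only a hypothetical, self-contained sketch with stub types and a placeholder error value, not the real ldlm_glimpse_locks() code:

          #include <stdio.h>

          #define ELDLM_NO_LOCK_DATA 302   /* placeholder value for the sketch */

          struct res_stub {
                  long lvb_size;           /* stands in for the resource LVB */
          };

          /* stands in for lock->l_glimpse_ast(): the server-side AST simply
           * reports "no lock data", exactly as ldlm_glimpse_ast() does */
          static int glimpse_ast_stub(void)
          {
                  return -ELDLM_NO_LOCK_DATA;
          }

          /* stands in for ldlm_res_lvbo_update(res, NULL, 1): re-read the LVB */
          static void lvbo_update_stub(struct res_stub *res)
          {
                  res->lvb_size = 4096;    /* pretend we re-read the size from disk */
          }

          int main(void)
          {
                  struct res_stub res = { .lvb_size = 0 };
                  int rc = glimpse_ast_stub();

                  /* the piece ofd_intent_policy()/ldlm_glimpse_locks() appears to
                   * lack: if the glimpse AST fails, refresh the LVB so the reply
                   * still carries a correct size */
                  if (rc != 0)
                          lvbo_update_stub(&res);

                  printf("reply size=%ld\n", res.lvb_size);
                  return 0;
          }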
          

          So it looks like there are a few improvements that could be done:

          • replace comments mentioning filter_intent_policy() with ofd_intent_policy()
          • change ldlm_res_lvbo_update() to a new ldlm_res_lvbo_invalidate() during client eviction (should mark the resource LVB stale)
          • update the LVB from disk only if it is marked stale

          People

            Assignee: Mikhail Pershin (tappro)
            Reporter: Brian Behlendorf (behlendorf)
            Votes: 1
            Watchers: 9
