Lustre / LU-903

Race condition while get_attr after cancel_lru_locks and sysctl drop_caches

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • Lustre 2.4.0, Lustre 1.8.6
    • SLES11
    • 3
    • 24,555
    • 5140

    Description

      The reproduction script is described in https://bugzilla.lustre.org/show_bug.cgi?id=24555
      Analysis produced the following picture of the bug:

      1.
      The first thread performs a lookup, gets a CR lock, and terminates.
      After that, another thread a) clears the lock LRU cache (cancel_lru_locks) and b) runs the
      sysctl that flushes the slab and other kernel caches (dcache, icache, etc.).

      As a result, the sequence "shrink_dcache_memory -> foreach_dentry_lru -> prune_one_dentry ->
      d_kill -> d_iput()" is executed. ll_clear_inode then runs and sets lock->l_ast_data to NULL.

      2.
      Some time after step 1, another thread calls get_attr on the same inode. It gets another IBITS
      lock, this time with the LOOKUP + UPDATE bits.
      Another client then needs this lock cancelled, and two blocking ASTs race. The second lock
      cannot cancel the first lock because of the optimization shown below, and the first lock
      cannot be cancelled because its inode == NULL.

      Optimisation:

      int ll_mdc_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
                              void *data, int flag)
      ...
              /* Drop each bit that another lock on this inode still covers. */
              if ((bits & MDS_INODELOCK_LOOKUP) &&
                  ll_have_md_lock(inode, MDS_INODELOCK_LOOKUP, LCK_MINMODE))
                      bits &= ~MDS_INODELOCK_LOOKUP;
              if ((bits & MDS_INODELOCK_UPDATE) &&
                  ll_have_md_lock(inode, MDS_INODELOCK_UPDATE, LCK_MINMODE))
                      bits &= ~MDS_INODELOCK_UPDATE;
              if ((bits & MDS_INODELOCK_OPEN) &&
                  ll_have_md_lock(inode, MDS_INODELOCK_OPEN, mode))
                      bits &= ~MDS_INODELOCK_OPEN;
      ...
              /* Unhash the aliases only if LOOKUP is still set; never for the root inode. */
              if (inode->i_sb->s_root &&
                  inode != inode->i_sb->s_root->d_inode &&
                  (bits & MDS_INODELOCK_LOOKUP))
                      ll_unhash_aliases(inode);
              iput(inode);
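
To make the ordering dependence concrete, here is a minimal user-space model of the two blocking ASTs. It is a sketch of one reading of the description above, not Lustre code: every name prefixed with model_ or MODEL_ is hypothetical, lock modes are ignored, and the real matching done by ll_have_md_lock is reduced to "some other granted lock carries this bit". It assumes that a blocking AST whose l_ast_data is NULL skips the dentry invalidation entirely.

```c
#include <stddef.h>

/* Hypothetical bit values for this model only; not the real MDS_INODELOCK_* constants. */
enum { MODEL_LOOKUP = 1, MODEL_UPDATE = 2 };

/* Hypothetical stand-in for an inode: tracks only whether its dentry aliases are hashed. */
struct model_inode {
    int aliases_hashed;
};

struct model_lock {
    unsigned bits;                /* inode bits covered by this lock */
    struct model_inode *ast_data; /* stands in for lock->l_ast_data */
    int granted;                  /* 1 until the lock is cancelled */
};

/* Simplified stand-in for ll_have_md_lock(): does a granted lock other than
 * `self` still cover `bit`? Lock modes are deliberately ignored. */
static int model_have_other_lock(const struct model_lock *locks, size_t n,
                                 const struct model_lock *self, unsigned bit)
{
    for (size_t i = 0; i < n; i++)
        if (&locks[i] != self && locks[i].granted && (locks[i].bits & bit))
            return 1;
    return 0;
}

/* Model of the cancel path of the blocking AST.
 * Returns 1 if this call unhashed the aliases, 0 otherwise. */
static int model_blocking_ast(struct model_lock *locks, size_t n,
                              struct model_lock *lock)
{
    struct model_inode *inode = lock->ast_data;
    unsigned bits = lock->bits;
    int unhashed = 0;

    if (inode != NULL) { /* assumption: a NULL inode means nothing is invalidated */
        if ((bits & MODEL_LOOKUP) &&
            model_have_other_lock(locks, n, lock, MODEL_LOOKUP))
            bits &= ~MODEL_LOOKUP; /* the optimization from the excerpt */
        if (bits & MODEL_LOOKUP) {
            inode->aliases_hashed = 0;
            unhashed = 1;
        }
    }
    lock->granted = 0;
    return unhashed;
}

/* Cancels both locks in the chosen order.
 * Returns 1 if the aliases are still hashed afterwards (stale dentry), else 0. */
static int model_run(int cancel_second_first)
{
    struct model_inode inode = { 1 };
    struct model_lock locks[2] = {
        { MODEL_LOOKUP, NULL, 1 },                  /* first lock, l_ast_data cleared */
        { MODEL_LOOKUP | MODEL_UPDATE, &inode, 1 }, /* second lock, from get_attr */
    };

    if (cancel_second_first) {
        model_blocking_ast(locks, 2, &locks[1]);
        model_blocking_ast(locks, 2, &locks[0]);
    } else {
        model_blocking_ast(locks, 2, &locks[0]);
        model_blocking_ast(locks, 2, &locks[1]);
    }
    return inode.aliases_hashed;
}
```

In this model, model_run(1) returns 1: the second lock's AST drops the LOOKUP bit because the first lock still appears to cover it, and the first lock's AST has no inode to act on, so neither call unhashes the aliases. model_run(0) returns 0, because once the first lock is gone the second lock's AST keeps the LOOKUP bit and unhashes them.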
      

          Activity

            pjones Peter Jones added a comment - Landed for 2.4
            spitzcor Cory Spitz added a comment - FYI, Cray has been using this patch for nearly a year.

            nrutman Nathan Rutman added a comment - Xyratex-bug-id: MRP-269 Xyratex-bug-id: MRP-363

            keith Keith Mannthey (Inactive) added a comment - I have reviewed the latest version of the patch.

            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - The patch has been reviewed and is waiting for landing.
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - The patch for master has been uploaded: http://review.whamcloud.com/2627

            adilger Andreas Dilger added a comment - I'm all in favour of moving the inode reference onto the resource instead of on the lock. This patch should be landed to master first, not b1_8. Please submit a version of this patch against the master branch.

            People

              keith Keith Mannthey (Inactive)
              artem_blagodarenko Artem Blagodarenko (Inactive)