Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-4471

Failed 'rmdir' on remote directories still removes directory on MDT0

Details

    • 3
    • 12248

    Description

      When doing an rmdir on a 'first level' remote directory, IE, a directory on MDT1 which is in a directory on MDT0, the directory entry on MDT0 is removed before the sanity checking is done.

      To reproduce (/lus/TEMP is a directory on MDT0):

      /lus/TEMP # mkdir mdt0
      /lus/TEMP # lfs mkdir -i 1 mdt1
      /lus/TEMP # touch mdt1/file
      /lus/TEMP # ls
      mdt0  mdt1
      /lus/TEMP # ls mdt1
      1
      /lus/TEMP # rmdir mdt1
      rmdir: failed to remove `mdt1': Directory not empty
      /lus/TEMP #  ls
      mdt0
      

      As you can see, rmdir returns with an error saying it failed to remove the directory mdt1, but the director no longer exists on MDT0.

      Looking at mdt_reint_unlink (which is executing on MDT1), it's easy to see why.
      When a first level remote directory is found, the delete RPC is sent to MDT0 before the sanity checking on MDT1 is done.

      if (mdt_object_remote(mc)) {
                      struct mdt_body  *repbody;
      
                      if (!fid_is_zero(rr->rr_fid2)) {
                              CDEBUG(D_INFO, "%s: name "DNAME" cannot find "DFID"\n",
                                     mdt_obd_name(info->mti_mdt),
                                     PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));
                              GOTO(put_child, rc = -ENOENT);
                      }
                      CDEBUG(D_INFO, "%s: name "DNAME": "DFID" is on another MDT\n",
                             mdt_obd_name(info->mti_mdt),
                             PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));
      
                      if (!mdt_is_dne_client(req->rq_export))
                              /* Return -EIO for old client */
                              GOTO(put_child, rc = -EIO);
      
                      if (info->mti_spec.sp_rm_entry) {
                              struct lu_ucred *uc  = mdt_ucred(info);
      
                              if (!md_capable(uc, CFS_CAP_SYS_ADMIN)) {
                                      CERROR("%s: unlink remote entry is only "
                                             "permitted for administrator: rc = %d\n",
                                              mdt_obd_name(info->mti_mdt),
                                              -EPERM);
                                      GOTO(put_child, rc = -EPERM);
                              }
      
                              ma->ma_need = MA_INODE;
                              ma->ma_valid = 0;
                              mdt_set_capainfo(info, 1, child_fid, BYPASS_CAPA);
                              rc = mdo_unlink(info->mti_env, mdt_object_child(mp),
                                              NULL, &rr->rr_name, ma, no_name);
                              GOTO(put_child, rc);
                      }
      

      Followed shortly after by this:

              mutex_lock(&mc->mot_lov_mutex);
      
              rc = mdo_unlink(info->mti_env, mdt_object_child(mp),
                              mdt_object_child(mc), &rr->rr_name, ma, no_name);
      
              mutex_unlock(&mc->mot_lov_mutex);
      

      It is mdo_unlink that returns the -39 (ENOTEMPTY) back to the client, because it calls mdd_unlink_sanity_check (which calls mdd_dir_is_empty).

      I have logs from both MDTs and the client of an rmdir on MDT0 failing as expected, and an rmdir on MDT1 showing the unusual behavior described. I'll attach those shortly.

      Attachments

        Issue Links

          Activity

            [LU-4471] Failed 'rmdir' on remote directories still removes directory on MDT0
            pjones Peter Jones added a comment -

            Landed for 2.6

            pjones Peter Jones added a comment - Landed for 2.6
            bogl Bob Glossman (Inactive) added a comment - backport to b2_5: http://review.whamcloud.com/9379

            I have restarted testing, sorry about that. Will monitor.

            cliffw Cliff White (Inactive) added a comment - I have restarted testing, sorry about that. Will monitor.

            Cliff - Not really. This one's in Intel's hands now. My patch, I think, satisfies Andreas and Fan Yong, and Cray has been using it internally for a bit now with great success. The Maloo tests failed for a reason unrelated to my patch, and need to be re-run.

            I can't really do anything more, as the patch is good - Someone at Intel can restart the Maloo testing, and hopefully it won't hit any other issues.

            paf Patrick Farrell (Inactive) added a comment - Cliff - Not really. This one's in Intel's hands now. My patch, I think, satisfies Andreas and Fan Yong, and Cray has been using it internally for a bit now with great success. The Maloo tests failed for a reason unrelated to my patch, and need to be re-run. I can't really do anything more, as the patch is good - Someone at Intel can restart the Maloo testing, and hopefully it won't hit any other issues.

            Patrick, any updates on this issue?

            cliffw Cliff White (Inactive) added a comment - Patrick, any updates on this issue?

            Cliff - Most definitely. Barring the unexpected, I'll get to fixing it up this afternoon.

            paf Patrick Farrell (Inactive) added a comment - Cliff - Most definitely. Barring the unexpected, I'll get to fixing it up this afternoon.

            Patrick, it appears that you patch has failed some reviews, are you able to address the comments?

            cliffw Cliff White (Inactive) added a comment - Patrick, it appears that you patch has failed some reviews, are you able to address the comments?

            Cliff,
            Are you able to comment on your monitoring of this ticket yet?
            Thank you!

            jlevi Jodi Levi (Inactive) added a comment - Cliff, Are you able to comment on your monitoring of this ticket yet? Thank you!

            Thanks, will monitor.

            cliffw Cliff White (Inactive) added a comment - Thanks, will monitor.

            Patch here:
            http://review.whamcloud.com/8827

            Confirmed it fixes the issue in question.

            paf Patrick Farrell (Inactive) added a comment - Patch here: http://review.whamcloud.com/8827 Confirmed it fixes the issue in question.

            People

              cliffw Cliff White (Inactive)
              paf Patrick Farrell (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: