Details
- Bug
- Resolution: Fixed
- Blocker
- Lustre 2.5.0
- 3
- 12248
Description
When rmdir is run on a 'first level' remote directory, i.e. a directory on MDT1 whose parent directory is on MDT0, the directory entry on MDT0 is removed before the sanity checking is done.
To reproduce (/lus/TEMP is a directory on MDT0):
/lus/TEMP # mkdir mdt0
/lus/TEMP # lfs mkdir -i 1 mdt1
/lus/TEMP # touch mdt1/file
/lus/TEMP # ls
mdt0 mdt1
/lus/TEMP # ls mdt1
1
/lus/TEMP # rmdir mdt1
rmdir: failed to remove `mdt1': Directory not empty
/lus/TEMP # ls
mdt0
As you can see, rmdir returns an error saying it failed to remove the directory mdt1, but the directory no longer exists on MDT0.
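The expected behavior can be stated as a small check: after rmdir fails on a non-empty directory, the directory must still be visible in its parent. The following is a minimal sketch of that check using only POSIX calls. The function name and the scratch directory name are hypothetical, and it creates the directory with plain mkdir(), so on a DNE setup the directory would have to be created with `lfs mkdir -i 1` (as in the reproducer above) for the remote path to be exercised.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of Lustre.
 * Creates a non-empty directory under `parent`, calls rmdir() on it,
 * and reports what happened to the directory:
 *    0  rmdir() failed and the directory is still there (expected)
 *    1  rmdir() failed but the directory vanished (the behavior reported here)
 *   -1  setup error or an unexpected rmdir() result
 */
int check_failed_rmdir_keeps_entry(const char *parent)
{
    char dir[4096];
    char file[4160];    /* large enough for dir plus "/file" */
    struct stat st;
    int fd, n, result;

    n = snprintf(dir, sizeof(dir), "%s/rmdir_check.%ld", parent, (long)getpid());
    if (n < 0 || (size_t)n >= sizeof(dir))
        return -1;
    snprintf(file, sizeof(file), "%s/file", dir);

    if (mkdir(dir, 0755) != 0)
        return -1;
    fd = open(file, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    if (rmdir(dir) == 0)
        return -1;      /* a non-empty directory should never be removed */

    if (errno != ENOTEMPTY && errno != EEXIST) {
        result = -1;    /* failed for some unrelated reason */
    } else if (stat(dir, &st) == 0 && S_ISDIR(st.st_mode)) {
        result = 0;     /* directory survived the failed rmdir */
    } else {
        result = 1;     /* error returned, yet the directory is gone */
    }

    /* Best-effort cleanup; errors are ignored on purpose. */
    unlink(file);
    rmdir(dir);
    return result;
}
```

Calling `check_failed_rmdir_keeps_entry("/lus/TEMP")` from a small test program would be one way to use it. On a local filesystem it should return 0.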
Looking at mdt_reint_unlink (which is executing on MDT1), it's easy to see why.
When a first level remote directory is found, the delete RPC is sent to MDT0 before the sanity checking on MDT1 is done.
if (mdt_object_remote(mc)) {
        struct mdt_body *repbody;

        if (!fid_is_zero(rr->rr_fid2)) {
                CDEBUG(D_INFO, "%s: name "DNAME" cannot find "DFID"\n",
                       mdt_obd_name(info->mti_mdt),
                       PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));
                GOTO(put_child, rc = -ENOENT);
        }
        CDEBUG(D_INFO, "%s: name "DNAME": "DFID" is on another MDT\n",
               mdt_obd_name(info->mti_mdt),
               PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));

        if (!mdt_is_dne_client(req->rq_export))
                /* Return -EIO for old client */
                GOTO(put_child, rc = -EIO);

        if (info->mti_spec.sp_rm_entry) {
                struct lu_ucred *uc = mdt_ucred(info);

                if (!md_capable(uc, CFS_CAP_SYS_ADMIN)) {
                        CERROR("%s: unlink remote entry is only "
                               "permitted for administrator: rc = %d\n",
                               mdt_obd_name(info->mti_mdt), -EPERM);
                        GOTO(put_child, rc = -EPERM);
                }

                ma->ma_need = MA_INODE;
                ma->ma_valid = 0;
                mdt_set_capainfo(info, 1, child_fid, BYPASS_CAPA);
                rc = mdo_unlink(info->mti_env, mdt_object_child(mp), NULL,
                                &rr->rr_name, ma, no_name);
                GOTO(put_child, rc);
        }
Followed shortly after by this:
mutex_lock(&mc->mot_lov_mutex);
rc = mdo_unlink(info->mti_env, mdt_object_child(mp),
                mdt_object_child(mc), &rr->rr_name, ma, no_name);
mutex_unlock(&mc->mot_lov_mutex);
It is mdo_unlink that returns the -39 (ENOTEMPTY) back to the client, because it calls mdd_unlink_sanity_check (which calls mdd_dir_is_empty).
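To make the ordering problem concrete, here is a toy model of the two orderings. All names in it are hypothetical stand-ins and none of it is Lustre code: `toy_parent` plays the role of the name entry on MDT0, `toy_child` the directory on MDT1, and `toy_sanity_check` the emptiness check. It only illustrates why checking before touching the parent preserves the entry on failure. It is not a statement of how the issue was actually fixed, and it ignores locking and cross-MDT RPC concerns entirely.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy stand-ins, not Lustre structures. */
struct toy_parent {
    bool has_entry;     /* name entry held by the parent's MDT */
};

struct toy_child {
    int nr_entries;     /* number of entries inside the child directory */
};

/* Stand-in for the emptiness check done on the child's MDT. */
static int toy_sanity_check(const struct toy_child *c)
{
    return c->nr_entries > 0 ? -ENOTEMPTY : 0;
}

/* Ordering described in this report: entry removed first, child checked after. */
int toy_rmdir_entry_first(struct toy_parent *p, struct toy_child *c)
{
    p->has_entry = false;           /* parent entry is already gone ... */
    return toy_sanity_check(c);     /* ... when the check can still fail */
}

/* Alternative ordering: check the child first, touch the parent only on success. */
int toy_rmdir_check_first(struct toy_parent *p, struct toy_child *c)
{
    int rc = toy_sanity_check(c);

    if (rc != 0)
        return rc;                  /* parent entry left untouched */
    p->has_entry = false;
    return 0;
}
```

With a non-empty child, both functions return -ENOTEMPTY, but only `toy_rmdir_check_first` leaves `has_entry` set, which matches what the client expects after a failed rmdir.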
I have logs from both MDTs and the client of an rmdir on MDT0 failing as expected, and an rmdir on MDT1 showing the unusual behavior described. I'll attach those shortly.
Issue Links
- is related to LU-4690 sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir (Resolved)