Bug
Resolution: Fixed
Blocker
Lustre 2.5.0
3
12248
When doing an rmdir on a 'first level' remote directory, i.e. a directory on MDT1 whose parent directory is on MDT0, the directory entry on MDT0 is removed before the sanity checking is done.
To reproduce (/lus/TEMP is a directory on MDT0):
/lus/TEMP # mkdir mdt0
/lus/TEMP # lfs mkdir -i 1 mdt1
/lus/TEMP # touch mdt1/file
/lus/TEMP # ls
mdt0 mdt1
/lus/TEMP # ls mdt1
1
/lus/TEMP # rmdir mdt1
rmdir: failed to remove `mdt1': Directory not empty
/lus/TEMP # ls
mdt0
As you can see, rmdir returns with an error saying it failed to remove the directory mdt1, but the directory entry no longer exists on MDT0.
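The symptom above can also be checked programmatically from a client. The following is a minimal sketch using only POSIX calls (`rmdir`, `stat`); the helper name `rmdir_and_verify` and its return codes are hypothetical and not part of Lustre or any test suite. It flags the case where `rmdir` reports failure but the name can no longer be found.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Lustre): call rmdir() on a path and,
 * if it fails, check whether the directory is still visible.
 *
 * Returns  0 if rmdir succeeded,
 *          1 if rmdir failed and the directory still exists (expected),
 *          2 if rmdir failed and the follow-up stat() was inconclusive,
 *         -1 if rmdir failed but the name is gone (the symptom reported here).
 */
int rmdir_and_verify(const char *path)
{
	struct stat st;
	int rmdir_errno;

	if (rmdir(path) == 0)
		return 0;

	rmdir_errno = errno;
	if (stat(path, &st) == 0)
		return 1;	/* failed rmdir left the entry in place */

	if (errno == ENOENT) {
		fprintf(stderr,
			"%s: rmdir failed (errno %d) but the entry is gone\n",
			path, rmdir_errno);
		return -1;
	}
	return 2;		/* stat failed for some other reason */
}
```

On a file system that behaves correctly, calling this on a non-empty directory should return 1; a return of -1 on `mdt1` would correspond to the behaviour shown in the session above.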
Looking at mdt_reint_unlink (which is executing on MDT1), it's easy to see why.
When a first level remote directory is found, the delete RPC is sent to MDT0 before the sanity checking on MDT1 is done.
if (mdt_object_remote(mc)) {
	struct mdt_body *repbody;

	if (!fid_is_zero(rr->rr_fid2)) {
		CDEBUG(D_INFO, "%s: name "DNAME" cannot find "DFID"\n",
		       mdt_obd_name(info->mti_mdt),
		       PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));
		GOTO(put_child, rc = -ENOENT);
	}
	CDEBUG(D_INFO, "%s: name "DNAME": "DFID" is on another MDT\n",
	       mdt_obd_name(info->mti_mdt),
	       PNAME(&rr->rr_name), PFID(mdt_object_fid(mc)));

	if (!mdt_is_dne_client(req->rq_export))
		/* Return -EIO for old client */
		GOTO(put_child, rc = -EIO);

	if (info->mti_spec.sp_rm_entry) {
		struct lu_ucred *uc = mdt_ucred(info);

		if (!md_capable(uc, CFS_CAP_SYS_ADMIN)) {
			CERROR("%s: unlink remote entry is only "
			       "permitted for administrator: rc = %d\n",
			       mdt_obd_name(info->mti_mdt), -EPERM);
			GOTO(put_child, rc = -EPERM);
		}

		ma->ma_need = MA_INODE;
		ma->ma_valid = 0;
		mdt_set_capainfo(info, 1, child_fid, BYPASS_CAPA);
		rc = mdo_unlink(info->mti_env, mdt_object_child(mp), NULL,
				&rr->rr_name, ma, no_name);
		GOTO(put_child, rc);
	}
Followed shortly after by this:
mutex_lock(&mc->mot_lov_mutex);
rc = mdo_unlink(info->mti_env, mdt_object_child(mp),
		mdt_object_child(mc), &rr->rr_name, ma, no_name);
mutex_unlock(&mc->mot_lov_mutex);
It is mdo_unlink that returns the -39 (ENOTEMPTY) back to the client, because it calls mdd_unlink_sanity_check (which calls mdd_dir_is_empty).
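The ordering problem can be illustrated with a toy model. This is only a sketch, not Lustre code and not the fix that was applied: the `toy_*` names and the struct are made up for illustration, with one flag standing in for the name entry held on MDT0 and a counter standing in for the contents of the directory on MDT1.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model, not Lustre code: one name entry and one child directory. */
struct toy_dir {
	bool entry_in_parent;	/* name still present in the parent ("MDT0") */
	int  child_entries;	/* entries inside the child directory ("MDT1") */
};

/* Stand-in for the emptiness check that the report attributes to
 * mdd_unlink_sanity_check() / mdd_dir_is_empty(). */
static int toy_sanity_check(const struct toy_dir *d)
{
	return d->child_entries > 0 ? -ENOTEMPTY : 0;
}

/* Order described in the report: the name is removed first and the
 * check runs afterwards, so a failure leaves the name already gone. */
int toy_rmdir_entry_first(struct toy_dir *d)
{
	d->entry_in_parent = false;
	return toy_sanity_check(d);
}

/* Alternative order: check first, touch the parent only on success. */
int toy_rmdir_check_first(struct toy_dir *d)
{
	int rc = toy_sanity_check(d);

	if (rc != 0)
		return rc;
	d->entry_in_parent = false;
	return 0;
}
```

With a non-empty child, both functions return -ENOTEMPTY (-39 on Linux), but only `toy_rmdir_entry_first` leaves `entry_in_parent` cleared, which mirrors the mismatch between the error seen by the client and the state of MDT0.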
I have logs from both MDTs and the client of an rmdir on MDT0 failing as expected, and an rmdir on MDT1 showing the unusual behavior described. I'll attach those shortly.
is related to: LU-4690 sanity test_4: Expect error removing in-use dir /mnt/lustre/remote_dir (Resolved)