[LU-8821] double find in mdt_path_current() Created: 10/Nov/16 Updated: 23/Mar/19 Resolved: 24/Jan/17 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.10.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hsm, mdt |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
In mdt_path_current() we find an object while holding a reference to it causing a potential deadlock in lu_object_find_at():

static int mdt_path_current(struct mdt_thread_info *info,
			    struct mdt_object *obj,
			    struct getinfo_fid2path *fp)
{
	struct mdt_device *mdt = info->mti_mdt;
	struct mdt_object *mdt_obj;
	struct link_ea_header *leh;
	struct link_ea_entry *lee;
	struct lu_name *tmpname = &info->mti_name;
	struct lu_fid *tmpfid = &info->mti_tmp_fid1;
	struct lu_buf *buf = &info->mti_big_buf;
	char *ptr;
	int reclen;
	struct linkea_data ldata = { NULL };
	int rc = 0;
	bool first = true;
	ENTRY;

	/* temp buffer for path element, the buffer will be finally freed
	 * in mdt_thread_info_fini */
	buf = lu_buf_check_and_alloc(buf, PATH_MAX);
	if (buf->lb_buf == NULL)
		RETURN(-ENOMEM);

	ldata.ld_buf = buf;
	ptr = fp->gf_path + fp->gf_pathlen - 1;
	*ptr = 0;
	--ptr;
	*tmpfid = fp->gf_fid = *mdt_object_fid(obj);

	/* root FID only exists on MDT0, and fid2path should also ends at MDT0,
	 * so checking root_fid can only happen on MDT0. */
	while (!lu_fid_eq(&mdt->mdt_md_root_fid, &fp->gf_fid)) {
		struct lu_buf lmv_buf;

		mdt_obj = mdt_object_find(info->mti_env, mdt, tmpfid);
		...

One way to see a hang from this is to enable HSM and do:

# cd /mnt/lustre
# while true; do
    echo XXX > f0
    lfs hsm_archive f0
    sys_unlink f0
  done

Note that in the archive path the CT uses the fid2path ioctl for debug messages. In restore it uses the fid2path ioctl to get the parent directory of the file to be restored when creating the volatile file. |
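The hazard described above can be illustrated with a small toy model. This is only a sketch and not Lustre code: every name in it (toy_fid, toy_object, toy_find, toy_find_or_reuse) is hypothetical. It assumes that the hang arises because a cache lookup waits for an unlinked ("dying") object to be freed, which cannot happen while the caller itself still holds a reference. The toy lookup returns NULL at the point where a real lookup would sleep, and the second helper shows one possible way to avoid the second find by reusing the object already held. Whether the landed patch takes this approach is not stated in this ticket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real types; all names here are hypothetical. */
struct toy_fid {
	unsigned long long seq;
	unsigned int oid;
};

struct toy_object {
	struct toy_fid fid;
	int refcount;
	bool dying;	/* set when the object is unlinked while still referenced */
};

static bool toy_fid_eq(const struct toy_fid *a, const struct toy_fid *b)
{
	return a->seq == b->seq && a->oid == b->oid;
}

/*
 * Toy cache lookup. Returns NULL both when the FID is absent and when the
 * object is dying; in the dying case a real lookup is assumed to sleep until
 * every reference is dropped, which is where the caller would hang.
 */
static struct toy_object *toy_find(struct toy_object *cache, size_t n,
				   const struct toy_fid *fid)
{
	for (size_t i = 0; i < n; i++) {
		if (!toy_fid_eq(&cache[i].fid, fid))
			continue;
		if (cache[i].dying)
			return NULL;	/* would block here */
		cache[i].refcount++;
		return &cache[i];
	}
	return NULL;
}

/*
 * Avoid the double find: if the requested FID is the object the caller
 * already holds, take another reference on it instead of looking it up.
 */
static struct toy_object *toy_find_or_reuse(struct toy_object *cache, size_t n,
					    struct toy_object *held,
					    const struct toy_fid *fid)
{
	assert(held != NULL);
	if (toy_fid_eq(&held->fid, fid)) {
		held->refcount++;
		return held;
	}
	return toy_find(cache, n, fid);
}
```

With a single cached object that is marked dying and already referenced once, toy_find() on its FID reports the would-block case, while toy_find_or_reuse() hands back the held object with its reference count raised.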
| Comments |
| Comment by Gerrit Updater [ 10/Nov/16 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/23701 |
| Comment by Robert Read (Inactive) [ 10/Nov/16 ] |
|
FWIW, Lemur doesn't use fid2path in the archive path (we just print FIDs in debug messages), and it is liblustreapi_hsm.c that uses fid2path in the restore path, so that is currently out of our control. I wonder if we can avoid fid2path in restore by using the parent FID from the lsm xattr and then openat() to create the recovery file. |
| Comment by Gerrit Updater [ 24/Jan/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23701/ |
| Comment by Peter Jones [ 24/Jan/17 ] |
|
Landed for 2.10 |