[LU-8821] double find in mdt_path_current()
Created: 10/Nov/16  Updated: 23/Mar/19  Resolved: 24/Jan/17

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.10.0

Type: Bug
Priority: Minor
Reporter: John Hammond
Assignee: John Hammond
Resolution: Fixed
Votes: 0
Labels: hsm, mdt

Issue Links:
Duplicate
is duplicated by LU-11970 Using changelog reader causes fid2pat... Resolved
Related
Severity: 3

Description

In mdt_path_current() we look up (find) an object while already holding a reference to it, which can cause a deadlock in lu_object_find_at():

static int mdt_path_current(struct mdt_thread_info *info,
                            struct mdt_object *obj,
                            struct getinfo_fid2path *fp)
{
        struct mdt_device       *mdt = info->mti_mdt;
        struct mdt_object       *mdt_obj;
        struct link_ea_header   *leh;
        struct link_ea_entry    *lee;
        struct lu_name          *tmpname = &info->mti_name;
        struct lu_fid           *tmpfid = &info->mti_tmp_fid1;
        struct lu_buf           *buf = &info->mti_big_buf;
        char                    *ptr;
        int                     reclen;
        struct linkea_data      ldata = { NULL };
        int                     rc = 0;
        bool                    first = true;
        ENTRY;

        /* temp buffer for path element, the buffer will be finally freed
         * in mdt_thread_info_fini */
        buf = lu_buf_check_and_alloc(buf, PATH_MAX);
        if (buf->lb_buf == NULL)
                RETURN(-ENOMEM);

        ldata.ld_buf = buf;
        ptr = fp->gf_path + fp->gf_pathlen - 1;
        *ptr = 0;
        --ptr;
        *tmpfid = fp->gf_fid = *mdt_object_fid(obj);

        /* root FID only exists on MDT0, and fid2path should also end at MDT0,
         * so checking root_fid can only happen on MDT0. */
        while (!lu_fid_eq(&mdt->mdt_md_root_fid, &fp->gf_fid)) {
                struct lu_buf           lmv_buf;

                mdt_obj = mdt_object_find(info->mti_env, mdt, tmpfid);
                ...

One way to see a hang from this is to enable HSM and do:

# cd /mnt/lustre
# while true; do
    echo XXX > f0
    lfs hsm_archive f0
    sys_unlink f0
done

Note that in the archive path the copytool (CT) uses the fid2path ioctl for debug messages. In the restore path it uses the fid2path ioctl to get the parent directory of the file being restored when it creates the volatile file.
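
The hang can be illustrated with a toy model. The sketch below is not Lustre code: struct toy_obj, toy_find(), toy_get(), toy_put() and the two path_step_*() helpers are hypothetical names invented for this example. It assumes that a cache lookup has to wait for a dying (unlinked) object to be freed, and it reports that "would wait" case as an error code instead of actually sleeping. path_step_reuse() shows one possible way to avoid the second lookup; it is not necessarily what the patch for this ticket does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a cached object; NOT the real struct lu_object. */
struct toy_obj {
	int  refs;   /* references currently held */
	bool dying;  /* set once the object has been unlinked */
};

/*
 * Toy lookup (hypothetical). A real lookup is assumed to sleep until a
 * dying object is freed. Here that case is reported as -EAGAIN, or as
 * -EDEADLK when the caller's own reference is what keeps the object alive.
 */
static int toy_find(struct toy_obj *o, int refs_held_by_caller)
{
	if (o->dying && o->refs > 0)
		return refs_held_by_caller > 0 ? -EDEADLK : -EAGAIN;
	o->refs++;
	return 0;
}

/* Take one more reference on an object the caller already holds. */
static void toy_get(struct toy_obj *o)
{
	assert(o->refs > 0);
	o->refs++;
}

/* Drop one reference. */
static void toy_put(struct toy_obj *o)
{
	assert(o->refs > 0);
	o->refs--;
}

/* Pattern from the description: look up an object we already hold. */
static int path_step_double_find(struct toy_obj *held)
{
	int rc = toy_find(held, 1);

	if (rc == 0)
		toy_put(held);
	return rc;
}

/* Possible alternative: reuse the held object instead of finding it again. */
static int path_step_reuse(struct toy_obj *held)
{
	toy_get(held);
	/* ... the path element would be read from the object here ... */
	toy_put(held);
	return 0;
}
```

In this model, path_step_double_find() succeeds while the object is live. Once the object is marked dying with a reference still held (as when the file is unlinked while a fid2path request is in flight), it returns -EDEADLK, whereas path_step_reuse() still succeeds.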



Comments
Comment by Gerrit Updater [ 10/Nov/16 ]

John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/23701
Subject: LU-8821 mdt: avoid double find in mdt_path_current()
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 302192da25bc9e53735a5f81895af862eba78ddf

Comment by Robert Read (Inactive) [ 10/Nov/16 ]

FWIW, Lemur doesn't use fid2path in the archive path (we just print FIDs in debug messages). In the restore path it is liblustreapi_hsm.c that uses fid2path, so that is currently out of our control.

I wonder if we can avoid the fid2path in restore by using the parent fid from the lsm xattr and then openat() to create the recovery file.

Comment by Gerrit Updater [ 24/Jan/17 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/23701/
Subject: LU-8821 mdt: avoid double find in mdt_path_current()
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c6383473e74262eaf8f822dcb6b28b22b130f364

Comment by Peter Jones [ 24/Jan/17 ]

Landed for 2.10

Generated at Sat Feb 10 02:20:50 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.