LU-8821 (Lustre): double find in mdt_path_current()


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Lustre 2.10.0
    • None
    • Severity: 3

    Description

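      In mdt_path_current() we look up an object while already holding a reference to it, which can deadlock in lu_object_find_at(). On the first loop iteration tmpfid is the FID of obj itself; if obj has been marked dying (for example by a concurrent unlink), lu_object_find_at() waits for the old object to be freed, which cannot happen while we still hold our reference: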

      static int mdt_path_current(struct mdt_thread_info *info,
                                  struct mdt_object *obj,
                                  struct getinfo_fid2path *fp)
      {
              struct mdt_device       *mdt = info->mti_mdt;
              struct mdt_object       *mdt_obj;
              struct link_ea_header   *leh;
              struct link_ea_entry    *lee;
              struct lu_name          *tmpname = &info->mti_name;
              struct lu_fid           *tmpfid = &info->mti_tmp_fid1;
              struct lu_buf           *buf = &info->mti_big_buf;
              char                    *ptr;
              int                     reclen;
              struct linkea_data      ldata = { NULL };
              int                     rc = 0;
              bool                    first = true;
              ENTRY;
      
              /* temp buffer for path element, the buffer will be finally freed
               * in mdt_thread_info_fini */
              buf = lu_buf_check_and_alloc(buf, PATH_MAX);
              if (buf->lb_buf == NULL)
                      RETURN(-ENOMEM);
      
              ldata.ld_buf = buf;
              ptr = fp->gf_path + fp->gf_pathlen - 1;
              *ptr = 0;
              --ptr;
              *tmpfid = fp->gf_fid = *mdt_object_fid(obj);
      
              /* root FID only exists on MDT0, and fid2path should also ends at MDT0,
               * so checking root_fid can only happen on MDT0. */
              while (!lu_fid_eq(&mdt->mdt_md_root_fid, &fp->gf_fid)) {
                      struct lu_buf           lmv_buf;
      
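                      /* on the first iteration tmpfid is the FID of obj,
                       * which the caller already holds a reference to */
                      mdt_obj = mdt_object_find(info->mti_env, mdt, tmpfid);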
                      ...
      

      One way to reproduce the hang is to enable HSM, with a copytool running, and run:

      # cd /mnt/lustre
      # while true; do
          echo XXX > f0
          lfs hsm_archive f0
          sys_unlink f0
      done
      

      Note that on the archive path the copytool (CT) calls the fid2path ioctl only for debug messages. On the restore path it uses the fid2path ioctl to find the parent directory of the file being restored, in which it creates the volatile file.


    People

      Assignee: John Hammond (jhammond)
      Reporter: John Hammond (jhammond)
