I investigated this on my home system, since I was seeing this problem intermittently as well.
It looks like the problem in my case is caused by the problematic files having been created with Lustre 1.8 or earlier, so they have IGIF FIDs (these show up as inode numbers below 4B). Migrating files created under 2.x works (these have very large inode numbers).
This is because there is a check in mdd_layout_swap_allowed() that prevents layout swap for IGIF FIDs:
static int mdd_layout_swap_allowed(const struct lu_env *env,
				   struct mdd_object *o1,
				   struct mdd_object *o2)
{
	const struct lu_fid *fid1, *fid2;

	fid1 = mdo2fid(o1);
	fid2 = mdo2fid(o2);

	if (!fid_is_norm(fid1) || !fid_is_norm(fid2) ||
	    (mdd_object_type(o1) != mdd_object_type(o2)))
		RETURN(-EPERM);
This was done to prevent clients from being able to swap the contents of regular files with internal system files by using their (internal) IGIF FIDs.
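To make the effect of that check concrete, here is a minimal standalone sketch of how a FID's sequence number decides whether it counts as IGIF or "normal". The struct and helper names (demo_fid, demo_is_igif, demo_is_norm, demo_swap_allowed) are hypothetical stand-ins, not the Lustre functions, and the sequence boundaries are the values I recall from lustre_idl.h, so treat them as assumptions. The object type comparison from the real check is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for Lustre's struct lu_fid. */
struct demo_fid {
	uint64_t seq;	/* sequence; holds the inode number for an IGIF FID */
	uint32_t oid;	/* object id; holds the inode generation for an IGIF FID */
	uint32_t ver;	/* version, 0 in the FIDs discussed here */
};

/* Assumed sequence boundaries (recalled from lustre_idl.h, not verified). */
#define DEMO_SEQ_IGIF		0x0000000cULL
#define DEMO_SEQ_IGIF_MAX	0xffffffffULL
#define DEMO_SEQ_NORMAL		0x200000400ULL

/* IGIF FIDs live in the low 32-bit sequence range. */
static bool demo_is_igif(const struct demo_fid *f)
{
	return f->seq >= DEMO_SEQ_IGIF && f->seq <= DEMO_SEQ_IGIF_MAX;
}

/* "Normal" FIDs are the ones allocated to files created under 2.x. */
static bool demo_is_norm(const struct demo_fid *f)
{
	return f->seq >= DEMO_SEQ_NORMAL;
}

/* Mirrors only the FID part of the check: both FIDs must be normal. */
static bool demo_swap_allowed(const struct demo_fid *a,
			      const struct demo_fid *b)
{
	return demo_is_norm(a) && demo_is_norm(b);
}
```

Under these assumptions, a file upgraded from 1.8 fails the check purely because of its sequence range, as does an internal file with SEQ 0x200000001, which is neither IGIF nor normal.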
I'm not yet sure how to detect the difference between filenames that are visible in the namespace and ones that are accessed by IGIF FID. It does seem that "lfs fid2path" and $MOUNT/.lustre/fid/ can detect the difference between IGIF and FID access with my 2.4.1 server. I took a normal FID and then figured out its IGIF FID by looking at the MDT inode's inum/generation directly, and got a "no such file or directory", so that is a good start.
However, it also appears that some files in the MDT root directory (e.g. backups of fld, seq_srv, seq_cli, etc) are readable via $MOUNT/.lustre/fid/ and have an IGIF FID assigned to them. The shell also thinks that these FIDs have write permission (i.e. test -w "$MOUNT/.lustre/fid/[0x2686:0xc40fa169:0x0]" returns 0), even though I get a permission denied error trying to modify them, so normal write permission checks will fail. That might be a problem with LFSCK adding these files into the OI when they shouldn't be. The originals of these files correctly have SEQ 0x200000001 and get an error from obf_lookup(), but I think it makes sense to mark all files in the top-level MDT/OST root directory inaccessible, and only add files under ROOT to the OI.
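For anyone who wants to repeat the inum/generation experiment, this is roughly how an IGIF FID string can be derived from an MDT inode. It is a sketch under the assumption that the IGIF FID is simply [inode number:generation:0]; the struct and function names (igif_sketch_fid, igif_from_inode, igif_fid_to_str) are made up for illustration and are not Lustre APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified FID layout for illustration only. */
struct igif_sketch_fid {
	uint64_t seq;
	uint32_t oid;
	uint32_t ver;
};

/* Assumed mapping: sequence = inode number, oid = generation, ver = 0. */
static struct igif_sketch_fid igif_from_inode(uint32_t ino, uint32_t gen)
{
	struct igif_sketch_fid f = { .seq = ino, .oid = gen, .ver = 0 };

	return f;
}

/*
 * Format the FID in the bracketed hex form used under $MOUNT/.lustre/fid/.
 * Returns the snprintf() result, i.e. the length the full string needs.
 */
static int igif_fid_to_str(const struct igif_sketch_fid *f,
			   char *buf, size_t len)
{
	return snprintf(buf, len, "[0x%llx:0x%x:0x%x]",
			(unsigned long long)f->seq,
			(unsigned int)f->oid, (unsigned int)f->ver);
}
```

With inode number 0x2686 and generation 0xc40fa169 this produces the same string as the FID quoted above, which can then be passed to "lfs fid2path" or looked up under $MOUNT/.lustre/fid/.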
In the short term, "lfs_migrate" should fall back to using rsync internally if "lfs migrate" returns an error, but I haven't tested this. It would also be useful to fix the error message printed by "lfs migrate", since I find the current one confusing. I don't think it needs to mention anything about volatile files.
Sorry for the delay, but I am back on this one.
Andreas, sorry to ask, but can you explain to me how the files created in the MDT root directory get an IGIF FID assigned?
I can also confirm that, as part of LU-3834 and with fault injection during layout swap to verify the patch behavior, I reproduce the volatile object leak on the MDT (the inode link count is 1 and e2fsck detects "Unattached inode"). In my case, for one forced layout-swap error, I see one orphan inode with a ".^L^S^T^R:VOLATILE"/LUSTRE_VOLATILE_HDR linkEA but also one with "i_am_nobody". Did you also find this? In any case, this clearly indicates that there is something to address and fix in the layout-swap error path.