[LU-838] "lfs path2fid /mnt/lustre" (ROOT) returns inode number Created: 12/Nov/11  Updated: 27/Feb/13  Resolved: 21/Feb/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: None

Type: Bug Priority: Blocker
Reporter: Andreas Dilger Assignee: Mikhail Pershin
Resolution: Fixed Votes: 0
Labels: LB

Issue Links:
Duplicate
is duplicated by LU-2240 implement index range lookup for osd-... Resolved
is duplicated by LU-1550 open-by-fid getdents may return incon... Resolved
Related
is related to LU-2462 debugfs doesn't care about DIRENT_LUF... Resolved
is related to LU-2886 create local files using local_storag... Resolved
Severity: 3
Rank (Obsolete): 3991

 Description   

Running "lfs path2fid" on the Lustre mountpoint (e.g. /mnt/lustre) will return the underlying "ROOT/" directory inode number. This is bad for a number of reasons:

  • it exposes the on-disk inode number to userspace as an IGIF value
  • this will be broken after a backup/restore cycle
  • this value is stored in the "link" xattr of all files in the ROOT/ directory

$ lfs path2fid /mnt/lustre
[0x61ab:0x6cad245e:0x0]

$ getfattr -d -m trusted.link -e hex /mnt/lustre/etc
getfattr: Removing leading '/' from absolute path names

# file: mnt/lustre/etc
    trusted.link=0xdff1ea11010000002d000000000000000000000000000000001500000000000061ab6cad245e00000000657463
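
To make the third point concrete, the dump above can be decoded to show the IGIF value embedded in the "link" xattr. This is only a minimal sketch: the record layout (a 24-byte little-endian header, followed by records holding a big-endian length, a big-endian parent FID and the entry name) is an assumption inferred from the bytes in the dump, and `parse_link_ea` is a hypothetical helper, not part of any Lustre tool.

```python
import struct

# Assumed magic, read from the first four bytes of the dump (little-endian).
LINK_EA_MAGIC = 0x11EAF1DF
HEADER_SIZE = 24   # assumed: magic (u32), record count (u32), length (u64), 8 pad bytes
FID_SIZE = 16      # seq (u64), oid (u32), ver (u32)


def parse_link_ea(blob):
    """Return a list of (parent_fid_string, name) tuples from a link xattr blob."""
    magic, reccount, total_len = struct.unpack_from("<IIQ", blob, 0)
    if magic != LINK_EA_MAGIC:
        raise ValueError(f"unexpected magic {magic:#x}")
    if total_len > len(blob):
        raise ValueError("truncated link xattr")

    entries = []
    offset = HEADER_SIZE
    for _ in range(reccount):
        # Assumed record: length (u16), parent FID, then the unterminated name,
        # with the length and FID stored big-endian.
        (reclen,) = struct.unpack_from(">H", blob, offset)
        if reclen < 2 + FID_SIZE:
            raise ValueError("record too short")
        seq, oid, ver = struct.unpack_from(">QII", blob, offset + 2)
        name = blob[offset + 2 + FID_SIZE:offset + reclen].decode("utf-8", "replace")
        entries.append((f"[{seq:#x}:{oid:#x}:{ver:#x}]", name))
        offset += reclen
    return entries


if __name__ == "__main__":
    # The dump from the description, split at the assumed field boundaries.
    dump = ("dff1ea11" "01000000" "2d00000000000000" "0000000000000000"
            "0015" "00000000000061ab" "6cad245e" "00000000" "657463")
    print(parse_link_ea(bytes.fromhex(dump)))
```

Under these assumptions the parent FID stored for "etc" is the same [0x61ab:0x6cad245e:0x0] that "lfs path2fid" reports for the mountpoint, so the inode number is persisted in every child of ROOT/.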

This should probably be fixed by moving MDD_ROOT_INDEX_OID to be 1UL or 2UL, and then returning FID_SEQ_START:MDD_ROOT_INDEX_OID to the client, like:

$ lfs path2fid /mnt/lustre
[0x200000000:0x2:0x0]

All that is needed is to ensure that the ROOT/ inode stores this in the LMA. This has the drawback that FID_SEQ_START is not currently exposed to clients and it fixes MDD_ROOT_INDEX_OID to a specific value (since it will be stored in the "link" xattr).
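
A check for this from userspace could look roughly like the sketch below, which parses the bracketed output of "lfs path2fid" and reports whether the sequence falls in the IGIF range. The IGIF bounds (12 through 0xffffffff) are an assumption based on my reading of the Lustre FID sequence ranges, FID_SEQ_START is taken from the example output above, and `parse_fid`, `is_igif` and `describe` are hypothetical helper names.

```python
import re

FID_SEQ_START = 0x200000000   # matches the [0x200000000:0x2:0x0] example above
IGIF_SEQ_MIN = 12             # assumed lower bound of the IGIF sequence range
IGIF_SEQ_MAX = 0xFFFFFFFF     # assumed upper bound of the IGIF sequence range

_FID_RE = re.compile(r"\[(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+):(0x[0-9a-fA-F]+)\]")


def parse_fid(text):
    """Parse '[0xSEQ:0xOID:0xVER]' into a (seq, oid, ver) tuple of ints."""
    match = _FID_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a FID: {text!r}")
    return tuple(int(group, 16) for group in match.groups())


def is_igif(fid):
    """True if the sequence lies in the (assumed) IGIF range."""
    seq, _oid, _ver = fid
    return IGIF_SEQ_MIN <= seq <= IGIF_SEQ_MAX


def describe(text):
    """Summarize a FID string, decoding inode/generation for IGIF values."""
    fid = parse_fid(text)
    seq, oid, _ver = fid
    if is_igif(fid):
        # In an IGIF the sequence carries the inode number and the OID the generation.
        return f"IGIF: inode {seq}, generation {oid:#x}"
    return f"non-IGIF FID: seq {seq:#x}, oid {oid:#x}"


if __name__ == "__main__":
    print(describe("[0x61ab:0x6cad245e:0x0]"))
    print(describe("[0x200000000:0x2:0x0]"))
```

With these bounds the current output is flagged as an IGIF exposing the inode number and generation, while the proposed FID_SEQ_START:MDD_ROOT_INDEX_OID value is not.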

The alternative is to expose some other SEQ:OID for the root FID.



 Comments   
Comment by Andreas Dilger [ 12/Nov/11 ]

Vitaly, what does your upgrade tool do for ROOT/? Does it allocate a specific FID for ROOT/, or just pick an arbitrary one?

Comment by Vitaly Fertman [ 14/Nov/11 ]

MDD_ROOT ("ROOT") gets a new arbitrary fid (see mdd_root_rebuild()).
OSD_ROOT ("/") is not changed.

Comment by Andreas Dilger [ 10/Dec/11 ]

Alex, do you have any thoughts on this? It appears there are already some changes in this area for the orion branch, and http://review.whamcloud.com/#change,1822 is also messing with the local objects a bit. How does the orion branch access the OSD "/" FID, and the MDT /ROOT FID in a common way between osd-ldiskfs and osd-zfs? Almost anything is better than exporting the IGIF FID to clients.

Comment by Mikhail Pershin [ 20/Mar/12 ]

This is not solved well in orion: ROOT has a predefined OID in FID_SEQ_LOCAL_FILE and that is exposed to the client, which is also not good because FLD knows nothing about such a sequence. Moreover, that is not compatible with DNE, as each MDS will have the same root FID. We need to solve this properly in master. We could probably use a special sequence with different OIDs for each MDS, and either insert that sequence in FLD or have it hard-coded there.

Comment by Andreas Dilger [ 21/Mar/12 ]

I'm not sure why we would want a different FID for the root on each MDT? These are separate directories, even if they are striped for DNE Phase II.

Having a real FID makes sense, and Di's recent BRL patch introduces a reserved FID SEQ number for system FIDs, which would be a perfect place for the root FID, I think.

Comment by Andreas Dilger [ 22/Aug/12 ]

This behaviour has changed in orion, where it returns FID_SEQ_LOCAL_FILE:OSD_FS_ROOT_OID. This is not necessarily better, since FID_SEQ_LOCAL_FILE objects along with many others should be blocked for anything above the MDD (see LU-1518 for a host of problems this opens up). Also, changing the FID for the root inode can cause other problems, because the clients will cache the old root IGIF FID and this could result in two different FIDs in the MDT and DLM.

There was discussion about using FID_SEQ_SPECIAL:2 for the root object, and allowing clients to access this FID_SEQ. The only other FID in this range is the FID_OID_SPECIAL_BFL which is used to lock the filesystem for renames. Alternately, it could just be an arbitrary FID in the FID_SEQ_NORMAL range for each filesystem, and the initial lookup of "/ROOT" is done by name, and the FID is stored in the OSD/MDD for later reference. Note that the Xyratex filesystem upgrade tool uses an arbitrary FID already.

Making this a blocker for 2.4 so that it does not get forgotten before that release.

Comment by Andreas Dilger [ 25/Oct/12 ]

Mike, was this bug fixed now to set the root FID to FID_SEQ_SPECIAL? If yes, please include the bug number here and close.

Comment by Mikhail Pershin [ 12/Nov/12 ]

Andreas, it has not landed in master yet. I'll take care of it.

Comment by Mikhail Pershin [ 28/Nov/12 ]

http://review.whamcloud.com/4682

The patch switches mdd local file creation to the local_storage library instead of using md_local_file + dt_store_open, as was done in Orion. That helps us unify the way we create local files and avoid the problems of the old way, e.g. layering violations, fixed FIDs, and creation in the system root only.

Meanwhile, ROOT is now created in the FID_SEQ_SPECIAL sequence and clients get its real FID rather than an IGIF, so the inode/generation is hidden. Its FID is stored in the OI like for other files, and the patch contains generic code to check that the FID in the OI equals the one in the EA/dirent and to fix the OI if needed. See ORI-756 for details.

Comment by Mikhail Pershin [ 20/Dec/12 ]

LU-2462 was found by test 38 conf-sanity failure.

Comment by Andreas Dilger [ 31/Dec/12 ]

Mike,
will the patch in http://review.whamcloud.com/4682 address the following check:

#if LUSTRE_VERSION_CODE >= OBD_OCD_VERSION(2, 3, 90, 0)
#error "fix this before release"
#endif
                /*
                 * there is one technical debt left in Orion:
                 * proper hanlding of named vs no-name objects.
                 * Llog objects have name always as they are placed in O/d/...
                 */
                if (fid_seq(lu_object_fid(&o->do_lu)) != FID_SEQ_LLOG) {
                        rc = dt_insert(env, root,
                                       (const struct dt_rec *)first_fid,
                                       (const struct dt_key *)dti->dti_buf,
                                       th, BYPASS_CAPA, 1);
                        if (rc)
                                GOTO(out_trans, rc);
                }

Comment by nasf (Inactive) [ 31/Dec/12 ]

In LFSCK 1.5, all FIDs have a FID <=> ino/gen mapping in the OI files, except for IDIF and some special ones which are marked as "I_LUSTRE_NOSCRUB" (for the local root, for the OI files themselves, etc.). There are two cases for the "ROOT" FID (the "/ROOT/.lustre" case is similar) exported to the client:

1) Upgrading cases.
For an existing old "ROOT" which was created before Lustre-2.4, the IGIF FID will be fixed to the "ROOT" object, the IGIF FID will be inserted into the "ROOT" LMA, and the "IGIF <=> ino/gen" mapping will be added to the OI file. From then on, this fixed IGIF FID is reserved for the "ROOT", and the client can still use it to access the "ROOT" object after an MDT file-level backup/restore.

2) New formatted cases.
For the new "ROOT" which is created since Lustre-2.4, the "{ FID_SEQ_LOCAL_FILE, MDD_ROOT_INDEX_OID, 0 }" FID will be fixed to the "ROOT" object, and the related FID-in-LMA and FID mapping in the OI file are handled as for other normal FIDs.

The initial OI scrub guarantees that the mapping from the "ROOT" FID to the local ino/gen is valid, regardless of whether there has been an MDT file-level backup/restore. So the original three issues described in this ticket description are all resolved in LFSCK 1.5.

Tappro, I prefer to use LFSCK 1.5 to resolve the initial three issues you mentioned. As for the other improvements in the above discussion, please consider rebasing your remaining patch(es) against the LFSCK 1.5 patch. What do you think? Thanks!

Comment by Di Wang [ 05/Jan/13 ]

Here is a new patch about root fid. http://review.whamcloud.com/#change,4342

Basically, it reserves 0xffffffffffff0000-0xfffffffffffffffe for root FIDs; the root FID of each MDT will be [0xffffffffffff0000 + mdt_index, 0, 0]. Please have a look to avoid duplicate work here.
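
For reference, the per-MDT scheme described in this comment can be sketched as below. It only illustrates the arithmetic of the proposal as stated here (which the following comments narrow down to a single root FID); the constant and function names are hypothetical and do not come from the patch.

```python
# Range quoted in the comment above; the names are made up for illustration.
ROOT_SEQ_BASE = 0xFFFFFFFFFFFF0000
ROOT_SEQ_LAST = 0xFFFFFFFFFFFFFFFE


def proposed_root_fid(mdt_index):
    """Return (seq, oid, ver) for the root of the given MDT under the proposal."""
    if mdt_index < 0:
        raise ValueError("MDT index must be non-negative")
    seq = ROOT_SEQ_BASE + mdt_index
    if seq > ROOT_SEQ_LAST:
        raise ValueError("MDT index outside the reserved sequence range")
    return (seq, 0, 0)


def format_fid(fid):
    """Format a FID tuple in the bracketed form used elsewhere in this ticket."""
    return "[{:#x}:{:#x}:{:#x}]".format(*fid)


if __name__ == "__main__":
    for index in (0, 1, 3):
        print(index, format_fid(proposed_root_fid(index)))
```

The reserved range leaves room for MDT indices 0 through 0xfffe in this sketch.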

Comment by Andreas Dilger [ 05/Jan/13 ]

Di, I'm not sure why you made a new patch and reserved new ROOT FIDs? AFAIK there is only a single root FID needed, and there is already MDD_ROOT_INDEX_OID reserved. Could you please explain?

Comment by Di Wang [ 05/Jan/13 ]

Because each MDT should have a ROOT dir, though only the ROOT on MDT0 will be visible on the client side. I thought that in the future some other tools might need to access these ROOT objects, and we might need FIDs for these ROOT dirs? Hmm, I might be wrong here; probably these roots will always be special objects, invisible from outside. I will revert these changes.

Comment by Mikhail Pershin [ 06/Jan/13 ]

I have no objections about lfsck to fix this if it does that. Probably this ticket shouldn't be blocker then.

Comment by Di Wang [ 06/Jan/13 ]

I updated the patch http://review.whamcloud.com/#change,4342 Only 1 root fid, please have a look. Thanks

Comment by Di Wang [ 08/Jan/13 ]

Mike: Would you mind landing this patch after DNE? Or you can just assign this ticket to me, and I will land it after DNE has landed; otherwise I need to rebase my patch again. Thanks!

Comment by Mikhail Pershin [ 08/Jan/13 ]

I will need to update it after lfsck 1.5 in any case, I am fine to rework and land it after DNE too, not big problem.

Comment by Andreas Dilger [ 07/Feb/13 ]

Updated patch from DNE series is http://review.whamcloud.com/5257

Comment by Andreas Dilger [ 08/Feb/13 ]

Mike, after the landing of LU-2240, what is left to be done in this bug?

Comment by Mikhail Pershin [ 08/Feb/13 ]

Andreas, it becomes just "use local_storage library to create local file" bug. I've updated patch already and testing it now.

Comment by Mikhail Pershin [ 18/Feb/13 ]

I think this bug is fixed by Di's patches. Di, is that right? However, my patch contains another good fix: it creates all local files in a uniform way using the local_storage library and removes the old md_local_file lib as well. We can close this bug and open a new one, not a blocker but major. Or we can edit this one.

Comment by Di Wang [ 21/Feb/13 ]

Yes, the rootfid patch should fix this problem. But lfs path2fid /mnt/lustre still returns an IGIF for an upgraded FS; it only returns the new seq root FID for a newly formatted FS.

I also think we need to add some fid2path tests to the upgrade tests (conf-sanity 32a/b/c) to verify this on an upgraded FS, and I can add them in another patch (fid2path fixes for DNE). Hmm, we probably should even run the whole sanity/conf-sanity test suites on an upgraded FS, but this is totally unrelated to this ticket.

Comment by Jodi Levi (Inactive) [ 21/Feb/13 ]

Closing this ticket per Di's comments.
