[LU-2449] ZFS osd to fix . and .. handling Created: 08/Dec/12  Updated: 26/Jan/22  Resolved: 14/Mar/13

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.4.0
Fix Version/s: Lustre 2.4.0

Type: Bug Priority: Minor
Reporter: Oleg Drokin Assignee: Alex Zhuravlev
Resolution: Fixed Votes: 0
Labels: zfs

Issue Links:
Duplicate
is duplicated by LU-2231 ZFS OSD not setting dnode parent field Resolved
Related
is related to LU-2890 mdd_dir_page_build() build page faile... Resolved
Severity: 3
Rank (Obsolete): 5787

 Description   

tags 2.3.56 and 2.3.57 builds are broken with:
/home/green/git/lustre-release/lustre/osd-zfs/osd_index.c:540:2: error: #warning "fix '.' and '..' handling"

This must be addressed before the next tag



 Comments   
Comment by Peter Jones [ 10/Dec/12 ]

Bobijam

Could you please look into this one?

Thanks

Peter

Comment by Li Wei (Inactive) [ 12/Dec/12 ]

Added Alex and Andreas to the watcher list.

Comment by Alex Zhuravlev [ 17/Dec/12 ]

here is a short description of the problem: ZFS does not store "." and ".." on a disk, instead it stores parent dnode in znode and generate dots in zfs_readdir():

while (!done) {
uint64_t objnum;
/*

  • Special case `.', `..', and `.zfs'.
    */
    if (*pos == 0) { (void) strcpy(zap.za_name, "."); zap.za_normalization_conflict = 0; objnum = zp->z_id; }

    else if (*pos == 1)

    { (void) strcpy(zap.za_name, ".."); zap.za_normalization_conflict = 0; objnum = parent; }

    else if (*pos == 2 && zfs_show_ctldir(zp)) {
    (void) strcpy(zap.za_name, ZFS_CTLDIR_NAME);
    zap.za_normalization_conflict = 0;
    objnum = ZFSCTL_INO_ROOT;

to comply with ZFS on-disk format we should:
1) initialize parent dnode attribute upon insert("..")
2) do nothing upon insert(".")
3) implement logic similar to zfs_readdir() in iterator methods: osd_zap_it_get(), osd_zap_it_key(), etc

Comment by Alex Zhuravlev [ 17/Dec/12 ]

one issue is that parent SA can store upto 8 bytes:

SA_ADD_BULK_ATTR(bulk, cnt, SA_ZPL_PARENT(uos), NULL, &parent, 8);

making it impossible to put dnode and fid in. there are two options here:
1) fetch parent dnode, then fetch LMA from that parent dnode – might be expensive, but usually should be cached
2) store parent fid as an extra EA

also, remember insert("..") can be called few times (in different transactions) as a result of rename.

Comment by Andreas Dilger [ 17/Dec/12 ]

We already store the parent FID in the linkEA on the object. That allows finding the FID of ".." without searching. FID of "." can be generated from LMA.

Comment by Alex Zhuravlev [ 17/Dec/12 ]

LinkEA is being manipulated by MDD and it's not considered as mandatory, AFAIU. for example, we ignore errors related to LinkEA.

Comment by Andreas Dilger [ 17/Dec/12 ]

We only ignore such errors when there are many links on a regular file. There should always be enough space for a single linkEA on a directory to point to the parent. All ZFS filesystems are 2.x, so they should also have linkEA entries already, unlike upgraded 1.8 filesystems.

I agree MDD creating linkEA might be a bit bothersome, but it is only being read by the OSD code, never modified. If it isn't available, it is always possible to do OI lookup for parent dnode number to find FID?

Comment by Alex Zhuravlev [ 17/Dec/12 ]

yes, of course. a minor correction: should be possible to load LMA of specific dnode to find its FID.

Comment by Alex Zhuravlev [ 10/Jan/13 ]

http://review.whamcloud.com/4990

not the final version though.. some compatibility with the existing setup is required.

Comment by Alex Zhuravlev [ 11/Jan/13 ]

the patch has been updated with the compatibility and additional infrastructure to simulate old setup (where ./.. are stored on a disk).

Comment by Alex Zhuravlev [ 12/Jan/13 ]

hmm, this patch provides forward compatibility, but not backward: it does remove ./.. if they are stored on a disk,
but do not add new one (they are generated on demand like ZPL does). with the current master code that will result
in non-existing ./.. and number of consequences.

Christopher, what would be your opinion on this? would you be OK to switch to a new tag and do not go back?

Comment by Andreas Dilger [ 12/Jan/13 ]

Alex, I think this is the reasonable path forward. The only other option is to start filling in and using the "zp_parent" field, but not delete the "." and ".." entries yet, and that can be done in a separate patch. That would allow migrating to the new code, and only disabling storing of "." and ".." once the first patch is known to be working.

Comment by Andreas Dilger [ 14/Jan/13 ]

To comment further - I don't think it is necessary to make the extra effort to remove the "." and ".." entries from the on-disk ZAPs, so long as they are filtered out at the readdir level. The legacy filesystems/directories holding "." and ".." will disappear eventually.

Comment by Alex Zhuravlev [ 14/Jan/13 ]

not that I'm against this just got to think a potential ZPL user can be confused finding two ./..

Comment by Christopher Morrone [ 15/Jan/13 ]

If thats the worst that we'll see (two sets of . or ..), then I think that is reasonable, and we don't need to jump through hoops to eliminate it. A tool external to lustre to clean it up might be nice. The Sequoia filesystem could live for another 5 years.

Comment by Andreas Dilger [ 28/Jan/13 ]

Brian, any chance you will have time to inspect the patch in http://review.whamcloud.com/4990?

Comment by Jodi Levi (Inactive) [ 06/Feb/13 ]

Brian, would you have time to inspect this patch this week? Thank you!

Comment by Peter Jones [ 08/Feb/13 ]

Landed for 2.4

Comment by Andreas Dilger [ 08/Feb/13 ]

Still a few fixes for this.

Comment by Andreas Dilger [ 19/Feb/13 ]

Still http://review.whamcloud.com/5413 to land.

Comment by Peter Jones [ 27/Feb/13 ]

Landed for 2.4

Comment by Alex Zhuravlev [ 07/Mar/13 ]

http://review.whamcloud.com/5629 – this patch implements fallback path for lookup(..) when LinkEA is not available.

the only remaining bit to close the ticket is to update parent dnode for regular files on object creation.

Comment by Alex Zhuravlev [ 07/Mar/13 ]

http://review.whamcloud.com/#change,5642 – with this patch all the objects should be getting parent dnode attribute initialized properly.

Comment by Peter Jones [ 14/Mar/13 ]

Landed for 2.4

Comment by Jodi Levi (Inactive) [ 15/Mar/13 ]

Lowering priority now that first patch has landed.

Generated at Sat Feb 10 01:25:18 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.