[LU-2449] ZFS osd to fix . and .. handling Created: 08/Dec/12 Updated: 26/Jan/22 Resolved: 14/Mar/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Oleg Drokin | Assignee: | Alex Zhuravlev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | zfs | ||
| Issue Links: |
|
||||||||||||||||
| Severity: | 3 | ||||||||||||||||
| Rank (Obsolete): | 5787 | ||||||||||||||||
| Description |
|
tags 2.3.56 and 2.3.57 builds are broken with: This must be addressed before the next tag |
| Comments |
| Comment by Peter Jones [ 10/Dec/12 ] |
|
Bobijam Could you please look into this one? Thanks Peter |
| Comment by Li Wei (Inactive) [ 12/Dec/12 ] |
|
Added Alex and Andreas to the watcher list. |
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
here is a short description of the problem: ZFS does not store "." and ".." on a disk, instead it stores parent dnode in znode and generate dots in zfs_readdir(): while (!done) {
to comply with ZFS on-disk format we should: |
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
one issue is that parent SA can store upto 8 bytes: SA_ADD_BULK_ATTR(bulk, cnt, SA_ZPL_PARENT(uos), NULL, &parent, 8); making it impossible to put dnode and fid in. there are two options here: also, remember insert("..") can be called few times (in different transactions) as a result of rename. |
| Comment by Andreas Dilger [ 17/Dec/12 ] |
|
We already store the parent FID in the linkEA on the object. That allows finding the FID of ".." without searching. FID of "." can be generated from LMA. |
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
LinkEA is being manipulated by MDD and it's not considered as mandatory, AFAIU. for example, we ignore errors related to LinkEA. |
| Comment by Andreas Dilger [ 17/Dec/12 ] |
|
We only ignore such errors when there are many links on a regular file. There should always be enough space for a single linkEA on a directory to point to the parent. All ZFS filesystems are 2.x, so they should also have linkEA entries already, unlike upgraded 1.8 filesystems. I agree MDD creating linkEA might be a bit bothersome, but it is only being read by the OSD code, never modified. If it isn't available, it is always possible to do OI lookup for parent dnode number to find FID? |
| Comment by Alex Zhuravlev [ 17/Dec/12 ] |
|
yes, of course. a minor correction: should be possible to load LMA of specific dnode to find its FID. |
| Comment by Alex Zhuravlev [ 10/Jan/13 ] |
|
http://review.whamcloud.com/4990 not the final version though.. some compatibility with the existing setup is required. |
| Comment by Alex Zhuravlev [ 11/Jan/13 ] |
|
the patch has been updated with the compatibility and additional infrastructure to simulate old setup (where ./.. are stored on a disk). |
| Comment by Alex Zhuravlev [ 12/Jan/13 ] |
|
hmm, this patch provides forward compatibility, but not backward: it does remove ./.. if they are stored on a disk, Christopher, what would be your opinion on this? would you be OK to switch to a new tag and do not go back? |
| Comment by Andreas Dilger [ 12/Jan/13 ] |
|
Alex, I think this is the reasonable path forward. The only other option is to start filling in and using the "zp_parent" field, but not delete the "." and ".." entries yet, and that can be done in a separate patch. That would allow migrating to the new code, and only disabling storing of "." and ".." once the first patch is known to be working. |
| Comment by Andreas Dilger [ 14/Jan/13 ] |
|
To comment further - I don't think it is necessary to make the extra effort to remove the "." and ".." entries from the on-disk ZAPs, so long as they are filtered out at the readdir level. The legacy filesystems/directories holding "." and ".." will disappear eventually. |
| Comment by Alex Zhuravlev [ 14/Jan/13 ] |
|
not that I'm against this |
| Comment by Christopher Morrone [ 15/Jan/13 ] |
|
If thats the worst that we'll see (two sets of . or ..), then I think that is reasonable, and we don't need to jump through hoops to eliminate it. A tool external to lustre to clean it up might be nice. The Sequoia filesystem could live for another 5 years. |
| Comment by Andreas Dilger [ 28/Jan/13 ] |
|
Brian, any chance you will have time to inspect the patch in http://review.whamcloud.com/4990? |
| Comment by Jodi Levi (Inactive) [ 06/Feb/13 ] |
|
Brian, would you have time to inspect this patch this week? Thank you! |
| Comment by Peter Jones [ 08/Feb/13 ] |
|
Landed for 2.4 |
| Comment by Andreas Dilger [ 08/Feb/13 ] |
|
Still a few fixes for this. |
| Comment by Andreas Dilger [ 19/Feb/13 ] |
|
Still http://review.whamcloud.com/5413 to land. |
| Comment by Peter Jones [ 27/Feb/13 ] |
|
Landed for 2.4 |
| Comment by Alex Zhuravlev [ 07/Mar/13 ] |
|
http://review.whamcloud.com/5629 – this patch implements fallback path for lookup(..) when LinkEA is not available. the only remaining bit to close the ticket is to update parent dnode for regular files on object creation. |
| Comment by Alex Zhuravlev [ 07/Mar/13 ] |
|
http://review.whamcloud.com/#change,5642 – with this patch all the objects should be getting parent dnode attribute initialized properly. |
| Comment by Peter Jones [ 14/Mar/13 ] |
|
Landed for 2.4 |
| Comment by Jodi Levi (Inactive) [ 15/Mar/13 ] |
|
Lowering priority now that first patch has landed. |