[LU-10283] changelog entries for creates in striped directories use stripe FID as pfid Created: 27/Nov/17 Updated: 13/Jan/24 Resolved: 13/Dec/23 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.16.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | Nikitas Angelinas |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
When we create files in striped directories, the changelog entries emitted use the parent stripe FID (instead of the parent dir FID) as the pfid for the create:
m:lustre# lfs mkdir -c2 d0
m:lustre# lfs path2fid d0
[0x200000402:0xf9f:0x0]
m:lustre# lfs getdirstripe d0
lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
mdtidx FID[seq:oid:ver]
0 [0x200000400:0x3d3:0x0]
1 [0x240000401:0x3d3:0x0]
m:lustre# touch d0/f{0,1}
m:lustre# lfs changelog lustre-MDT0000
10753 02MKDIR 14:19:40.273195957 2017.11.27 0x0 t=[0x200000402:0xf9f:0x0] j=lfs.0 p=[0x200000007:0x1:0x0] d0
10754 01CREAT 14:20:08.243388795 2017.11.27 0x0 t=[0x200000402:0xfa0:0x0] j=touch.0 p=[0x200000400:0x3d3:0x0] f1
10755 11CLOSE 14:20:08.245569226 2017.11.27 0x42 t=[0x200000402:0xfa0:0x0] j=touch.0
m:lustre# lfs changelog lustre-MDT0001
11883 01CREAT 14:20:08.240982376 2017.11.27 0x0 t=[0x240000402:0x111f:0x0] j=touch.0 p=[0x240000401:0x3d3:0x0] f0
11884 11CLOSE 14:20:08.242496774 2017.11.27 0x42 t=[0x240000402:0x111f:0x0] j=touch.0
This confuses lustre_rsync. I wonder if we should fix this. |
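To make the mismatch concrete, here is a minimal Python sketch that reads the two CREAT records above and compares their pfids before and after mapping shard FIDs back to the directory FID. The shard-to-directory table is copied by hand from the `lfs path2fid` and `lfs getdirstripe` output in this description. The helper name `parent_and_name` and the regular expression are illustrative assumptions, not part of any Lustre tool.

```python
import re

# Shard FID -> striped directory FID, copied from the `lfs path2fid d0` and
# `lfs getdirstripe d0` output in the description (hand-built, illustrative).
SHARD_TO_DIR = {
    "[0x200000400:0x3d3:0x0]": "[0x200000402:0xf9f:0x0]",
    "[0x240000401:0x3d3:0x0]": "[0x200000402:0xf9f:0x0]",
}

# The two CREAT records from the description, one per MDT changelog.
RECORDS = [
    "10754 01CREAT 14:20:08.243388795 2017.11.27 0x0 "
    "t=[0x200000402:0xfa0:0x0] j=touch.0 p=[0x200000400:0x3d3:0x0] f1",
    "11883 01CREAT 14:20:08.240982376 2017.11.27 0x0 "
    "t=[0x240000402:0x111f:0x0] j=touch.0 p=[0x240000401:0x3d3:0x0] f0",
]

# Assumed record layout: "... p=[FID] name" at the end of the line.
PFID_RE = re.compile(r"\bp=(\[[^\]]+\])\s+(\S+)$")


def parent_and_name(record):
    """Return (pfid, name) from a changelog line shaped like the ones above."""
    match = PFID_RE.search(record)
    if match is None:
        raise ValueError("record has no p=[...] field: %r" % record)
    return match.group(1), match.group(2)


# pfids exactly as logged: one distinct value per shard.
raw_parents = {parent_and_name(r)[0] for r in RECORDS}
# pfids after mapping each shard back to the striped directory.
real_parents = {SHARD_TO_DIR.get(p, p) for p in raw_parents}

print(len(raw_parents), "distinct pfids as logged")
print(len(real_parents), "distinct pfid after mapping shards to d0")
```

A consumer that groups entries by pfid sees f0 and f1 under two different parents, although both were created in d0.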
| Comments |
| Comment by John Hammond [ 27/Nov/17 ] |
|
Thomas, Henri, Quentin, Does robinhood handle this correctly? |
| Comment by Andreas Dilger [ 28/Nov/17 ] |
|
It seems to me that returning the FID of the shard is not the best for the ChangeLog, because the details of the directory striping should not be exposed in this way. The striping of the directory may change over time, and the target directory may not have the same striping either. If “lfs fid2path [shard FID]” returns the same parent path for all of the shards, then this detail should not be totally evident to lustre_rsync, but any tools that are comparing the parent directories by FID may think that these two files were created in different directories. Thoughts on how to fix this? Since each shard stores the LMV EA with the parent FID, it should be possible to log the proper parent FID into the ChangeLog, but I’m wondering if we might lose something else if we do that? |
| Comment by John Hammond [ 28/Nov/17 ] |
|
> If “lfs fid2path [shard FID]” returns the same parent path for all of the shards, then this detail should not be totally evident to lustre_rsync, but any tools that are comparing the parent directories by FID may think that these two files were created in different directories.
Yes, "lfs fid2path [shard FID]" does return the parent path. However, there are some cases in lustre_rsync where the parent path does not exist in the archive, so we create the file in .lustrerepl and store the tfid, pfid, and name in the status log. Then, if lustre_rsync later sees a rename on the pfid, it moves all saved files with a matching pfid from the .lustrerepl directory to the rename destination in the target archive. |
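The deferred-file handling described above can be modelled in a few lines of Python. This is only a sketch of the matching step, not lustre_rsync's actual code: the `Deferred` record and `files_to_move` helper are hypothetical names, and it assumes the rename of d0 is reported against the directory FID from `lfs path2fid d0`.

```python
from dataclasses import dataclass


@dataclass
class Deferred:
    """One status-log entry for a file parked in .lustrerepl (hypothetical model)."""
    tfid: str
    pfid: str
    name: str


def files_to_move(status_log, renamed_fid):
    """Names of parked files whose saved pfid matches the renamed directory."""
    return [entry.name for entry in status_log if entry.pfid == renamed_fid]


# Directory FID of d0, from `lfs path2fid d0` in the description.
D0_FID = "[0x200000402:0xf9f:0x0]"

# Status log built from the CREAT records as logged (pfid is a shard FID).
status_log = [
    Deferred("[0x200000402:0xfa0:0x0]", "[0x200000400:0x3d3:0x0]", "f1"),
    Deferred("[0x240000402:0x111f:0x0]", "[0x240000401:0x3d3:0x0]", "f0"),
]

# Assumption: the rename of d0 is seen with the directory FID, so no parked
# file matches and both stay behind in .lustrerepl.
stranded = files_to_move(status_log, D0_FID)

# Same log if the changelog had carried the directory FID as pfid.
fixed_log = [Deferred(e.tfid, D0_FID, e.name) for e in status_log]
moved = files_to_move(fixed_log, D0_FID)

print("matched with shard pfids:", stranded)
print("matched with directory pfid:", moved)
```

Under these assumptions, the lookup by pfid finds nothing when the saved pfid is a shard FID, which is one way the confusion reported in the description could arise.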
| Comment by John Hammond [ 30/Nov/17 ] |
|
Hello? Any comment from the RBH developers? |
| Comment by Thomas Leibovici [ 01/Dec/17 ] |
|
The current rbh implementation expects the directory FID, not the shard FID, which is more of a Lustre internal detail. One could argue that the shard FID indicates the MDT where entries are located, but this information is already given by the MDT stream that holds the log record. |
| Comment by Olaf Weber [ 16/Aug/18 ] |
|
From a DMF perspective we'd also expect the directory fid to be reported. |
| Comment by Andreas Dilger [ 25/Sep/19 ] |
|
It was discussed at LAD'19 that the ChangeLog could store the actual directory FID rather than the shard FID. In general, the shard FID is not very useful to userspace, since the directory striping should be transparent to users, and if the directory is restriped the shards could change anyway. On the MDTs where the operation is being done, it should be possible to know that the operation is done in a striped directory and what the actual directory FID is, so this should be possible to implement. It shouldn't cause problems for existing Changelog consumers, since the records would be no different from those for operations within a local directory. |
| Comment by Olaf Weber [ 02/Feb/22 ] |
|
We are now encountering this in the field. Are there any plans to address this? |
| Comment by Gerrit Updater [ 14/Jun/23 ] |
|
"Nikitas Angelinas <nikitas.angelinas@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51322 |
| Comment by Nikitas Angelinas [ 14/Jun/23 ] |
|
I have submitted a patch from Dmitry Ivanov that seems to address this issue by detecting whether a directory is striped using XATTR_NAME_LMV and, if so, using mdd_parent_fid() to obtain the real parent FID for use in the generated changelog record.
Sergey Cheremencev had shown that this patch can result in an increased number of cross-MDT RPCs, so the added functionality needs to be explicitly enabled by setting the changelog_striped_dir_real_pfid tunable and is disabled by default. There have been some discussions regarding the possibility of avoiding the extra cross-MDT RPCs by obtaining the real parent FID from the linkEA of the parent's REMOTE_PARENT_DIR entry, but Vitaly reckoned this would still require some RPCs in cases where the parent's FID is on a different MDT. Unfortunately, I am not sure if this is accurate and/or if we could add any additional information to the REMOTE_PARENT_DIR entries to use them for avoiding the extra RPCs in this case? |
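The decision described in this comment can be pictured with a small Python model. It is a sketch only, not the code in patch 51322: the `Dir` class and `changelog_pfid` function are hypothetical, the `master_fid` field stands in for both the XATTR_NAME_LMV check and the mdd_parent_fid() lookup, and the flag argument stands in for the changelog_striped_dir_real_pfid tunable. The possible cross-MDT RPC cost of the real lookup is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dir:
    """Toy directory object (hypothetical, not an MDD structure)."""
    fid: str
    # FID of the striped directory this object is a shard of, or None for a
    # plain directory. Stands in for the LMV xattr check and parent lookup.
    master_fid: Optional[str] = None


def changelog_pfid(parent, real_pfid_enabled):
    """Pick the pfid to log for a create under `parent`.

    `real_pfid_enabled` models the changelog_striped_dir_real_pfid tunable.
    """
    if real_pfid_enabled and parent.master_fid is not None:
        # Striped case with the tunable on: log the directory FID.
        return parent.master_fid
    # Plain directory, or tunable off: log the FID the create happened under.
    return parent.fid


# FIDs taken from the description of this ticket.
D0_FID = "[0x200000402:0xf9f:0x0]"
shard0 = Dir("[0x200000400:0x3d3:0x0]", master_fid=D0_FID)
root = Dir("[0x200000007:0x1:0x0]")

print(changelog_pfid(shard0, real_pfid_enabled=False))
print(changelog_pfid(shard0, real_pfid_enabled=True))
print(changelog_pfid(root, real_pfid_enabled=True))
```

In this model only creates under a shard change behaviour when the flag is set. Records for plain directories are the same either way.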
| Comment by Olaf Weber [ 11/Jul/23 ] |
|
In his review comments Andreas worries about compatibility with tools that rely on the stripe FID being returned in the changelog records. Does anyone know whether such tools actually exist? |
| Comment by Guillaume Courrier [ 11/Jul/23 ] |
|
As far as robinhood is concerned, it assumes that the pfid in the changelog record is the FID of the parent directory. We didn't catch this issue in the first implementation of the new changelog reader of Robinhood 4. Robinhood doesn't manipulate shard FIDs. So from its perspective, this would result in a bug. The fix in patch 51322 would work for us. A tunable might be useful to be able to at least know which version of the changelog we are reading (to know whether the pfid is the actual FID of the directory or not). A new record in the changelog would be fine as well. |
| Comment by Andreas Dilger [ 31/Jul/23 ] |
|
If everyone considers this a bug, I'd be fine to fix the bug by default, and just have a tunable to revert to the previous behavior in the field if some customer specifically needs it. I suspect there will be few users for this, and the tunable can be marked for removal in some future release. |
| Comment by Olaf Weber [ 01/Aug/23 ] |
|
My vote is "bug to be fixed by default". |
| Comment by Gerrit Updater [ 13/Dec/23 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51322/ |
| Comment by Peter Jones [ 13/Dec/23 ] |
|
Landed for 2.16 |