[LU-10283] changelog entries for creates in striped directories use stripe FID as pfid

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.16.0

    Description

      When we create files in striped directories the changelog entries emitted use the parent stripe FID (instead of the parent dir FID) as the pfid for the create:

      m:lustre# lfs mkdir -c2 d0
      m:lustre# lfs path2fid d0
      [0x200000402:0xf9f:0x0]
      m:lustre# lfs getdirstripe d0
      lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: fnv_1a_64
      mdtidx         FID[seq:oid:ver]
           0         [0x200000400:0x3d3:0x0]        
           1         [0x240000401:0x3d3:0x0]        
      m:lustre# touch d0/f{0,1}
      m:lustre# lfs changelog lustre-MDT0000
      10753 02MKDIR 14:19:40.273195957 2017.11.27 0x0 t=[0x200000402:0xf9f:0x0] j=lfs.0 p=[0x200000007:0x1:0x0] d0
      10754 01CREAT 14:20:08.243388795 2017.11.27 0x0 t=[0x200000402:0xfa0:0x0] j=touch.0 p=[0x200000400:0x3d3:0x0] f1
      10755 11CLOSE 14:20:08.245569226 2017.11.27 0x42 t=[0x200000402:0xfa0:0x0] j=touch.0
      m:lustre# lfs changelog lustre-MDT0001
      11883 01CREAT 14:20:08.240982376 2017.11.27 0x0 t=[0x240000402:0x111f:0x0] j=touch.0 p=[0x240000401:0x3d3:0x0] f0
      11884 11CLOSE 14:20:08.242496774 2017.11.27 0x42 t=[0x240000402:0x111f:0x0] j=touch.0
      

      This confuses lustre_rsync. I wonder if we should fix this.
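
To make the mismatch concrete, here is a minimal sketch of how a changelog consumer might extract the `p=` field from a record and compare it with the directory FID reported by `lfs path2fid`. The helper names `changelog_pfid` and `pfid_is_dir` are hypothetical and not part of Lustre; the sample record and FIDs are copied from the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: print the p=[...] parent FID of a changelog record
# read from stdin. Prints nothing if the record has no p= field (e.g. CLOSE).
changelog_pfid() {
    sed -n 's/.* p=\(\[[^]]*]\).*/\1/p'
}

# Hypothetical helper: succeed if the pfid of record $1 equals directory FID $2.
pfid_is_dir() {
    record=$1
    dir_fid=$2
    [ "$(printf '%s\n' "$record" | changelog_pfid)" = "$dir_fid" ]
}

# Sample values copied from the transcript above.
dir_fid='[0x200000402:0xf9f:0x0]'
record='10754 01CREAT 14:20:08.243388795 2017.11.27 0x0 t=[0x200000402:0xfa0:0x0] j=touch.0 p=[0x200000400:0x3d3:0x0] f1'

if pfid_is_dir "$record" "$dir_fid"; then
    echo "pfid is the directory FID"
else
    echo "pfid is not the directory FID: $(printf '%s\n' "$record" | changelog_pfid)"
fi
```

For the sample record the comparison fails, because the `p=` value is the FID of the stripe on MDT0000 rather than the FID of d0. A consumer that looks up parents by directory FID would not find this entry.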

          Activity

            pjones Peter Jones added a comment -

            Landed for 2.16

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/51322/
            Subject: LU-10283 mdd: fix parent FID in changelog of striped directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 3554923af9e3260235865d90949ecd2924bbbc0e

            olaf Olaf Weber (Inactive) added a comment -

            My vote is "bug to be fixed by default".

            adilger Andreas Dilger added a comment -

            If everyone considers this a bug, I'd be fine with fixing it by default and just having a tunable to revert to the previous behavior in the field if some customer specifically needs it. I suspect there will be few users for this, and the tunable can be marked for removal in some future release.

            courrier Guillaume Courrier added a comment -

            Robinhood assumes that the pfid in the changelog record is the FID of the parent directory. We didn't catch this issue in the first implementation of the new changelog reader of Robinhood 4. Robinhood doesn't manipulate shard FIDs, so from its perspective this would result in a bug. The fix in patch 51322 would work for us. A tunable might be useful so that we can at least know which version of the changelog we are reading (that is, whether the pfid is the actual FID of the directory or not). A new record in the changelog would be fine as well.

            olaf Olaf Weber (Inactive) added a comment -

            In his review comments Andreas worries about compatibility with tools that rely on the stripe FID being returned in the changelog records. Does anyone know whether such tools actually exist?

            nangelinas Nikitas Angelinas added a comment - - edited

            I have submitted a patch from Dmitry Ivanov that seems to address this issue, by detecting whether a directory is striped using XATTR_NAME_LMV and if so, using mdd_parent_fid() to obtain the real parent FID for use in the generated changelog record:

            # git describe
            v2_15_56-1-g80258995e4
            # lfs mkdir -i -1 -c 2 /mnt/lustre/testdir0
            # lctl get_param mdd.*.changelog_striped_dir_real_pfid
            mdd.lustre-MDT0000.changelog_striped_dir_real_pfid=0
            mdd.lustre-MDT0001.changelog_striped_dir_real_pfid=0
            # lfs getdirstripe /mnt/lustre/testdir0
            lmv_stripe_count: 2 lmv_stripe_offset: 0 lmv_hash_type: crush
            mdtidx FID[seq:oid:ver]
            0 [0x200000400:0x2:0x0]
            1 [0x240000401:0x2:0x0]
            # lfs path2fid /mnt/lustre/testdir0
            [0x200000402:0x1:0x0]
            # touch /mnt/lustre/testdir0/testfile0
            # lfs changelog lustre-MDT0000; lfs changelog lustre-MDT0001
            ...
            2 01CREAT 21:46:35.984819711 2023.06.14 0x0 t=[0x200000402:0x2:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000400:0x2:0x0] testfile0
            3 11CLOSE 21:46:36.028827790 2023.06.14 0x42 t=[0x200000402:0x2:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
            # lctl set_param mdd.*.changelog_striped_dir_real_pfid=1
            mdd.lustre-MDT0000.changelog_striped_dir_real_pfid=1
            mdd.lustre-MDT0001.changelog_striped_dir_real_pfid=1
            # lctl get_param mdd.*.changelog_striped_dir_real_pfid
            mdd.lustre-MDT0000.changelog_striped_dir_real_pfid=1
            mdd.lustre-MDT0001.changelog_striped_dir_real_pfid=1
            # touch /mnt/lustre/testdir0/testfile1
            # lfs changelog lustre-MDT0000; lfs changelog lustre-MDT0001
            ...
            2 01CREAT 21:46:35.984819711 2023.06.14 0x0 t=[0x200000402:0x2:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000400:0x2:0x0] testfile0
            3 11CLOSE 21:46:36.028827790 2023.06.14 0x42 t=[0x200000402:0x2:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo
            4 01CREAT 21:47:08.772277807 2023.06.14 0x0 t=[0x200000402:0x3:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo p=[0x200000402:0x1:0x0] testfile1
            5 11CLOSE 21:47:08.831376478 2023.06.14 0x42 t=[0x200000402:0x3:0x0] j=touch.0 ef=0xf u=0:0 nid=0@lo

            Sergey Cheremencev has shown that this patch can increase the number of cross-MDT RPCs, so the new behavior is disabled by default and must be enabled explicitly by setting the changelog_striped_dir_real_pfid tunable. There has been some discussion of avoiding the extra cross-MDT RPCs by obtaining the real parent FID from the linkEA of the parent's REMOTE_PARENT_DIR entry, but Vitaly reckoned this would still require some RPCs when the parent's FID is on a different MDT. I am not sure whether this is accurate, or whether we could add information to the REMOTE_PARENT_DIR entries so that they can be used to avoid the extra RPCs in this case.
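
Since the behavior depends on a per-MDT tunable, a changelog consumer may want to check it before deciding how to interpret `p=`. The following is a minimal sketch under that assumption. The helper name `all_real_pfid` is hypothetical; it only parses text in the `lctl get_param` output format shown above, so it can be tried without a Lustre mount.

```shell
#!/bin/sh
# Hypothetical helper: read the output of
#   lctl get_param mdd.*.changelog_striped_dir_real_pfid
# on stdin. Succeed only if at least one MDT is listed and every MDT reports 1.
all_real_pfid() {
    awk -F= '
        /changelog_striped_dir_real_pfid=/ { seen++; if ($2 != "1") off++ }
        END { exit !(seen > 0 && off == 0) }'
}

# Sample output copied from the transcript above (tunable still off).
if all_real_pfid <<'EOF'
mdd.lustre-MDT0000.changelog_striped_dir_real_pfid=0
mdd.lustre-MDT0001.changelog_striped_dir_real_pfid=0
EOF
then
    echo "p= holds the directory FID for new records"
else
    echo "p= may hold a stripe FID"
fi
```

On a live system the function would be fed from `lctl get_param mdd.*.changelog_striped_dir_real_pfid` instead of the here-document. As the transcript shows, records written before the tunable was set keep the stripe FID (record 2) while later ones carry the directory FID (record 4), so this check only describes records emitted from now on.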

            gerrit Gerrit Updater added a comment -

            "Nikitas Angelinas <nikitas.angelinas@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51322
            Subject: LU-10283 mdd: fix parent FID in changelog of striped directory
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 80258995e44b9911deb54e9d914443a98a680020

            olaf Olaf Weber (Inactive) added a comment -

            We are now encountering this in the field. Are there any plans to address this?

            adilger Andreas Dilger added a comment -

            It was discussed at LAD'19 that the ChangeLog could store the actual directory FID rather than the shard FID. In general, the shard FID is not very useful to userspace, since the directory striping should be transparent to users, and if the directory is restriped the shards could change anyway. On the MDTs where the operation is being done, it should be possible to know that the operation is done in a striped directory and what the actual directory FID is, so this should be possible to implement. It shouldn't cause problems for existing ChangeLog consumers, since it would be no different from operations within a local directory.

            People

              nangelinas Nikitas Angelinas
              jhammond John Hammond