[LU-14765] sanity-flr test_44c: mirror split does not reduce block#

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • Lustre 2.15.0
    • 3
    • 9223372036854775807

    Description

      This issue was created by maloo for S Buisson <sbuisson@ddn.com>

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/8d692dc1-910c-4ee7-baf0-21734795ed81

      test_44c failed with the following error:

      mirror split does not reduce block# 10485760 != 4194304
      

      sanity-flr test_44c seems buggy. It starts by writing 10 MB, but then reports only 4 MB (4096 KiB) of file blocks:

      ** before mirror ops, file blocks=4096 KiB
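
      For reference, the two values in the error message are 10 MiB and 4 MiB expressed in bytes, and the 4096 KiB in the log line is the same quantity as the second value. A minimal Python check of that arithmetic follows; the helper names are made up for this sketch, and the 512-byte unit is the one stat(2) documents for st_blocks:

```python
# Unit check for the numbers quoted in the failure message.
KIB = 1024
MIB = 1024 * KIB
STAT_BLOCK = 512  # st_blocks is counted in 512-byte units per stat(2)


def kib_to_bytes(kib: int) -> int:
    """Convert a KiB value (as printed in the test log) to bytes."""
    return kib * KIB


def bytes_to_stat_blocks(nbytes: int) -> int:
    """Convert bytes to 512-byte blocks, rounding up."""
    return -(-nbytes // STAT_BLOCK)


written = 10 * MIB               # what the test writes, per the description
reported = kib_to_bytes(4096)    # "file blocks=4096 KiB" from the log

print(written, reported)                # 10485760 4194304
print(bytes_to_stat_blocks(reported))   # 8192
```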
      

      VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
      sanity-flr test_44c - mirror split does not reduce block# 10485760 != 4194304

          Activity

            gerrit Gerrit Updater added a comment -

            "Zhenyu Xu <bobijam@hotmail.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/50310
            Subject: LU-14765 test: enable sanity-flr/44c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: c945ba741e73eb42c0d013d2a1ab44bda416986b

            adilger Andreas Dilger added a comment -

            This is causing user-visible issues in the field, since "du" with STRICT SOM reports block counts that are too large after a mirror has been split from the file. There was some work in LU-14526 to reset the blocks count for SOM files on split, but for STRICT SOM the values should actually be correct, since they are returned to userspace as authoritative values.

            Rather than have the client do an extra "stat" of all mirrors on each close (which is expensive and mostly useless), I think the right answer is to do this extra stat only during the mirror split operation. That will ensure the right values are stored in the STRICT SOM, without adding overhead to the very common close operation.

            gerrit Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/44074/
            Subject: LU-14765 test: disable sanity-flr/44c
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: a45fe93cd8d1e941d58e0f11e21649e1956ba2c7

            nangelinas Nikitas Angelinas added a comment - +1 on master: https://testing.whamcloud.com/test_sets/d5df04c7-cfe9-4a8e-99a9-e7b2beb3f55b
            emoly.liu Emoly Liu added a comment - +1 on master: https://testing.whamcloud.com/test_sets/3d97f947-b99d-4316-9b42-6f2502a96613
            bobijam Zhenyu Xu added a comment -

            Andreas Dilger wrote:

            Bobijam, is there something unusual with how the SOM blocks count is updated? I'd think that it is just using the sum of the blocks counts for all mirror objects at the time the file is closed. If some of the mirrors are stale, then the blocks count may not be totally accurate, but after "lfs mirror resync" the client doing the resync should be totally up to date and can send this to the MDS.

            The issue initially raised is that 'stat' does not show the file's blocks count increasing after an extra mirror has been merged into it.

            When a mirror is added, suppose the SOM is strict and reflects all blocks of the file: then the blocks count can be accounted accurately after the merge. If the SOM is not strict, it is still not accurate after the merge, so 'stat' has to rely on a glimpse to collect the file's blocks count. But the glimpse chooses only one mirror and calculates the blocks count for that mirror alone, so the result is no bigger than that of the file before the mirror extend.

            A similar scenario occurs on mirror split: if the SOM blocks count for the file is not accurate before the mirror operation, the blocks count will not be shown as decreased after the split either.
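
            The accounting described in the comment above can be illustrated with a small toy model. This is only a sketch of the reasoning, not Lustre code: the class ToyMirroredFile, its fields, and the refresh_som() helper are hypothetical names, and the model assumes that a strict SOM value is returned as-is while a non-strict one falls back to a glimpse of a single mirror.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyMirroredFile:
    """Toy model of blocks accounting for a mirrored file (not Lustre code)."""
    mirror_blocks: List[int]   # blocks allocated by each mirror, in KiB
    som_blocks: int = 0        # blocks value cached in SOM
    som_strict: bool = False   # whether the cached value is authoritative

    def refresh_som(self) -> None:
        # Sum the blocks of all mirrors and mark the result strict.
        self.som_blocks = sum(self.mirror_blocks)
        self.som_strict = True

    def stat_blocks(self) -> int:
        # Strict SOM is returned as-is; otherwise a glimpse sees one mirror.
        if self.som_strict:
            return self.som_blocks
        return self.mirror_blocks[0]

    def extend(self, blocks: int, refresh: bool = False) -> None:
        # Add a mirror; optionally recompute the cached blocks value.
        self.mirror_blocks.append(blocks)
        if refresh:
            self.refresh_som()

    def split(self, refresh: bool = False) -> None:
        # Drop the last mirror; optionally recompute the cached blocks value.
        self.mirror_blocks.pop()
        if refresh:
            self.refresh_som()


# Strict SOM refreshed at extend but not at split: stat keeps the old sum.
f = ToyMirroredFile([4096])
f.extend(4096, refresh=True)
before_split = f.stat_blocks()
f.split(refresh=False)
stale_after_split = f.stat_blocks()
f.refresh_som()
fresh_after_split = f.stat_blocks()

# Non-strict SOM: the glimpse sees one mirror, so extend does not raise it.
g = ToyMirroredFile([4096])
glimpse_before = g.stat_blocks()
g.extend(4096)
glimpse_after = g.stat_blocks()

print(before_split, stale_after_split, fresh_after_split)  # 8192 8192 4096
print(glimpse_before, glimpse_after)                       # 4096 4096
```

            In this model the blocks value only drops after a split if the cached value is recomputed at split time, which is the case the test is checking for.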
            artem_blagodarenko Artem Blagodarenko (Inactive) added a comment - +1 on master https://testing.whamcloud.com/test_sets/950a96fc-10c7-41a8-b7d2-a8b14834b02b

            adilger Andreas Dilger added a comment -

            Bobijam, is there something unusual with how the SOM blocks count is updated? I'd think that it is just using the sum of the blocks counts for all mirror objects at the time the file is closed. If some of the mirrors are stale, then the blocks count may not be totally accurate, but after "lfs mirror resync" the client doing the resync should be totally up to date and can send this to the MDS.

            bobijam Zhenyu Xu added a comment -

            Yes, writes do account for block count changes correctly, but a file with multiple mirrors will throw off the base blocks count (so I'm thinking that even mirror extend needs to invalidate the blocks count in SOM).

            Its blocks count cannot be correct until it is split so that only one mirror remains.

            adilger Andreas Dilger added a comment -

            I would think that when SOM sets the STRICT flag, all client writes are complete and committed, so the blocks count should be correct at this time? Alternately, should we allow the SOM blocks count to be increased from later client RPCs, like LSOM does, so that the size is correct/strict and the blocks at least get "more correct" over time?

            People

              bobijam Zhenyu Xu
              maloo Maloo