[LU-9538] Size on MDT with guarantee of eventual consistency

Details

    • Type: New Feature
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.0
    • None

    Description

      I believe that size on MDT has been discussed for a long time, and there were even some implementations of it before. I am creating this ticket to discuss it again, because keeping file sizes on MDTs seems very important for the new Lustre policy engine (LiPE) that I am currently working on.

      LiPE scans MDTs directly and extracts almost all file attributes. From these attribute values, LiPE calculates the values of a series of mathematical expressions, and the expression values determine which rules each file matches. This works perfectly for almost all file metadata except file size, because the MDT doesn't keep file sizes. That is why we want to add size on MDT.
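
      The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not LiPE's actual rule language: the attribute names, the `Rule` type, `matching_rules`, the rule names and the thresholds are all hypothetical. It shows why any rule that references `size` cannot be evaluated from an MDT scan unless the MDT stores a size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical attributes produced by a direct MDT scan, keyed by name.
Attrs = Dict[str, int]

@dataclass
class Rule:
    name: str
    # Expression over scanned attributes; True means the file matches.
    expr: Callable[[Attrs], bool]

def matching_rules(attrs: Attrs, rules: List[Rule]) -> List[str]:
    """Evaluate every rule's expression and return the names of those that hold."""
    return [rule.name for rule in rules if rule.expr(attrs)]

NOW = 1_700_000_000  # fixed "current time" so results are deterministic
DAY = 24 * 3600
GIB = 1 << 30

RULES = [
    # Coarse thresholds leave a wide margin for a size that is only
    # eventually consistent (hypothetical values).
    Rule("archive_cold_large",
         lambda a: a["size"] > 10 * GIB and NOW - a["atime"] > 90 * DAY),
    Rule("small_file", lambda a: a["size"] < (1 << 20)),
]
```

      For example, `matching_rules({"size": 20 * GIB, "atime": NOW - 120 * DAY}, RULES)` returns `["archive_cold_large"]`. Both rules read `size`, which is exactly the attribute an MDT scan cannot supply today.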

      Since size on MDT has been discussed for a long time, I believe many of the problems and difficulties of implementing this feature have already been recognized by the Lustre community. And I think it is obvious that implementing strict size on MDT with strong guarantees is too hard.

      For LiPE, I think file sizes with a guarantee of eventual consistency should be enough for most use cases, because: 1) smart administrators will leave enough margin in data management, and I don't think a smart administrator will define any dangerous rule based on the strict file size without enough margin on timestamps and file size; 2) most management actions can be undone without any data loss; and 3) data removals are usually double- or triple-checked before being committed, so it is reasonable to ask administrators to double-check the sizes of files to be removed on a Lustre client if size on MDT is not always precise.

      Still, there are many choices for how to implement size on MDT, even if we choose a relaxed/lazy version. I believe a lot of related past work could be reused. I guess using a new extended attribute for size on MDT might be better than using i_size in the inode structure, since Data-on-MDT is coming and may need i_size for file data stored on the MDT itself. Size on MDT should be synced in a few scenarios that provide enough consistency guarantees while introducing little performance impact, for example: 1) when the last file close finishes, and 2) when a significant time has passed since the last sync.
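
      The two sync triggers proposed above can be expressed as a small decision function. This is only a sketch of the proposal, not the landed implementation: `FileState`, `should_sync_size` and the 600-second interval are hypothetical, since the description does not fix a value for "a significant time".

```python
from dataclasses import dataclass

# Hypothetical interval; the proposal only says "a significant time".
SYNC_INTERVAL_S = 600.0

@dataclass
class FileState:
    open_count: int   # handles currently open on the file, including the closing one
    last_sync: float  # time the size on MDT was last updated
    dirty: bool       # whether the size may have changed since last_sync

def should_sync_size(state: FileState, now: float, closing: bool) -> bool:
    """Decide whether to push the current file size to the MDT."""
    if not state.dirty:
        return False  # nothing new to record
    if closing and state.open_count == 1:
        return True   # trigger 1: the last close is finishing
    # trigger 2: the stored size has been stale for too long
    return now - state.last_sync >= SYNC_INTERVAL_S
```

      Skipping clean files even on the last close keeps the common close path cheap, which matches the goal of introducing little performance impact.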

      I'd like to work on this once it has been fully discussed and a design has been agreed upon by everyone involved. Any advice would be appreciated. Thanks!

          Activity

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33565/
            Subject: LU-9538 utils: update description of ldiskfs xattrs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 533a253ad86cd6bb09c3889110312ef375e9590d

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33565
            Subject: LU-9538 utils: update description of ldiskfs xattrs
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: b889b2caf3f791083a3785c3c60eb9b78127eca5

            Andreas Dilger added a comment (edited):

            John, the llsom_sync tool, if running on the MDSes, will monitor the Changelog and update the LSOM data by default 10 minutes after the file was modified. It also aggregates updates so that multiple file modifications in the prior 10 minutes do not result in multiple LSOM updates (it is set to the most current size/blocks value).
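
            The aggregation llsom_sync performs can be illustrated with a simplified model. This is not the tool's code: Changelog records are reduced to hypothetical `(fid, timestamp)` pairs, and `collapse_records` and `due_for_update` are illustrative names. Only the 10-minute default delay comes from the comment above.

```python
from typing import Dict, Iterable, List, Tuple

DEFAULT_DELAY_S = 600  # the 10-minute default mentioned above

def collapse_records(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Keep only the most recent modification time per FID, so repeated
    modifications of one file yield a single pending LSOM update."""
    latest: Dict[str, float] = {}
    for fid, ts in records:
        if ts > latest.get(fid, float("-inf")):
            latest[fid] = ts
    return latest

def due_for_update(latest: Dict[str, float], now: float,
                   delay: float = DEFAULT_DELAY_S) -> List[str]:
    """FIDs whose last modification is at least `delay` seconds old."""
    return sorted(fid for fid, ts in latest.items() if now - ts >= delay)
```

            With records `[("a", 0.0), ("a", 100.0), ("b", 50.0)]` and `now=660.0`, file "b" is due while "a" still waits, because its latest modification was only 560 seconds ago.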

            If the llsom_sync tool is not running, the majority of new files will still have their LSOM data updated at close, except when there are unusual write orderings (e.g. many clients doing write/truncate/etc.) or the clients crash before closing the file. That update typically happens on the MDS as soon as the client closes the file.

            Files will also have their LSOM data updated to the current size/blocks when opened and closed by any client (if it has changed), so the data naturally corrects itself over time. That is all the llsom_sync tool does in the end: open and close the file after it has (presumably) stopped being modified. If it is still being modified, or is modified again later, another Changelog record will be written, and llsom_sync will open/close the file again.
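
            As described above, refreshing LSOM amounts to opening and closing the file once it has stopped changing. A minimal sketch, assuming the caller already has a path for each file (the tool itself starts from Changelog records rather than paths, and resolving those is outside this sketch); `refresh_lsom` is a hypothetical name. Files removed after their Changelog record was written are skipped.

```python
import os
from typing import Iterable

def refresh_lsom(paths: Iterable[str]) -> int:
    """Open and close each file read-only; on Lustre, the close is what lets
    the MDS refresh the LSOM size/blocks. Returns the number of files touched."""
    touched = 0
    for path in paths:
        try:
            fd = os.open(path, os.O_RDONLY)
        except FileNotFoundError:
            continue  # file was deleted after its Changelog record was written
        os.close(fd)
        touched += 1
    return touched
```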

            John Bent (Inactive) added a comment:

            "when a significant time has been past since last sync". Was this value defined? Is there a config variable?

            Peter Jones added a comment:

            Landed for 2.12

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/30124/
            Subject: LU-9538 utils: Tool for syncing file LSOM xattr
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: caba6b9af07567ff4cdae9f6450f399cd3ca445e

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32918/
            Subject: LU-9538 utils: fix lfs xattr.h header usage
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: cc234da91b6c00cbe681d7352320df94c09dc288

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32918
            Subject: LU-9538 utils: fix lfs xattr.h header usage
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 59921f66904c17b77a69f9bb4bc0b0d8676d32f4

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/29960/
            Subject: LU-9538 mdt: Lazy size on MDT
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: f1ebf88aef2101ff9ee30b0ddea107e8f700c07f

            Andreas Dilger added a comment:

            For the 2.12 release, it would be great if lfs find could be enhanced to use the LSOM data from the MDS when checking --size or --blocks. Maybe an lfs find --lazy option could be added to determine whether the LSOM data is used. At first, this could use the lgetxattr("trusted.som") interface to get the LSOM attr, but eventually this should be converted to use the statx(AT_STATX_DONT_SYNC) interface on the client. That is an internal implementation detail that the user should not care about when using --lazy, and it can be done at some later time.

            Ideally, the use of lgetxattr() would avoid sending an extra RPC to the MDS to fetch the lazy size, but even if it does not, this is not going to be worse than fetching the size from the OSS nodes, as it would only involve a single MDS_GETXATTR RPC (and the xattr may already be prefetched to the client).
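
            A sketch of the first step suggested above: reading the LSOM attribute with lgetxattr() and comparing its size against a `--size +N`-style threshold. Python's `os.getxattr(..., follow_symlinks=False)` corresponds to lgetxattr() on Linux. The 24-byte little-endian layout (a 16-bit valid-flags field, three reserved 16-bit words, then 64-bit size and blocks) is an assumption modeled on `struct lustre_som_attrs` in the Lustre headers and should be checked against the release in use; `decode_som`, `read_lazy_som` and `lazy_size_greater_than` are hypothetical helpers, not part of lfs.

```python
import os
import struct
from typing import NamedTuple

# ASSUMED layout of "trusted.som": __u16 valid, __u16 reserved[3],
# __u64 size, __u64 blocks, little-endian. Verify for your Lustre release.
SOM_FORMAT = "<HHHHQQ"
SOM_LEN = struct.calcsize(SOM_FORMAT)  # 24 bytes

class LazySom(NamedTuple):
    valid: int
    size: int
    blocks: int

def decode_som(raw: bytes) -> LazySom:
    """Decode a raw trusted.som value under the assumed layout."""
    if len(raw) < SOM_LEN:
        raise ValueError(f"trusted.som too short: {len(raw)} bytes")
    valid, _r0, _r1, _r2, size, blocks = struct.unpack_from(SOM_FORMAT, raw)
    return LazySom(valid, size, blocks)

def read_lazy_som(path: str) -> LazySom:
    # follow_symlinks=False makes this an lgetxattr() call on Linux.
    # The trusted.* namespace normally requires CAP_SYS_ADMIN.
    return decode_som(os.getxattr(path, "trusted.som", follow_symlinks=False))

def lazy_size_greater_than(path: str, threshold: int) -> bool:
    """Hypothetical `--lazy --size +N`-style check using only LSOM data."""
    return read_lazy_som(path).size > threshold
```

            Because the value is only lazily consistent, a check like this suits coarse filtering; a caller needing the exact size would still stat the file.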

            People

              Assignee: Li Xi (Inactive)
              Reporter: Li Xi (Inactive)
              Votes: 0
              Watchers: 26
