I believe that size on MDT has been discussed for a long time, and there have even been some implementations of it before. I am creating this ticket to discuss it again, because keeping file sizes on MDTs seems very important for the new policy engine of Lustre (LiPE) that I am currently working on.
LiPE scans MDTs directly and extracts almost all of the file attributes. From these attribute values, LiPE calculates the values of a series of mathematical expressions, and the expression values determine which rules the corresponding file matches. This works perfectly for almost all file metadata except file sizes, because the MDT doesn't keep file sizes. That is the reason why we want to add file size on MDT.
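To illustrate the problem, here is a minimal sketch of the kind of rule matching described above. The attribute names, rule names, and thresholds are all illustrative assumptions, not the real LiPE syntax: each rule is a predicate over the attributes extracted from the MDT, and a size-based rule simply cannot fire when the size is unavailable.

```python
# Hypothetical sketch of LiPE-style rule matching (illustrative names,
# not the real LiPE API). Attributes come from a direct MDT scan;
# "size" is None because the MDT does not store it.
attrs = {"uid": 1000, "mtime_age_days": 400, "size": None}

rules = {
    "archive_old": lambda a: a["mtime_age_days"] > 365,
    "purge_large": lambda a: a["size"] is not None and a["size"] > 1 << 30,
}

# A file matches every rule whose expression evaluates to true.
matched = [name for name, pred in rules.items() if pred(attrs)]
print(matched)  # only "archive_old"; the size-based rule cannot fire
```

This is exactly the gap: any rule that refers to file size is unusable without size on MDT.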
Given that file size on MDT has been discussed for a long time, I believe many of the problems/difficulties of implementing this feature have already been recognized by the Lustre community. And I think it is obvious that implementing a strict size on MDT with strong guarantees is too hard.
For LiPE, I think file sizes with eventual-consistency guarantees should be enough for most use cases, because: 1) Smart administrators will leave enough margin in their data management; I don't think a smart administrator would define any dangerous rule based on the strict file size without enough margin in timestamps and file size. 2) Most management actions can be withdrawn without any data loss. 3) Data removal is usually double- or triple-checked before being committed, so it is reasonable to ask administrators to double-check the sizes of files to be removed on a Lustre client if file size on MDT is not precise all the time.
Still, we have a lot of choices about how to implement file size on MDT, even if we choose to implement a relaxed/lazy version. I believe a lot of the related work from the past could be reused. I guess using a new extended attribute for file size on MDT might be better than using the i_size field in the inode structure, since Data-on-MDT is coming. And file size on MDT should be synced in a couple of scenarios that provide enough consistency guarantees while introducing little performance impact, for example: 1) when the last close of a file finishes, and 2) when a significant amount of time has passed since the last sync.
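The two sync triggers above can be sketched as a small decision function. This is only an illustration of the proposed policy, not an implementation; the function name, the open-count parameter, and the interval value are all assumptions I am making for the example:

```python
# Sketch of the proposed lazy-sync policy for size on MDT.
# SYNC_INTERVAL is an illustrative value for "a significant amount of time".
SYNC_INTERVAL = 600  # seconds

def should_sync_size(open_count, last_sync, now):
    """Decide whether to push the file size to the MDT.

    Trigger 1: the last close of the file just finished (open_count == 0).
    Trigger 2: more than SYNC_INTERVAL seconds have passed since last sync.
    """
    if open_count == 0:
        return True
    return now - last_sync >= SYNC_INTERVAL

print(should_sync_size(0, 0, 1))    # True: last close finished
print(should_sync_size(2, 0, 700))  # True: interval elapsed while still open
print(should_sync_size(2, 0, 100))  # False: still open, recently synced
```

Between syncs the size stored on the MDT may lag behind the real size on the OSTs, which is exactly the eventual-consistency trade-off argued for above.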
I'd like to work on this once it has been fully discussed and a design is agreed on by all people involved. Any advice would be appreciated. Thanks!