[LU-14461] Convert LSOM with loose size consistency into a strong consistent version Created: 22/Feb/21  Updated: 09/Mar/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Qian Yingjin Assignee: Qian Yingjin
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates LU-11994 Add support for LSOM in LFSCK Open
duplicates LU-11962 File LSOM updates to store proper siz... Open

 Description   

Originally, the SOM (Size On MDT) feature aimed to provide strict size management for Lustre through the concept of an I/O epoch. When a file is opened for writing (the I/O epoch starts), the size on the MDT is not strictly correct during the write I/O epoch, and the strict size must be obtained from the Lustre OSTs. The size becomes strictly correct again when the last writer closes the file (the I/O epoch ends).
However, eviction or a crash of a client in the cluster can leave the close of a file incomplete even though the data stripe objects on the OSTs have already been updated, which results in an inaccurate SOM.
Correcting the size on last close is too complex to implement because of the difficulty of keeping the size consistent in failure and recovery cases. Thus, the unfinished SOM work was removed from Lustre, and LSOM (Lazy Size on MDT) with loose consistency was implemented instead.

LSOM can be converted into a strictly correct version, especially when there is no I/O activity on the file. Thus it would be better to convert LSOM from loose consistency into a strongly consistent version.
Originally, combining this with LFSCK to correct the file size stored in the LSOM EA was proposed, but LFSCK cannot use the CLIO engine, and it is not easy to obtain the file size from the MDT.

A new suggestion is proposed as follows:

  • Each time a file is opened for write (O_RDWR or O_WRONLY mode), the MDT adds a changelog record for the writer.
  • A dedicated Lustre client scans and consumes these changelog records. For each record, the client opens the file with a lease intent lock. A successful lease open ensures that the client is the only opener of the file cluster-wide. The client then does a stat on the file, updates the LSOM, and changes it into a strongly consistent version during the lease close.
  • When the MDT receives an open request for write: if the LSOM is a LAZY version, it does nothing; if the LSOM data is marked STRICT (strong consistency), the MDT changes it into a loosely consistent version. A better implementation could delay this change until the first write to the file object, just as FLR transitions from the read-only state to the write-pending state.
  • When a client does stat() on the file, if the LSOM data is STRICT, it can be used directly without any glimpse lock RPC to the Lustre OSTs.
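
The four steps above can be sketched as a small state machine. This is only a toy model in Python, not Lustre code: the names SomState, ToyFile, ToyMdt, open_for_write, lease_resync and stat_size are all hypothetical, the lease is reduced to a "no other opener" check, and the delayed-demotion variant (waiting for the first write, as FLR does) is not modelled.

```python
from dataclasses import dataclass
from enum import Enum


class SomState(Enum):
    LAZY = "lazy"      # size may be stale; a stat must glimpse the OSTs
    STRICT = "strict"  # size is trusted; a stat can be served from the MDT


@dataclass
class ToyFile:
    ost_size: int = 0                    # authoritative size held by the OSTs
    som_size: int = 0                    # size cached in the LSOM EA on the MDT
    som_state: SomState = SomState.LAZY
    openers: int = 0                     # open handles across the whole cluster


class ToyMdt:
    def __init__(self):
        self.changelog = []              # one record per open-for-write

    def open_for_write(self, name, f):
        # Step 1: log the writer. Step 3: demote STRICT to LAZY, leave LAZY alone.
        self.changelog.append(name)
        if f.som_state is SomState.STRICT:
            f.som_state = SomState.LAZY
        f.openers += 1

    def close(self, f):
        f.openers -= 1

    def lease_resync(self, f):
        # Step 2: the lease open succeeds only if nobody else has the file open.
        if f.openers != 0:
            return False
        f.som_size = f.ost_size          # stands in for stat() via the normal I/O path
        f.som_state = SomState.STRICT    # published at lease close
        return True

    def stat_size(self, f):
        # Step 4: returns (size, glimpse_needed).
        if f.som_state is SomState.STRICT:
            return f.som_size, False
        return f.ost_size, True
```

In this model a lease_resync() attempted while a writer still holds the file open returns False and leaves the LSOM in the LAZY state, so stat_size() keeps reporting that a glimpse is needed until the writer closes and the resync is retried.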

Any suggestion is welcome!



 Comments   
Comment by Andreas Dilger [ 22/Feb/21 ]

I don't think it is good to have Changelog records for tracking SOM, as that requires some userspace process to consume the records, and if they are not processed for a long time it can cause a lot of problems on the system.

See LU-11962 for a proposal on how to do this conversion "lazily" on file access. There is no need for all files to have proper SOM; it is enough if most files (files that are actually accessed) have SOM. Files that are never accessed after creation do not need SOM. If the file is accessed some time after creation (e.g. 24h, but tunable), and the size/blocks have not changed since the previous LSOM data, then it would be possible to mark the file with strict SOM, but it then requires an extra RPC + MDT write to be modified afterward (the same as any FLR file, and should use the same mechanism instead of creating a new one).
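
The access-time check described in this comment could look roughly like the sketch below. It is an illustration under stated assumptions, not the LU-11962 design: SizeInfo, can_mark_strict and DEFAULT_MIN_AGE_SECONDS are hypothetical names, timestamps are plain seconds, and the only inputs are the values the comment mentions (file age, and size/blocks compared against the stored LSOM data).

```python
from dataclasses import dataclass

# "e.g. 24h, but tunable": the default is an assumption taken from the comment.
DEFAULT_MIN_AGE_SECONDS = 24 * 60 * 60


@dataclass(frozen=True)
class SizeInfo:
    size: int    # file size in bytes
    blocks: int  # allocated block count


def can_mark_strict(lsom, observed, created_at, now,
                    min_age=DEFAULT_MIN_AGE_SECONDS):
    """Return True if the file is old enough and its observed size/blocks
    still match the previously stored LSOM data."""
    old_enough = (now - created_at) >= min_age
    unchanged = lsom == observed
    return old_enough and unchanged
```

A file that passes this check could be promoted to strict SOM on access; any later modification would then pay the extra RPC and MDT write mentioned above.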

Generated at Sat Feb 10 03:09:58 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.