[LU-15252] option to disable LSOM updates Created: 19/Nov/21  Updated: 26/May/23  Resolved: 23/Dec/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: Lustre 2.15.0

Type: Improvement Priority: Major
Reporter: Alexander Boyko Assignee: Alexander Boyko
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Duplicate
Rank (Obsolete): 9223372036854775807

 Description   

When LSOM is not used on a cluster, it is better to disable it, but I don't see an option for that yet.
While analyzing an MDS vmcore taken under a high load average, we found that the LSOM feature contributes heavily to the load.
205 threads were blocked in mdt_lsom_update(). A few threads got further and were waiting to take the OSD lock for read. It seems that mdt_lsom_update() has a serious issue with a single shared file, because it takes an mdt-level mutex for every close request.



 Comments   
Comment by Gerrit Updater [ 19/Nov/21 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45619
Subject: LU-15252 mdc: add client tunable to disable LSOM update
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: e10a14aa558f7043ef91f6aae6326d6694c1cacc

Comment by Andreas Dilger [ 19/Nov/21 ]

It would be better to fix the reason why LSOM updates are slow. From the comments, this is due to lock contention on the MDS, but there should be a way to avoid it, since this is "lazy" size and does not have to be totally accurate all the time. For example, check without the lock whether the incoming LSOM size/blocks are already smaller than the current size/blocks, since LSOM should only be increasing.

Also, it may be possible to batch LSOM updates in memory for a few seconds as long as the open counter > 0, since we know/expect some later close will write it to disk, and the update does not need to be part of a transaction if there is no other reason for it.

Similarly, it should be possible to cache a flag on the mdt_object recording whether there is no LOV EA or whether DoM is used, since this changes very rarely, and reading the LOV EA just for these two bits of information is expensive. That would allow checking whether an LSOM update is needed without taking the object mutex.
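One way to picture the cached-flag idea is the sketch below. The flag names, `struct obj`, and the helpers are hypothetical; the on-disk layout is simulated with two booleans, and the policy in `lsom_update_needed()` (skip the update when there is no LOV EA or when DoM is used) is an assumption made for illustration, not taken from either patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits cached on an in-memory object. */
#define OBJ_LAYOUT_CACHED	0x1u	/* the two bits below are valid */
#define OBJ_NO_LOV_EA		0x2u	/* object has no LOV EA */
#define OBJ_HAS_DOM		0x4u	/* layout uses Data-on-MDT */

struct obj {
	_Atomic unsigned int	flags;		/* read without the object mutex */
	bool			disk_has_lov;	/* stand-in for the on-disk LOV EA */
	bool			disk_has_dom;
	_Atomic int		lov_reads;	/* counts the expensive reads */
};

/* Stand-in for reading and parsing the LOV EA from disk. */
static unsigned int read_layout_bits(struct obj *o)
{
	unsigned int bits = OBJ_LAYOUT_CACHED;

	atomic_fetch_add(&o->lov_reads, 1);
	if (!o->disk_has_lov)
		bits |= OBJ_NO_LOV_EA;
	if (o->disk_has_dom)
		bits |= OBJ_HAS_DOM;
	return bits;
}

/* Assumed policy: no LOV EA or DoM in use means no LSOM update. */
static bool lsom_update_needed(struct obj *o)
{
	unsigned int f = atomic_load(&o->flags);

	if (!(f & OBJ_LAYOUT_CACHED)) {
		/* Slow path, once per layout: parse the xattr and cache it.
		 * A real implementation must order this against layout
		 * changes; that is not modelled here. */
		f = read_layout_bits(o);
		atomic_fetch_or(&o->flags, f);
	}
	return !(f & (OBJ_NO_LOV_EA | OBJ_HAS_DOM));
}

/* Layout changes are rare; drop the cached bits when one happens. */
static void obj_layout_changed(struct obj *o)
{
	atomic_fetch_and(&o->flags,
			 ~(OBJ_LAYOUT_CACHED | OBJ_NO_LOV_EA | OBJ_HAS_DOM));
}
```

Only the first check after a layout change pays for the xattr read; later closes decide from an atomic load without touching the object mutex.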

Comment by James A Simmons [ 19/Nov/21 ]

 I agree with Andreas. LSOM is too critical for us to disable, but at the same time we don't want our MDS servers bogged down.

Comment by Alexander Boyko [ 22/Nov/21 ]

Andreas, I've made a quick fix for Lustre clients only. But I agree with you: LSOM requires fixes on the server side to improve single-shared-file performance. LSOM is still enabled by default, so this patch has no impact unless the tunable is changed.

Comment by Alexander Boyko [ 22/Nov/21 ]

>Similarly, it should be possible to cache a flag on the mdt_object if there is no LOV EA or DoM is used, since this changes very rarely, so reading the LOV EA just for these two bits of information is expensive. That would allow checking whether an LSOM update is needed without the object mutex.
Maybe it is better to store LSOM in the mdt_object and update the xattr only on the last close? In that case, only a failover could leave a stale lazy size in the xattr, but I think that is acceptable for LSOM.
Andreas, any objections?
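The proposal to keep LSOM in memory and persist it only on the last close can be sketched as follows. Everything here (`struct obj`, `obj_open()`, `obj_close()`, and the `xattr_*` fields standing in for the on-disk LSOM xattr) is hypothetical, and the caller is assumed to serialize opens and closes on the object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical object: open count plus in-memory LSOM that is only
 * persisted to the simulated xattr on the last close. */
struct obj {
	int		open_count;
	uint64_t	mem_size;	/* latest lazy size, memory only */
	uint64_t	mem_blocks;	/* latest lazy blocks, memory only */
	bool		dirty;		/* memory is newer than the xattr */
	uint64_t	xattr_size;	/* stand-in for the on-disk xattr */
	uint64_t	xattr_blocks;
	int		xattr_writes;	/* number of simulated xattr writes */
};

static void obj_open(struct obj *o)
{
	o->open_count++;
}

static void obj_close(struct obj *o, uint64_t size, uint64_t blocks)
{
	assert(o->open_count > 0);
	/* Every close only updates memory, so no transaction is needed. */
	if (size > o->mem_size) {
		o->mem_size = size;
		o->dirty = true;
	}
	if (blocks > o->mem_blocks) {
		o->mem_blocks = blocks;
		o->dirty = true;
	}
	/* The last close writes the accumulated values once. */
	if (--o->open_count == 0 && o->dirty) {
		o->xattr_size = o->mem_size;
		o->xattr_blocks = o->mem_blocks;
		o->xattr_writes++;
		o->dirty = false;
	}
}
```

As noted in the proposal, values held only in memory are lost on failover, so the xattr may lag behind until the next last close.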

Comment by Andreas Dilger [ 23/Nov/21 ]

aboyko, I think the LSOM update can be fairly lazy, and there isn't a serious danger if some updates are lost, but there should still be occasional writes of new LSOM data to disk:

  • if the inode is being written already for some other reason (e.g. atime update, link count, etc.)
  • after a long time since the last LSOM update (e.g. 60s, like atime_diff). Maybe LSOM and atime writes will happen at the same time?
  • when the last client closes the file, including when the client is evicted
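The three conditions in the list above can be combined into a single decision, sketched here. `struct lsom_flush_state`, `lsom_should_write()`, and `LSOM_FLUSH_INTERVAL` are hypothetical; the 60-second interval simply mirrors the atime_diff example and is not a tunable from either patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical interval mirroring the atime_diff example (seconds). */
#define LSOM_FLUSH_INTERVAL	60

/* Hypothetical per-object state used to decide when to persist LSOM. */
struct lsom_flush_state {
	bool	dirty;		/* in-memory LSOM newer than on disk */
	time_t	last_write;	/* when LSOM was last written to disk */
};

static bool lsom_should_write(const struct lsom_flush_state *s, time_t now,
			      bool inode_dirty_anyway, bool last_close)
{
	assert(s != NULL);
	if (!s->dirty)
		return false;	/* nothing new to persist */
	if (inode_dirty_anyway)
		return true;	/* piggyback on another inode write */
	if (last_close)
		return true;	/* last close, including client eviction */
	/* Otherwise persist only after the interval has elapsed. */
	return now - s->last_write >= LSOM_FLUSH_INTERVAL;
}
```

The time-based rule is what covers the concern about files that are never fully closed: they still get an LSOM write at least once per interval while updates keep arriving.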

My main concern would be that files which are never properly closed will also never get LSOM updates. That could happen with config files or shared libraries for jobs that run a very long time, and/or files that are being accessed by different jobs and always have some client process holding them open.

Comment by Gerrit Updater [ 02/Dec/21 ]

"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45709
Subject: LU-15252 mdt: reduce contention at mdt_lsom_update
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 495a7eb370cf9dbb5eec67bbd0a59ae206cdb68a

Comment by Gerrit Updater [ 13/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45619/
Subject: LU-15252 mdc: add client tunable to disable LSOM update
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 19172ed37851fdd5731b1319c12151f5cb1fe267

Comment by Peter Jones [ 13/Dec/21 ]

Landed for 2.15

Comment by Andreas Dilger [ 13/Dec/21 ]

There is still a second patch in flight that fixes the performance issue instead of working around it.

Comment by Gerrit Updater [ 23/Dec/21 ]

"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45709/
Subject: LU-15252 mdt: reduce contention at mdt_lsom_update
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: c8b7afe4970415f8dae84f5e20661f8a3b3681a0

Generated at Sat Feb 10 03:16:44 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.