[LU-15252] option to disable LSOM updates Created: 19/Nov/21 Updated: 26/May/23 Resolved: 23/Dec/21 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.15.0 |
| Fix Version/s: | Lustre 2.15.0 |
| Type: | Improvement | Priority: | Major |
| Reporter: | Alexander Boyko | Assignee: | Alexander Boyko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
When LSOM is not used on a cluster, it is better to disable it. I don't see such an option for now. |
| Comments |
| Comment by Gerrit Updater [ 19/Nov/21 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45619 |
| Comment by Andreas Dilger [ 19/Nov/21 ] |
|
It would be better to fix the reason why LSOM updates are slow. From the comments, this is due to lock contention on the MDS, but there should be a way to avoid it, since this is "lazy" and does not have to be totally accurate all the time. For example, checking without the lock whether the incoming LSOM size/blocks is already smaller than the current size/blocks, since LSOM should only be increasing. Also, it may be possible to batch LSOM updates in memory for a few seconds as long as the open counter > 0, since we know/expect some later close will write it to disk, and the update does not need to be part of a transaction if there is no other reason for it. Similarly, it should be possible to cache a flag on the mdt_object if there is no LOV EA or DoM is used, since this changes very rarely, so reading the LOV EA just for these two bits of information is expensive. That would allow checking whether an LSOM update is needed without the object mutex. |
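The unlocked pre-check suggested above can be modelled roughly as follows. This is only an illustrative sketch in Python, not Lustre code: `LsomState`, `lsom_update`, and the `locked_updates` counter are hypothetical names, and the sketch assumes (as the comment does) that LSOM size/blocks only ever increase.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class LsomState:
    """Hypothetical in-memory copy of a file's lazy size/blocks."""
    size: int = 0
    blocks: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)
    locked_updates: int = 0  # how many times the lock was actually taken


def lsom_update(state: LsomState, new_size: int, new_blocks: int) -> bool:
    """Apply an incoming size/blocks pair; return True if anything changed."""
    # Unlocked pre-check. Because the stored values are assumed to only
    # grow, a racy read can at worst be too small, which sends us to the
    # locked path unnecessarily but never skips a needed update.
    if new_size <= state.size and new_blocks <= state.blocks:
        return False

    with state.lock:
        state.locked_updates += 1
        changed = False
        # Re-check under the lock, since another thread may have won the race.
        if new_size > state.size:
            state.size = new_size
            changed = True
        if new_blocks > state.blocks:
            state.blocks = new_blocks
            changed = True
        return changed
```

In this model, updates that would not grow the stored values return early without touching the lock, so only size-extending updates contend for it.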
| Comment by James A Simmons [ 19/Nov/21 ] |
|
I agree with Andreas. LSOM is too critical to disable for us and we don't want our MDS servers bogged down at the same time. |
| Comment by Alexander Boyko [ 22/Nov/21 ] |
|
Andreas, I've made a quick fix for Lustre clients only. But I agree with you: LSOM requires fixes on the server side to improve single shared file performance. LSOM is still enabled by default, so the patch has no impact unless the option is changed. |
| Comment by Alexander Boyko [ 22/Nov/21 ] |
|
> Similarly, it should be possible to cache a flag on the mdt_object if there is no LOV EA or DoM is used, since this changes very rarely, so reading the LOV EA just for these two bits of information is expensive. That would allow checking whether an LSOM update is needed without the object mutex. |
| Comment by Andreas Dilger [ 23/Nov/21 ] |
|
aboyko, I think the LSOM update can be fairly lazy, and there isn't a serious danger if some updates are lost, but there should still be occasional writes of new LSOM data to disk:
My main concern would be that files which are never properly closed will also never get LSOM updates. That could happen with config files or shared libraries for jobs that run a very long time, and/or files that are being accessed by different jobs and always have some client process holding them open. |
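One way to picture the concern about files that are never closed is a flush rule that writes pending LSOM data either at last close or once it has been pending for too long. The sketch below is a hypothetical Python model, not the approach taken in the patches: `PendingLsom`, `should_flush`, and the 5-second `max_age` default are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PendingLsom:
    """Hypothetical per-file record of LSOM data not yet written to disk."""
    size: int = 0
    blocks: int = 0
    open_count: int = 0      # number of opens still outstanding
    dirty: bool = False      # True if size/blocks differ from the on-disk copy
    last_flush: float = 0.0  # timestamp (seconds) of the last write to disk


def should_flush(p: PendingLsom, now: float, max_age: float = 5.0) -> bool:
    """Decide whether pending LSOM data should be written to disk now."""
    if not p.dirty:
        return False  # nothing new to write
    if p.open_count == 0:
        return True   # last close: no later close will write it for us
    # The file is still open somewhere: write occasionally anyway, so that
    # long-lived opens do not leave the on-disk LSOM stale indefinitely.
    return now - p.last_flush >= max_age
```

With a rule like this, a file that some client always holds open would still get its LSOM data written roughly every `max_age` seconds instead of never.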
| Comment by Gerrit Updater [ 02/Dec/21 ] |
|
"Alexander Boyko <alexander.boyko@hpe.com>" uploaded a new patch: https://review.whamcloud.com/45709 |
| Comment by Gerrit Updater [ 13/Dec/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45619/ |
| Comment by Peter Jones [ 13/Dec/21 ] |
|
Landed for 2.15 |
| Comment by Andreas Dilger [ 13/Dec/21 ] |
|
Still a second patch in flight that fixes the performance issue instead of working around it. |
| Comment by Gerrit Updater [ 23/Dec/21 ] |
|
"Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45709/ |