[LU-16611] default read_cache_enable=0 not always the best choice when rotational=0 Created: 02/Mar/23  Updated: 08/Mar/23

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.9, Lustre 2.15.2
Fix Version/s: None

Type: Improvement Priority: Minor
Reporter: Stephane Thiell Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None
Environment:

osd-ldiskfs


Rank (Obsolete): 9223372036854775807

 Description   

read_cache_enable is disabled by default if queue/rotational=0
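To make the rule above concrete, here is a minimal sketch of how the default described in this ticket maps onto the kernel's queue/rotational flag. It is not Lustre's actual code: the helper names `is_rotational` and `default_read_cache_enable` are hypothetical, and it assumes the flag is readable at /sys/block/<device>/queue/rotational for the device backing the OSD.

```python
from pathlib import Path


def is_rotational(device: str, sysfs_root: str = "/sys/block") -> bool:
    """Return True if the kernel reports the block device as rotational.

    Assumption: the flag lives at <sysfs_root>/<device>/queue/rotational
    and contains "1" for rotational media and "0" otherwise.
    """
    flag = Path(sysfs_root, device, "queue", "rotational").read_text().strip()
    return flag == "1"


def default_read_cache_enable(rotational: bool) -> int:
    """Default as described in this ticket (hypothetical helper, not Lustre code):
    read cache enabled for rotational devices, disabled for non-rotational ones."""
    return 1 if rotational else 0


if __name__ == "__main__":
    import sys

    # Usage: python check_rotational.py sda sdb ...
    for dev in sys.argv[1:]:
        rot = is_rotational(dev)
        print(f"{dev}: rotational={int(rot)} "
              f"default read_cache_enable={default_read_cache_enable(rot)}")
```

A SAS SSD and an NVMe drive both report rotational=0 here, which is why the default cannot tell the two cases in this ticket apart.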

I wanted to raise awareness that this might not be the best default value. I recently noticed that, at least on recent hardware (e.g. AMD EPYC Milan) with SAS-based SSDs (not NVMe), keeping read_cache_enable enabled improves read performance. I saw this in many of the fio benchmarks I ran (both sequential and random I/O), at least in those that could benefit from OSS caching (I was careful not to benchmark the client cache...).

This is at least my experience with SAS SSDs, but I understand it might be better to keep the OSS read cache disabled with NVMe drives when accessed directly.



 Comments   
Comment by Andreas Dilger [ 03/Mar/23 ]

Stephane, I believe that this parameter is careful not to override any value set by the config, so it should be possible to change it in the normal manner (i.e. "lctl set_param -P").
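As an illustration of the "normal manner" mentioned above, the following sketch only builds the command line and prints it; it does not run anything. The parameter path osd-ldiskfs.*.read_cache_enable is an assumption based on the osd-ldiskfs environment of this ticket, and the helper name `read_cache_command` is hypothetical. Note that "lctl set_param -P" is normally issued on the MGS.

```python
import shlex


def read_cache_command(enable: bool, target: str = "*",
                       persistent: bool = True) -> list[str]:
    """Build an lctl command that sets read_cache_enable.

    Assumptions (not confirmed by this ticket): the parameter is exposed as
    osd-ldiskfs.<target>.read_cache_enable, and "*" matches all local targets.
    """
    param = f"osd-ldiskfs.{target}.read_cache_enable={int(enable)}"
    cmd = ["lctl", "set_param"]
    if persistent:
        # -P makes the setting persistent via the MGS configuration.
        cmd.append("-P")
    cmd.append(param)
    return cmd


if __name__ == "__main__":
    # Print the command for review instead of executing it.
    print(shlex.join(read_cache_command(enable=True)))
```

Printing rather than executing keeps the sketch safe to run anywhere; on a real system the resulting command would be reviewed and then run by an administrator.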

Also, since this only affects the in-memory cache, there is no problem if it is not set in the "ideal" manner for a second or two after mount while the OSS is applying the config parameters from the MGS.

Since we have very little information available to the filesystem (it was a challenge even getting "rotational" to be set consistently for the storage), I don't think it is practical for Lustre to auto-tune this based on the specific storage type.

Would this be best handled by describing the various options in the Lustre Operations Manual?

Comment by Stephane Thiell [ 07/Mar/23 ]

Andreas, thanks for your reply, that makes sense. With servers' memory performance increasing and probably the Linux page cache's performance also improving (not sure of that), it might be worth adding a note in the manual.

Today the Lustre manual says:

By default, read cache is enabled (read_cache_enable=1) for HDD OSDs and automatically disabled for flash OSDs (nonrotational=1). 

Perhaps adding something like this would be good general guidance:

However, depending on the hardware (performance of flash storage, type of server), it might still be beneficial to enable read_cache_enable with flash OSDs. This can be done permanently with lctl set_param -P ....

Comment by Andreas Dilger [ 08/Mar/23 ]

Patches welcome. You can also use the LU ticket to submit a patch to the manual, no need for a separate LUDOC ticket.
