[LU-219] readahead may cause OOM Created: 18/Apr/11  Updated: 28/Aug/12  Resolved: 28/Aug/12

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: John Hammond Assignee: WC Triage
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

$ uname -r
2.6.18-194.17.1.el5
$ cat /proc/fs/lustre/version
lustre: 2.0.59
kernel: patchless_client
build: jenkins-g3dcb5fb-PRISTINE-2.6.18-194.17.1.el5


Attachments: File rr-nc.out    
Severity: 3
Rank (Obsolete): 10134

 Description   

We are seeing OOMs from readahead. There appear to be several issues:

1) Lustre readahead is insensitive to memory pressure. It would be nice to have something like the kernel's max_sane_readahead() (see the sketch after this list).

2) Lustre readahead calls grab_cache_page_nowait(), which allocates using the GFP mask of the file's mapping. So the mask used to allocate a cache page for readahead is HARDWALL|WAIT|IO|HIGHMEM, which is sufficient to trigger an OOM.

3) In 2.6.18-194.17.1.el5, grab_cache_page_nowait() also calls add_to_page_cache_lru() with mask GFP_KERNEL, which is likewise enough to cause an OOM, or to recurse into the filesystem. (In el6, GFP_KERNEL is changed to GFP_NOFS.)
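
For illustration, here is a minimal userspace sketch of the clamping idea behind max_sane_readahead() from issue 1: cap a readahead request at a fraction of the memory that is currently free or easily reclaimable. The /proc/meminfo field names are real; the 40 MB request and the divide-by-two heuristic are illustrative, not the kernel's exact implementation.

/* Sketch of a "max sane readahead" clamp, done from userspace for
 * illustration: cap a readahead request at half of the memory that is
 * currently free or on the inactive list.  Not the kernel's actual code. */
#include <stdio.h>
#include <string.h>

static unsigned long meminfo_kb(const char *key)
{
        char line[256];
        unsigned long val = 0;
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                if (strncmp(line, key, strlen(key)) == 0) {
                        sscanf(line + strlen(key), " %lu", &val);
                        break;
                }
        }
        fclose(f);
        return val;
}

int main(void)
{
        unsigned long requested_kb = 40 * 1024;   /* e.g. max_read_ahead_mb = 40 */
        unsigned long avail_kb = meminfo_kb("MemFree:") + meminfo_kb("Inactive:");
        unsigned long sane_kb = requested_kb < avail_kb / 2 ?
                                requested_kb : avail_kb / 2;

        printf("requested %lu KB of readahead, clamped to %lu KB\n",
               requested_kb, sane_kb);
        return 0;
}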

It's easily reproduced by bringing available memory under max_read_ahead_mb and issuing a suitable read(). Under that reproducer, the OOM can be prevented by clearing __GFP_WAIT in grab_cache_page_nowait() and add_to_page_cache_lru(). I do not know of a fix that does not modify the kernel.
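
A rough sketch of that reproducer (the file path, the 64 MB headroom, and the chunk size are illustrative): pin anonymous memory until free memory drops below max_read_ahead_mb, then issue a large sequential read on a Lustre file so readahead has to allocate cache pages under pressure.

/* Reproducer sketch: consume most free memory, then trigger a large
 * sequential read so that Lustre readahead allocates cache pages under
 * memory pressure.  Path and sizes are illustrative only. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : "/mnt/lustre/bigfile";
        size_t headroom = 64UL << 20;        /* leave roughly 64 MB free */
        size_t chunk = 64UL << 20;
        size_t pinned = 0;
        char buf[1 << 20];
        ssize_t n;
        int fd;

        /* Fault in anonymous memory until only ~headroom is left free. */
        while ((size_t)sysconf(_SC_AVPHYS_PAGES) * sysconf(_SC_PAGESIZE) > headroom) {
                void *p = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED)
                        break;
                memset(p, 1, chunk);
                pinned += chunk;
        }
        fprintf(stderr, "pinned %zu MB of anonymous memory\n", pinned >> 20);

        /* Large sequential read; readahead now allocates under pressure. */
        fd = open(path, O_RDONLY);
        if (fd < 0) {
                perror(path);
                return 1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;
        close(fd);
        return 0;
}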

See the attached file for console logs from a client on an llmount.sh filesystem. Note that this issue is also frequently observed in production on TACC Lonestar (2.6.18-192.32.1/1.8.5).



 Comments   
Comment by Andreas Dilger [ 19/Apr/11 ]

A long time ago, we tried to add an API called grab_cache_page_nowait_gfp() that added a "gfp" parameter to allow us to specify the exact GFP mask. However, since Lustre is not in the kernel we couldn't get this accepted upstream, and patching the kernel is not an option for patchless clients.
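
For context, a hypothetical sketch of what such an API could look like on a 2.6.18-era kernel: the same lookup/allocate/insert logic as grab_cache_page_nowait(), but with the caller's gfp mask used for both the page allocation and the page-cache insertion. This is an illustration of the idea, not the actual historical patch.

/* Hypothetical grab_cache_page_nowait_gfp() sketch (illustration only). */
#include <linux/mm.h>
#include <linux/pagemap.h>

struct page *grab_cache_page_nowait_gfp(struct address_space *mapping,
                                         unsigned long index, gfp_t gfp)
{
        struct page *page = find_get_page(mapping, index);

        if (page) {
                if (!TestSetPageLocked(page))
                        return page;
                page_cache_release(page);
                return NULL;
        }
        /* The caller-supplied gfp replaces both mapping_gfp_mask(mapping)
         * and the hard-coded GFP_KERNEL, so readahead can drop __GFP_WAIT
         * and fail the allocation instead of triggering the OOM killer. */
        page = alloc_pages(gfp, 0);
        if (page && add_to_page_cache_lru(page, mapping, index, gfp)) {
                page_cache_release(page);
                page = NULL;
        }
        return page;
}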

There should already be limits from max_read_ahead_mb and max_read_ahead_per_file_mb, but it isn't clear why they aren't limiting the total amount of readahead.

Comment by Oleg Drokin [ 19/Apr/11 ]

Right, there is a limit of 40 MB for readahead, and with that I think it's hard to cause an OOM.
What sort of load are you seeing that causes the OOM with readahead?

Comment by John Hammond [ 20/Apr/11 ]

You're right, the readahead limits are working. We'll dial overcommit_ratio down and this should go away.
