[LU-219] readahead may cause OOM Created: 18/Apr/11 Updated: 28/Aug/12 Resolved: 28/Aug/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | WC Triage |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
$ uname -r |
||
| Attachments: |
|
| Severity: | 3 |
| Rank (Obsolete): | 10134 |
| Description |
|
We are seeing OOMs from readahead. There appear to be several issues:

1) Lustre readahead is insensitive to memory pressure. It would be nice to have something like max_sane_readahead().
2) Lustre readahead calls grab_cache_page_nowait(), which allocates using the GFP mask of the file. So to allocate a cache page for readahead the GFP mask is HARDWALL|WAIT|IO|HIGHMEM, which is sufficient to trigger an OOM.
3) In 2.6.18-194.17.1.el5, grab_cache_page_nowait() also calls add_to_page_cache_lru() with mask GFP_KERNEL, which is also enough to cause an OOM or to recurse into the filesystem. (In el6, GFP_KERNEL is changed to GFP_NOFS.)

It's easily reproduced by getting available memory below max_read_ahead_mb and issuing a suitable read(). Under that reproducer, the OOM can be prevented by clearing __GFP_WAIT in grab_cache_page_nowait() and add_to_page_cache_lru(). I do not know of a fix that does not modify the kernel.

See attached for the console logs from a client on a llmount.sh filesystem. Note that this issue is also frequently observed in production on TACC Lonestar (2.6.18-192.32.1/1.8.5). |
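As a rough illustration of item 1, a memory-pressure-aware cap in the spirit of max_sane_readahead() could look like the sketch below. It assumes the 2.6.18-era heuristic of capping a request at half of the inactive plus free pages. The function names, the 4 KiB page size, and the example page counts are hypothetical; this is a userspace model, not Lustre or kernel code.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (typical for x86_64)


def mb_to_pages(mb):
    """Convert mebibytes to pages using the assumed PAGE_SIZE."""
    return mb * 1024 * 1024 // PAGE_SIZE


def clamp_readahead_pages(requested_pages, inactive_pages, free_pages):
    """Cap a readahead request at half of (inactive + free) pages.

    Modeled on the max_sane_readahead() heuristic. The inputs are plain
    integers supplied by the caller, not values read from a live system.
    """
    if min(requested_pages, inactive_pages, free_pages) < 0:
        raise ValueError("page counts must be non-negative")
    return min(requested_pages, (inactive_pages + free_pages) // 2)


# Hypothetical example: a 40 MB readahead window on a client that has only
# 8 MB of inactive pages and 4 MB of free pages left.
window = mb_to_pages(40)
allowed = clamp_readahead_pages(window, mb_to_pages(8), mb_to_pages(4))
print(window, allowed)
```

With these example numbers the window shrinks from 10240 pages to 1536, so the readahead request would stay well below what is actually available.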
| Comments |
| Comment by Andreas Dilger [ 19/Apr/11 ] |
|
A long time ago, we tried to add an API called grab_cache_page_nowait_gfp() that added a "gfp" parameter to allow us to specify the exact GFP mask. However, since Lustre is not in the kernel, we couldn't get this merged upstream, and the alternative is patching the client kernel. There should already be limits for max_read_ahead_mb and max_read_ahead_per_file_mb, but it isn't clear why they wouldn't be limiting the total amount of readahead. |
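What an explicit "gfp" parameter would make possible can be sketched as plain mask arithmetic. The model below is a userspace illustration only: the bit values are assumptions chosen for the example (the real __GFP_* constants live in the kernel's include/linux/gfp.h and differ between versions), and clear_wait() and flag_names() are hypothetical helpers, not kernel functions.

```python
# Userspace model of GFP mask arithmetic. Bit values are illustrative
# assumptions, not authoritative kernel constants.
FLAGS = {
    "HARDWALL": 0x20000,
    "WAIT": 0x10,
    "IO": 0x40,
    "FS": 0x80,
    "HIGHMEM": 0x02,
}

GFP_KERNEL = FLAGS["WAIT"] | FLAGS["IO"] | FLAGS["FS"]
GFP_NOFS = FLAGS["WAIT"] | FLAGS["IO"]


def clear_wait(mask):
    """Drop the WAIT bit so the allocation may fail but never sleeps or reclaims."""
    return mask & ~FLAGS["WAIT"]


def flag_names(mask):
    """List the names of the flags set in mask, in FLAGS order."""
    return [name for name, bit in FLAGS.items() if mask & bit]


# Mask reported in the description for the file's mapping.
file_mask = FLAGS["HARDWALL"] | FLAGS["WAIT"] | FLAGS["IO"] | FLAGS["HIGHMEM"]

# What a caller could pass through a hypothetical explicit gfp parameter.
readahead_mask = clear_wait(file_mask)

print(flag_names(file_mask))               # ['HARDWALL', 'WAIT', 'IO', 'HIGHMEM']
print(flag_names(readahead_mask))          # ['HARDWALL', 'IO', 'HIGHMEM']
print(flag_names(clear_wait(GFP_KERNEL)))  # ['IO', 'FS']
```

Because readahead is only an optimization, an allocation that fails immediately under memory pressure is harmless. Without the WAIT bit the allocator cannot enter direct reclaim, so it returns failure instead of invoking the OOM killer.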
| Comment by Oleg Drokin [ 19/Apr/11 ] |
|
Right, there is a limit of 40 MB for readahead, and with that I think it's kind of hard to cause an OOM. |
| Comment by John Hammond [ 20/Apr/11 ] |
|
You're right, the readahead limits are working. We'll dial overcommit_ratio down and this should go away. |
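To see how much headroom a lower overcommit_ratio leaves, the commit limit can be estimated with the sketch below. It assumes strict overcommit accounting (vm.overcommit_memory=2), which is the mode in which the kernel enforces the ratio; the comment above does not state which mode is in use. The function name and the example sizes are hypothetical.

```python
def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Estimate CommitLimit in kB under vm.overcommit_memory=2.

    Uses the documented formula:
    (RAM - huge pages) * overcommit_ratio / 100 + swap.
    """
    if not 0 <= overcommit_ratio:
        raise ValueError("overcommit_ratio must be non-negative")
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Hypothetical client: 16,000,000 kB of RAM and 2,000,000 kB of swap.
mem_kb, swap_kb = 16_000_000, 2_000_000

before = commit_limit_kb(mem_kb, swap_kb, 95)
after = commit_limit_kb(mem_kb, swap_kb, 80)
print(before, after, before - after)
```

In this example, dropping the ratio from 95 to 80 reduces the memory that processes can commit by 2,400,000 kB, which leaves that much more room for page cache such as readahead pages. On a real client the inputs could be taken from the MemTotal and SwapTotal fields of /proc/meminfo.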