  1. Lustre
  2. LU-219

readahead may cause OOM


Details

    • Bug
    • Resolution: Not a Bug
    • Minor
    • None
    • Lustre 2.1.0
    • None
    • $ uname -r
      2.6.18-194.17.1.el5
      $ cat /proc/fs/lustre/version
      lustre: 2.0.59
      kernel: patchless_client
      build: jenkins-g3dcb5fb-PRISTINE-2.6.18-194.17.1.el5
    • 3
    • 10134

    Description

      We are seeing OOMs from readahead. There appear to be several issues:

      1) Lustre readahead is insensitive to memory pressure. It would be nice to have something like max_sane_readahead().

      2) Lustre readahead calls grab_cache_page_nowait() which allocates using the GFP mask of the file. So to allocate a cache page for readahead the GFP mask is HARDWALL|WAIT|IO|HIGHMEM, which is sufficient to trigger an OOM.

      3) In 2.6.18-194.17.1.el5, grab_cache_page_nowait() also calls add_to_page_cache_lru() with mask GFP_KERNEL, also enough to cause an OOM, or to recurse into the filesystem. (In el6, GFP_KERNEL is changed to GFP_NOFS.)
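      As an illustration of point 1, the following is a minimal userspace sketch of a memory-sensitive clamp in the spirit of max_sane_readahead(). It assumes the heuristic of capping a request at half of the inactive plus free pages, which is how that kernel helper behaves to my understanding. The function names, the 4 KiB page size, and the page counts passed in are all hypothetical; this is not Lustre or kernel code.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, so 1 MiB holds 256 pages. */
#define DEMO_PAGES_PER_MB 256UL

/* Hypothetical helper: convert a max_read_ahead_mb-style setting to pages. */
unsigned long demo_mb_to_pages(unsigned long mb)
{
    return mb * DEMO_PAGES_PER_MB;
}

/*
 * Hypothetical helper: cap a readahead request (in pages) at half of the
 * reclaimable-looking memory, i.e. (inactive + free) / 2. The caller would
 * supply current page counts for the local node.
 */
unsigned long demo_clamp_readahead(unsigned long requested_pages,
                                   unsigned long inactive_pages,
                                   unsigned long free_pages)
{
    unsigned long limit = (inactive_pages + free_pages) / 2;

    return requested_pages < limit ? requested_pages : limit;
}
```

      With a 40 MB readahead setting (10240 pages under the assumed page size) and only 4000 inactive plus free pages, demo_clamp_readahead() would return 2000, so the window shrinks as memory tightens instead of staying fixed.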

      It is easily reproduced by reducing available memory to below max_read_ahead_mb and then issuing a suitable read(). With that reproducer, the OOM can be prevented by clearing __GFP_WAIT in grab_cache_page_nowait() and add_to_page_cache_lru(). I do not know of a fix that does not modify the kernel.
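      The mask arithmetic behind points 2 and 3 and the workaround above can be sketched in userspace C. The DEMO_ constants are stand-ins modeled on the 2.6.18 GFP flag layout; their numeric values are assumptions for illustration and should be checked against the kernel's gfp.h. The sketch also assumes the file's mapping mask is GFP_HIGHUSER-like, which is consistent with the mask quoted in point 2. The function names are hypothetical.

```c
typedef unsigned int demo_gfp_t;

/* Assumed bit values, for illustration only. */
#define DEMO_GFP_HIGHMEM  0x00002u
#define DEMO_GFP_WAIT     0x00010u  /* allocation may sleep and reclaim */
#define DEMO_GFP_IO       0x00040u  /* allocation may start physical I/O */
#define DEMO_GFP_FS       0x00080u  /* allocation may call into the filesystem */
#define DEMO_GFP_HARDWALL 0x20000u

#define DEMO_GFP_KERNEL   (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)
#define DEMO_GFP_NOFS     (DEMO_GFP_WAIT | DEMO_GFP_IO)
#define DEMO_GFP_HIGHUSER (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS | \
                           DEMO_GFP_HARDWALL | DEMO_GFP_HIGHMEM)

/* Mask used for the readahead page: the mapping's mask without FS. */
demo_gfp_t demo_readahead_mask(demo_gfp_t mapping_mask)
{
    return mapping_mask & ~DEMO_GFP_FS;
}

/* The workaround described above: additionally clear WAIT. */
demo_gfp_t demo_clear_wait(demo_gfp_t mask)
{
    return mask & ~DEMO_GFP_WAIT;
}

/* A mask with WAIT set may block in reclaim, which is the path to an OOM. */
int demo_may_block(demo_gfp_t mask)
{
    return (mask & DEMO_GFP_WAIT) != 0;
}
```

      Under these assumptions, demo_readahead_mask(DEMO_GFP_HIGHUSER) yields HARDWALL|WAIT|IO|HIGHMEM, which still has WAIT set. After demo_clear_wait() the allocation can no longer block, so it would fail fast under memory pressure rather than push the system toward an OOM.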

      See the attached console logs from a client on a llmount.sh filesystem. Note that this issue is also frequently observed in production on TACC Lonestar (2.6.18-192.32.1/1.8.5).

          People

            wc-triage WC Triage
            jhammond John Hammond