Lustre / LU-19131

DIO read/write can livelock on swapped pages in get_user_pages()


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • Fix Version/s: None
    • Affects Version/s: Lustre 2.14.0, Lustre 2.16.1
    • Labels: None
    • Severity: 3

    Description

      Running "lfs migrate" or "lfs mirror resync" on a file that is much larger than RAM can livelock the thread in the kernel under memory pressure. The "lfs" process then spins at 100% CPU while trying to access the buffer pages it uses for the migrate/mirror copy:

      [<0>] do_swap_page+0xaf/0x790
      [<0>] __handle_mm_fault+0x552/0x6d0
      [<0>] handle_mm_fault+0xca/0x2a0
      [<0>] __get_user_pages+0x250/0x830
      [<0>] get_user_pages_unlocked+0xd5/0x2a0
      [<0>] internal_get_user_pages_fast+0x193/0x2c0
      [<0>] iov_iter_get_pages_alloc+0x110/0x4c0
      [<0>] ll_direct_IO_impl+0x30f/0xc50 [lustre]
      [<0>] generic_file_read_iter+0x8f/0x150
      [<0>] vvp_io_read_start+0x597/0x840 [lustre]
      [<0>] cl_io_start+0x5d/0x110 [obdclass]
      [<0>] cl_io_loop+0x9a/0x200 [obdclass]
      [<0>] ll_file_io_generic+0xa83/0xf90 [lustre]
      [<0>] ll_file_read_iter+0x9de/0xd20 [lustre]
      [<0>] new_sync_read+0x10f/0x160
      [<0>] vfs_read+0x91/0x150
      [<0>] ksys_pread64+0x65/0xa0
      [<0>] do_syscall_64+0x5b/0x1a0
      

      This can happen when running on an OSS where an object being read or written is loading pages into cache, or when other processes are reading into the client page cache (e.g. "lfs mirror extend" calling mirror_extend_file(), which does not open files with O_DIRECT).

      It appears that the buffer pages used by migrate_copy_data() for both migrate and resync get swapped out under memory pressure and then cannot be faulted back in by the kernel.
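
      To make the pattern concrete, here is a minimal sketch of a copy loop that reads and writes through a single page-aligned userspace buffer. It is not the actual migrate_copy_data() code: the function name copy_with_aligned_buffer() and its parameters are hypothetical, and handling of an unaligned file tail under O_DIRECT is omitted. The point is that the buffer is ordinary anonymous memory, so it can be swapped out, and every direct IO call then has to fault it back in through get_user_pages().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only (not Lustre code).
 * Copies src to dst through one page-aligned anonymous buffer.
 * Pass O_DIRECT in extra_flags (needs _GNU_SOURCE on glibc) to take the
 * direct IO path; buf_size should then be a multiple of the block size,
 * and an unaligned file tail would need extra handling not shown here.
 * Returns the number of bytes copied, or -1 with errno set.
 */
long long copy_with_aligned_buffer(const char *src, const char *dst,
                                   int extra_flags, size_t buf_size)
{
    void *buf = NULL;
    long long total = -1;
    off_t pos = 0;
    int fd_src, fd_dst = -1;
    int rc;

    fd_src = open(src, O_RDONLY | extra_flags);
    if (fd_src < 0)
        return -1;

    fd_dst = open(dst, O_WRONLY | O_CREAT | O_TRUNC | extra_flags, 0644);
    if (fd_dst < 0)
        goto out;

    /* Anonymous, swappable memory: the kernel must pin these pages
     * for every direct IO request issued against the buffer. */
    rc = posix_memalign(&buf, (size_t)sysconf(_SC_PAGESIZE), buf_size);
    if (rc != 0) {
        errno = rc;
        buf = NULL;
        goto out;
    }

    for (;;) {
        ssize_t rd = pread(fd_src, buf, buf_size, pos);
        ssize_t done = 0;

        if (rd < 0)
            goto out;
        if (rd == 0)
            break;          /* end of file */

        /* Handle short writes by retrying the remainder. */
        while (done < rd) {
            ssize_t wr = pwrite(fd_dst, (char *)buf + done,
                                (size_t)(rd - done), pos + done);
            if (wr < 0)
                goto out;
            done += wr;
        }
        pos += rd;
    }
    total = (long long)pos;

out:
    {
        int saved_errno = errno;   /* keep the first error for the caller */

        free(buf);
        if (fd_dst >= 0)
            close(fd_dst);
        close(fd_src);
        errno = saved_errno;
    }
    return total;
}
```

      A caller would invoke it as, for example, copy_with_aligned_buffer("in", "out", O_DIRECT, 1 << 20), which is assumed to resemble the resync/migrate case, while passing 0 for the flags gives the buffered case.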

      This was easily and repeatedly reproduced on a client-on-OSS node with 4GB RAM running the el8.10 4.18.0-553.50.1 kernel while migrating a 30GB file. It was also reproduced on a standalone client with 128GB RAM running 40 copies of "lfs mirror extend" (buffered IO) and "lfs mirror resync" (direct IO), each on separate files, some of which were over 2TB.

            People

              Assignee: adilger Andreas Dilger
              Reporter: adilger Andreas Dilger