Lustre / LU-5291

Failure to clear ft_flags leads to mmap_sem deadlocks


Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.7.0
    • Lustre 2.6.0
    • 3
    • 14760

    Description

      After adding the patch for LU-5108 to Cray's tree, we continued to see mmap_sem deadlocks.

      We believe we've identified the reason. Proposed patch:
      vvp: Clear vio->u.fault.fault.ft_flags in ll_fault0

      In ll_fault0, most fields of the 'fault' struct are cleared before the
      call to cl_io_loop, but ft_flags is not reset. It is ordinarily set by
      the call to filemap_fault in vvp_io_kernel_fault, but if Lustre returns
      before calling filemap_fault, ft_flags still holds its old value from an
      earlier fault.

      ll_fault0 then consumes the ft_flags field regardless. If the stale
      value has the VM_FAULT_RETRY bit set, it is returned from ll_fault0()
      and, in turn, from ll_fault().

      This is a problem when VM_FAULT_RETRY is in ft_flags: when
      fault/filemap_fault returns with that flag set, it has already
      released the mmap semaphore, so do_page_fault does not release it.

      Returning this flag from ll_fault when filemap_fault never ran (and so
      never released the semaphore) means the mmap semaphore is never upped:
      the kernel's do_page_fault() skips its up_read(), which leads to the
      mmap_sem deadlocks described above.

      Sample debug output. In the first line, the VM_FAULT_RETRY flag is returned with -EAGAIN (-11), which is fine and expected. In the second line, the same flags are returned with -ERESTARTSYS (-512), an error that came from before filemap_fault was called, so the semaphore is not upped in do_page_fault:
      00000080:00800000:24.0:1404292452.550624:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-11
      00000080:00800000:24.0:1404292452.554058:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-512

      Credit, as for LU-5221, to Cray's Paul Cassella.


            People

              Jinshan Xiong (Inactive)
              Patrick Farrell