Details
- Bug
- Resolution: Fixed
- Minor
- Lustre 2.6.0
- 3
- 14760
Description
After adding the patch for LU-5108 to Cray's tree, we continued to see mmap_sem deadlocks.
We believe we've identified the reason:
vvp: Clear vio->u.fault.fault.ft_flags in ll_fault0
In ll_fault0, the 'fault' struct is mostly cleared before the call to
cl_io_loop, but ft_flags is not reset. ft_flags is ordinarily set by
the call to filemap_fault in vvp_io_kernel_fault, but if Lustre returns
before calling filemap_fault, ft_flags still holds the value left over
from the previous fault. ll_fault0 then consumes the ft_flags field: if
the VM_FAULT_RETRY bit is set, the stale value is used as the return
value of ll_fault0() and ll_fault().
This is a problem when VM_FAULT_RETRY is in the stale ft_flags:
When fault/filemap_fault return with that flag set, they have already
released the mmap semaphore, so do_page_fault does not release it.
Returning this flag from ll_fault when filemap_fault was never called
means the semaphore is still held, and the kernel's do_page_fault()
never ups it.
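The stale-flag behaviour described above can be modelled in user space. The following is a minimal sketch, not the Lustre code: `sim_fault_io`, `sim_io_loop`, `sim_fault0` and `sim_demo` are hypothetical stand-ins for the fault state, cl_io_loop and ll_fault0, and the flag values are assumed to match include/linux/mm.h in kernels of that period.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed values, as in include/linux/mm.h of kernels from that period. */
#define VM_FAULT_MAJOR 0x0004u
#define VM_FAULT_RETRY 0x0400u

/* Hypothetical stand-in for the state holding vio->u.fault.fault.ft_flags. */
struct sim_fault_io {
    unsigned int ft_flags;
};

/*
 * Stand-in for cl_io_loop. A non-zero early_error models Lustre returning
 * before filemap_fault is reached, which leaves ft_flags untouched.
 */
static int sim_io_loop(struct sim_fault_io *io, int early_error,
                       unsigned int filemap_flags)
{
    if (early_error != 0)
        return early_error;
    io->ft_flags = filemap_flags;   /* what filemap_fault would report */
    return 0;
}

/*
 * Stand-in for ll_fault0. clear_first selects the fixed behaviour
 * (reset ft_flags before the loop) or the buggy one (leave it alone).
 */
static unsigned int sim_fault0(struct sim_fault_io *io, int clear_first,
                               int early_error, unsigned int filemap_flags)
{
    if (clear_first)
        io->ft_flags = 0;
    (void)sim_io_loop(io, early_error, filemap_flags);
    return io->ft_flags;            /* consumed as the fault return value */
}

/* Walks through one legitimate retry followed by an early error. */
static void sim_demo(void)
{
    struct sim_fault_io io = { 0 };
    unsigned int ret;

    ret = sim_fault0(&io, 0, 0, VM_FAULT_RETRY | VM_FAULT_MAJOR);
    printf("legitimate retry:      0x%x\n", ret);

    ret = sim_fault0(&io, 0, -512, 0);
    printf("early error, no clear: 0x%x\n", ret);
    assert(ret & VM_FAULT_RETRY);   /* stale flag leaks out */

    ret = sim_fault0(&io, 1, -512, 0);
    printf("early error, cleared:  0x%x\n", ret);
    assert(!(ret & VM_FAULT_RETRY));
}
```

Calling `sim_demo()` from a `main` exercises both paths. In this model, resetting the flags before the loop is the only difference between the leaking and the non-leaking case, which is the change the patch title describes.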
Sample debug output follows. The first line shows the VM_FAULT_RETRY flag returned with EAGAIN, which is fine and expected. The second line shows the same flags returned with ERESTARTSYS, an error that came from earlier than filemap_fault, so the semaphore is not upped in do_page_fault:
00000080:00800000:24.0:1404292452.550624:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-11
00000080:00800000:24.0:1404292452.554058:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-512
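The trailing `fault 1028/-11` field in these lines can be decoded mechanically. The sketch below assumes the field is printed as flags/result in decimal, as the description above implies, and assumes the flag values from include/linux/mm.h in kernels of that period. `parse_fault_line` and `has_retry` are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed values, as in include/linux/mm.h of kernels from that period. */
#define VM_FAULT_MAJOR 0x0004u
#define VM_FAULT_RETRY 0x0400u

/* Returns non-zero when the fault return value carries VM_FAULT_RETRY. */
static int has_retry(unsigned int fault_ret)
{
    return (fault_ret & VM_FAULT_RETRY) != 0;
}

/*
 * Extracts the "fault <flags>/<result>" pair from one debug line.
 * Returns 0 on success, -1 when the line has no such field.
 */
static int parse_fault_line(const char *line, unsigned int *fault_ret,
                            int *result)
{
    /* The leading space skips the "fault" inside "ll_fault0()". */
    const char *p = strstr(line, " fault ");

    if (p == NULL)
        return -1;
    if (sscanf(p, " fault %u/%d", fault_ret, result) != 2)
        return -1;
    return 0;
}
```

Under these assumptions, 1028 is 0x404, that is VM_FAULT_RETRY | VM_FAULT_MAJOR, and the results -11 and -512 are -EAGAIN and the kernel-internal -ERESTARTSYS named in the description.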
Credit, as for LU-5221, to Cray's Paul Cassella.
Issue Links
- is related to LU-5108 osc: Performance tune for LRU (Resolved)