[LU-5291] Failure to clear ft_flags leads to mmap_sem deadlocks Created: 03/Jul/14  Updated: 13/Jul/15  Resolved: 05/Dec/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.6.0
Fix Version/s: Lustre 2.7.0

Type: Bug Priority: Minor
Reporter: Patrick Farrell (Inactive) Assignee: Jinshan Xiong (Inactive)
Resolution: Fixed Votes: 0
Labels: patch

Issue Links:
Related
is related to LU-5108 osc: Performance tune for LRU Resolved
Severity: 3
Rank (Obsolete): 14760

 Description   

After adding the patch for LU-5108 to Cray's tree, we continued to see mmap_sem deadlocks.

We believe we've identified the reason:
vvp: Clear vio->u.fault.fault.ft_flags in ll_fault0

In ll_fault0, the 'fault' struct is mostly cleared before the call to
cl_io_loop, but ft_flags is not reset. It is ordinarily set by the
call to filemap_fault in vvp_io_kernel_fault, but if Lustre returns
before reaching filemap_fault, the field still holds the stale value
left over from a previous fault.

ll_fault0 then consumes the ft_flags field. If the stale value has the
VM_FAULT_RETRY bit set, it is used as the return value of both
ll_fault0() and ll_fault().
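To make the stale-state problem concrete, here is a minimal user-space sketch in C. The names `sketch_fault`, `sketch_kernel_fault` and `sketch_ll_fault0` are hypothetical stand-ins for `vio->u.fault.fault`, `vvp_io_kernel_fault` and `ll_fault0`, not Lustre's actual code; only the VM_FAULT_* bit values follow the Linux definitions of that era. The `clear_flags` parameter toggles the proposed fix of resetting ft_flags before the I/O loop, and mapping other errors to SIGBUS is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values as in Linux of that era; everything else is hypothetical. */
#define SKETCH_VM_FAULT_SIGBUS	0x0002
#define SKETCH_VM_FAULT_MAJOR	0x0004
#define SKETCH_VM_FAULT_RETRY	0x0400

/* Hypothetical per-thread fault state, reused across faults. */
struct sketch_fault {
	unsigned int ft_flags;	/* written only when filemap_fault is reached */
	int ft_nr;		/* stands in for the fields that ARE reset */
};

/* Simulated vvp_io_kernel_fault(): either bails out before filemap_fault
 * (e.g. on a pending signal) or records filemap_fault's result. */
static int sketch_kernel_fault(struct sketch_fault *f, int bail_early,
			       unsigned int filemap_ret)
{
	if (bail_early)
		return -512;	/* -ERESTARTSYS; ft_flags left untouched */
	f->ft_flags = filemap_ret;
	return (filemap_ret & SKETCH_VM_FAULT_RETRY) ? -11 /* -EAGAIN */ : 0;
}

/* Simulated ll_fault0(): returns the VM_FAULT_* mask handed to the kernel. */
static unsigned int sketch_ll_fault0(struct sketch_fault *f, int bail_early,
				     unsigned int filemap_ret, int clear_flags)
{
	int result;

	assert(f != NULL);
	f->ft_nr = 0;			/* the "mostly cleared" part */
	if (clear_flags)
		f->ft_flags = 0;	/* the fix: reset ft_flags as well */

	result = sketch_kernel_fault(f, bail_early, filemap_ret);

	/* A set VM_FAULT_RETRY bit is passed straight back to the caller. */
	if (f->ft_flags & SKETCH_VM_FAULT_RETRY)
		return f->ft_flags;
	/* Assumption: any other error maps to SIGBUS in this sketch. */
	return result != 0 ? SKETCH_VM_FAULT_SIGBUS : f->ft_flags;
}
```

Calling `sketch_ll_fault0()` twice on the same struct, first with a RETRY result from the simulated filemap_fault and then with an early bail-out, returns the stale 1028 on the second call unless `clear_flags` is set; with the fix, the early exit is reported as an error instead.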

This is a problem when ft_flags contains VM_FAULT_RETRY:
when fault/filemap_fault return with that flag set, they have already
released the mmap semaphore, so do_page_fault does not release it again.

Incorrectly returning this flag from ll_fault therefore means the mmap
semaphore is never upped in the kernel's do_page_fault(), leaving a
read hold on mmap_sem that later leads to deadlock.
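The locking contract can be modeled with a simple counter. This is a hypothetical simplification, not the kernel's rw_semaphore or its real fault path: `sketch_do_page_fault()` takes a read hold, calls a fault handler, and releases the hold only when the handler did not report VM_FAULT_RETRY. The handler names `sketch_filemap_retry` (a genuine retry that dropped the lock) and `sketch_stale_retry` (the buggy path that returns a stale flag) are illustrative only.

```c
#include <assert.h>

#define SKETCH_VM_FAULT_RETRY	0x0400

/* Hypothetical model of mmap_sem: just a count of read holders. */
struct sketch_rwsem {
	int readers;
};

static void sketch_down_read(struct sketch_rwsem *s)
{
	s->readers++;
}

static void sketch_up_read(struct sketch_rwsem *s)
{
	assert(s->readers > 0);	/* catches a double release */
	s->readers--;
}

typedef unsigned int (*sketch_fault_fn)(struct sketch_rwsem *mm);

/* Genuine retry: the handler drops the lock before returning RETRY. */
static unsigned int sketch_filemap_retry(struct sketch_rwsem *mm)
{
	sketch_up_read(mm);
	return SKETCH_VM_FAULT_RETRY;
}

/* Buggy path: the lock is still held, but a stale RETRY is returned. */
static unsigned int sketch_stale_retry(struct sketch_rwsem *mm)
{
	(void)mm;
	return SKETCH_VM_FAULT_RETRY;
}

/* Simplified do_page_fault(): up_read() only if the handler did NOT
 * report RETRY. The real handler then re-takes the lock and retries
 * the fault once; that step is omitted here. */
static void sketch_do_page_fault(struct sketch_rwsem *mm, sketch_fault_fn fault)
{
	unsigned int ret;

	sketch_down_read(mm);
	ret = fault(mm);
	if (!(ret & SKETCH_VM_FAULT_RETRY))
		sketch_up_read(mm);
}
```

After one fault through `sketch_filemap_retry` the count returns to 0, but after one through `sketch_stale_retry` it stays at 1. In the real kernel, the equivalent leaked read hold means a later down_write() on mmap_sem (for example from mmap or munmap) blocks forever.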

Sample debug output follows. The first line shows the VM_FAULT_RETRY flag returned together with -EAGAIN (-11), which is fine and expected. The second shows the same flags (1028) returned with -ERESTARTSYS (-512), an error raised before filemap_fault was reached, so the semaphore is not upped in do_page_fault:
00000080:00800000:24.0:1404292452.550624:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-11
00000080:00800000:24.0:1404292452.554058:0:23550:0:(llite_mmap.c:341:ll_fault0()) fsx-linux-aio fault 1028/-512
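Reading the numbers in these lines requires decoding the flag mask. The small decoder below assumes, as the description above implies, that the first number is the returned VM_FAULT_* mask and the second the I/O result. It uses the Linux VM_FAULT_* bit values of that era and defines the kernel-internal ERESTARTSYS (512) by hand, since it is not exported to user space; the helper names are hypothetical. Under those assumptions, 1028 is 0x404, i.e. VM_FAULT_MAJOR | VM_FAULT_RETRY.

```c
#include <assert.h>
#include <string.h>

/* Subset of Linux VM_FAULT_* bits of that era (hypothetical table name). */
static const struct {
	unsigned int bit;
	const char *name;
} sketch_fault_bits[] = {
	{ 0x0001, "VM_FAULT_OOM" },
	{ 0x0002, "VM_FAULT_SIGBUS" },
	{ 0x0004, "VM_FAULT_MAJOR" },
	{ 0x0100, "VM_FAULT_NOPAGE" },
	{ 0x0200, "VM_FAULT_LOCKED" },
	{ 0x0400, "VM_FAULT_RETRY" },
};

/* Writes "NAME|NAME" for the known bits of mask into buf; unknown bits
 * are ignored and the output is truncated to fit len. */
static void sketch_decode_fault(unsigned int mask, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(sketch_fault_bits) / sizeof(sketch_fault_bits[0]); i++) {
		if (!(mask & sketch_fault_bits[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, sketch_fault_bits[i].name, len - strlen(buf) - 1);
	}
}

/* Names the two result codes seen in the log (Linux errno values). */
static const char *sketch_errno_name(int rc)
{
	switch (rc) {
	case 0:
		return "0";
	case -11:
		return "-EAGAIN";
	case -512:
		return "-ERESTARTSYS";	/* kernel-internal, never seen by users */
	default:
		return "other";
	}
}
```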

Credit, as for LU-5221, to Cray's Paul Cassella.



 Comments   
Comment by Patrick Farrell (Inactive) [ 03/Jul/14 ]

http://review.whamcloud.com/10956

Comment by Patrick Farrell (Inactive) [ 07/Aug/14 ]

Patch has landed; this should be closed.

Comment by Jodi Levi (Inactive) [ 05/Dec/14 ]

Patch landed to Master.

Generated at Sat Feb 10 01:50:12 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.