[LU-16935] deadlock between ll_filemap_fault and ll_imp_inval Created: 29/Jun/23 Updated: 29/Jun/23 |
|
| Status: | Open |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vladimir Saveliev | Assignee: | Vladimir Saveliev |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
The following loop in ll_filemap_fault():

int ll_filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
..
        do {
                seq = read_seqbegin(&ll_i2info(inode)->lli_page_inv_lock);
                ret = __ll_filemap_fault(vma, vmf);
        } while (read_seqretry(&ll_i2info(inode)->lli_page_inv_lock, seq) &&
                 (ret & VM_FAULT_SIGBUS));

may become endless:
ll_filemap_fault()
  filemap_fault()
    ...
      ll_readpage()
        ll_io_read_page()
          rc = cl_sync_io_wait(env, anchor, 0);
          if (!PageUptodate(cl_page_vmpage(page)))
            cl_page_discard()
              vvp_page_discard()
                generic_error_remove_page()
                  truncate_complete_page()
                    ...
                      vvp_page_delete()
                        write_seqlock(&ll_i2info(inode)->lli_page_inv_lock);
                        ClearPageUptodate(vmpage);
                        write_sequnlock(&ll_i2info(inode)->lli_page_inv_lock);
If the page is not uptodate after cl_sync_io_wait(), then vvp_page_delete(), called deep inside cl_page_discard(), increments the lli_page_inv_lock seqlock.

filemap_fault() (true for 4.12.14_122.147 and probably a few other SLES12 SP5 kernels) returns VM_FAULT_SIGBUS if readpage fails:

int filemap_fault(struct vm_fault *vmf)
..
        error = mapping->a_ops->readpage(file, page);
        if (!error) {
                wait_on_page_locked(page);
                if (!PageUptodate(page))
                        error = -EIO;
        }
        put_page(page);

        if (!error || error == AOP_TRUNCATED_PAGE)
                goto retry_find;

        /* Things didn't work out. Return zero to tell the mm layer so. */
        shrink_readahead_size_eio(file, ra);
        return VM_FAULT_SIGBUS;

When readpage fails as a result of eviction from the server side, the following deadlock forms.

ll_imp_inval gets stuck in:

ptlrpc_invalidate_import_thread
  obd_import_event(IMP_EVENT_INVALIDATE)
    ..
      osc_object_invalidate
        l_wait_event(osc->oo_io_waitq, atomic_read(&osc->oo_nr_ios) == 0, &lwi);

and cannot proceed to recovery.

ll_filemap_fault() spins and keeps osc->oo_nr_ios != 0, because readpage()'s read RPC fails with -108 (-ESHUTDOWN) while the import is invalid:

static int ptlrpc_import_delay_req(struct obd_import *imp,
..
        } else if (imp->imp_invalid || imp->imp_obd->obd_no_recov) {
                if (!imp->imp_deactive)
                        DEBUG_REQ(D_NET, req, "IMP_INVALID");
                *status = -ESHUTDOWN; /* b=12940 */ |
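The circular wait described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not Lustre code: every `model_` name and `MODEL_VM_FAULT_SIGBUS` is hypothetical, the seqlock is reduced to a plain counter, the in-flight I/O count is assumed to stay raised for the whole duration of the retry loop, and the `max_iters` cap exists only so that the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's VM_FAULT_SIGBUS flag. */
#define MODEL_VM_FAULT_SIGBUS 0x2

/* Hypothetical userspace model; none of these are Lustre or kernel types. */
struct model_inode {
    unsigned seq;         /* stands in for the lli_page_inv_lock sequence */
    bool import_invalid;  /* stands in for imp->imp_invalid */
    unsigned nr_ios;      /* stands in for osc->oo_nr_ios */
};

/* Models one __ll_filemap_fault() attempt. */
static int model_fault_once(struct model_inode *ino)
{
    if (ino->import_invalid) {
        /* The read fails, the page is discarded, and the writer side
         * (write_seqlock + write_sequnlock) advances the sequence by 2. */
        ino->seq += 2;
        return MODEL_VM_FAULT_SIGBUS;
    }
    return 0;
}

/* Models the invalidation side: it can only finish, and let the import
 * become valid again, once no I/O is in flight. */
static void model_invalidate_step(struct model_inode *ino)
{
    if (ino->nr_ios == 0)
        ino->import_invalid = false;
}

/* Models the retry loop in ll_filemap_fault(); returns the number of
 * iterations executed (capped at max_iters). */
static unsigned model_fault_loop(struct model_inode *ino, unsigned max_iters)
{
    unsigned seq, iters = 0;
    int ret;

    ino->nr_ios++;  /* assumption: I/O stays counted while the loop runs */
    do {
        seq = ino->seq;              /* read_seqbegin() */
        ret = model_fault_once(ino);
        model_invalidate_step(ino);  /* interleave the other thread */
        iters++;
    } while (ino->seq != seq &&      /* read_seqretry() */
             (ret & MODEL_VM_FAULT_SIGBUS) &&
             iters < max_iters);
    ino->nr_ios--;

    return iters;
}
```

In this model, calling `model_fault_loop()` on an inode with `import_invalid` set runs until the cap is reached, because each failed attempt both changes the sequence and returns the SIGBUS flag, while the invalidation step never observes `nr_ios == 0`. With a valid import the loop exits after a single pass.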
| Comments |
| Comment by Gerrit Updater [ 29/Jun/23 ] |
|
"Vladimir Saveliev <vladimir.saveliev@hpe.com>" uploaded a new patch: https://review.whamcloud.com/c/fs/lustre-release/+/51505 |