[LU-218] deadlock (pagefault vs. blocking ast) in clio Created: 18/Apr/11  Updated: 18/Apr/11  Resolved: 18/Apr/11

Status: Closed
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.1.0
Fix Version/s: Lustre 2.1.0

Type: Bug
Priority: Major
Reporter: Niu Yawei (Inactive)
Assignee: Robert Read (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Environment:

2.6.18-194.17.1.el5


Attachments: deadlock-trace.log
Severity: 3
Epic: client
Rank (Obsolete): 10397

 Description   

I hit the following deadlock while running mmap tests:

mmap test thread: pagefault -> lock page -> release dlm lock -> cancel dlm lock -> cl_lock_mutex_get;
bl_ast handler thread: cancel dlm lock -> cl_lock_mutex_get -> flush pages -> lock page;
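
The two call chains above take the same two resources in opposite order, which is a classic lock-order inversion. As a rough illustration only, here is a toy model that records the acquisition order of each thread and reports the inverted pair. This is not Lustre code: the names `page_lock` and `cl_lock_mutex` are simplified stand-ins for the page lock and the mutex taken by cl_lock_mutex_get, and the helper functions are hypothetical.

```python
# Toy model of the lock-order inversion described above (not Lustre code).
# Each entry lists the order in which a thread acquires resources; the
# resource names are simplified stand-ins chosen for this sketch.
ACQUISITION_ORDER = {
    "mmap_pagefault": ["page_lock", "cl_lock_mutex"],
    "bl_ast_handler": ["cl_lock_mutex", "page_lock"],
}


def lock_order_edges(paths):
    """Map each (held, wanted) pair to the first thread that exhibits it."""
    edges = {}
    for thread, locks in paths.items():
        for i, held in enumerate(locks):
            for wanted in locks[i + 1:]:
                edges.setdefault((held, wanted), thread)
    return edges


def find_inversions(paths):
    """Return lock pairs that two threads acquire in opposite orders."""
    edges = lock_order_edges(paths)
    inversions = []
    for (a, b), first_thread in edges.items():
        # Report each unordered pair once by only looking at a < b.
        if a < b and (b, a) in edges:
            inversions.append((a, b, first_thread, edges[(b, a)]))
    return inversions


inversions = find_inversions(ACQUISITION_ORDER)
for a, b, t1, t2 in inversions:
    print(f"{t1} takes {a} then {b}; {t2} takes {b} then {a}")
```

If both threads get past their first acquisition before either attempts its second, each waits on a resource the other holds and neither can make progress, which matches the hang described in this ticket.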

Because of this deadlock, the ll_imp_inval thread is also blocked in cl_lock_mutex_get, so the client eviction can never complete. I think this is the root cause of LU-180, but I'm not 100 percent sure, because no stack trace has been provided there yet to confirm it.

The stack trace is attached.



 Comments   
Comment by Niu Yawei (Inactive) [ 18/Apr/11 ]

Jay, could you take a look to see if it's a known bug or not? Thanks.

Comment by Oleg Drokin [ 18/Apr/11 ]

I wonder if this is the same thing as LU-122?

Comment by Niu Yawei (Inactive) [ 18/Apr/11 ]

Right, I think it's the same as LU-122, and LU-180 is probably caused by this deadlock as well. I'll mark this as a duplicate.

Comment by Niu Yawei (Inactive) [ 18/Apr/11 ]

Duplicate of LU-122.
