[LU-2487] 2.2 Client deadlock between ll_md_blocking_ast, sys_close, and sys_open Created: 13/Dec/12 Updated: 03/Oct/13 Resolved: 02/Jan/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.2.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor |
| Reporter: | Artem Blagodarenko (Inactive) | Assignee: | Andreas Dilger |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | client, patch |
| Issue Links: |
|
| Severity: | 2 |
| Rank (Obsolete): | 5837 |
| Description |
|
Spinlock Usage

Walking through the code and digging through pointers in the stack frames of the 3 threads leads to the following 3 suspect structures and their related spinlocks:

inode = 0xffff880581a26638 (i_lock)

There appears to be a 3-way deadlock between ll_md_blocking_ast(), sys_open(), and sys_close() using the above spinlocks.

CPU 10: ll_md_blocking_ast()
CPU 7: sys_close()
CPU 15: sys_open()

The lr_lock, d_lock, and i_lock are the same in all cases. So ll_md_blocking_ast() waits for sys_open(), sys_open() waits for sys_close(), and sys_close() waits for ll_md_blocking_ast(): a 3-way deadlock.

This deadlock is possible because of two patches: |
| Comments |
| Comment by Artem Blagodarenko (Inactive) [ 13/Dec/12 ] |
|
I set the priority to Minor because the race is possible only with the patch from |
| Comment by Artem Blagodarenko (Inactive) [ 14/Dec/12 ] |
|
Xyratex MRP-675 |
| Comment by Artem Blagodarenko (Inactive) [ 14/Dec/12 ] |
| Comment by Andreas Dilger [ 21/Dec/12 ] |
|
Which client kernel is this? The new dcache_lock removal is only in use for kernels > 2.6.37, so I guess this is SLES11 SP2 (3.0)?

I'm trying to see where there is a deadlock in the code, because your original comment is not showing the callpath to the function getting the second lock. I'm guessing something like:

CPU 10: ll_md_blocking_ast()
+ ll_inode_from_resource()/ll_inode_from_lock() gets lock->l_lock and lock->l_resource->lr_lock
- igrab() wants lock->l_resource->lr_lvb_inode->i_lock

CPU 7: sys_close()
+ dput() gets dentry->d_lock
- ll_ddelete->find_cbdata->ldlm_resource_foreach() wants res->lr_lock

CPU 15: sys_open
+ ll_splice_alias()/ll_find_alias() gets inode->i_lock
- ll_splice_alias()/ll_find_alias() wants dentry->d_lock

The ll_ddelete->find_cbdata() path has been disabled in 9f3469f1:

        /* Disable this piece of code temproarily because this is called
         * inside dcache_lock so it's not appropriate to do lots of work
         * here. */
#if 0
        /* if not ldlm lock for this inode, set i_nlink to 0 so that
         * this inode can be recycled later b=20433 */
        if (de->d_inode && !find_cbdata(de->d_inode))
                clear_nlink(de->d_inode);
#endif

So it seems there is no need for this complex patch? |
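The "holds one lock, wants another" reasoning in the call paths above can be checked mechanically by treating each path as an edge in a lock-order graph and looking for a cycle. The following is only a minimal userspace sketch under stated assumptions: the enum, struct, array, and function names are hypothetical and are not part of Lustre or the kernel, and the edges simply restate the guessed call paths from this comment rather than anything taken from a real trace.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifiers for the three spinlocks named in this ticket. */
enum lock_id { LR_LOCK, D_LOCK, I_LOCK, NR_LOCKS };

/* One observation from a call path: a lock already held, and the lock wanted next. */
struct lock_edge {
    enum lock_id held;
    enum lock_id wanted;
};

/* Depth-first search; state is 0 = unvisited, 1 = on current path, 2 = done. */
static bool dfs_finds_cycle(bool adj[NR_LOCKS][NR_LOCKS], int node,
                            int state[NR_LOCKS])
{
    state[node] = 1;
    for (int next = 0; next < NR_LOCKS; next++) {
        if (!adj[node][next])
            continue;
        if (state[next] == 1)
            return true; /* back edge: the lock order is circular */
        if (state[next] == 0 && dfs_finds_cycle(adj, next, state))
            return true;
    }
    state[node] = 2;
    return false;
}

/* Returns true if the held -> wanted edges contain a cycle (a possible deadlock). */
bool lock_order_has_cycle(const struct lock_edge *edges, size_t n)
{
    bool adj[NR_LOCKS][NR_LOCKS] = {{false}};
    int state[NR_LOCKS] = {0};

    for (size_t i = 0; i < n; i++)
        adj[edges[i].held][edges[i].wanted] = true;

    for (int node = 0; node < NR_LOCKS; node++)
        if (state[node] == 0 && dfs_finds_cycle(adj, node, state))
            return true;
    return false;
}

/* Edges restating the three guessed call paths (assumption, not a trace). */
const struct lock_edge with_find_cbdata[] = {
    { LR_LOCK, I_LOCK }, /* ll_md_blocking_ast(): holds lr_lock, igrab() wants i_lock */
    { D_LOCK, LR_LOCK }, /* sys_close(): dput() holds d_lock, find_cbdata() wants lr_lock */
    { I_LOCK, D_LOCK },  /* sys_open: ll_find_alias() holds i_lock, wants d_lock */
};

/* Same paths with the find_cbdata() branch removed, as under the #if 0. */
const struct lock_edge without_find_cbdata[] = {
    { LR_LOCK, I_LOCK },
    { I_LOCK, D_LOCK },
};
```

Calling lock_order_has_cycle() on with_find_cbdata reports a cycle, while the same call on without_find_cbdata does not, which matches the argument that disabling the ll_ddelete->find_cbdata() path removes one branch of the deadlock.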
| Comment by Artem Blagodarenko (Inactive) [ 29/Dec/12 ] |
|
Yes, we do not need this patch, because one of the deadlock's branches is disabled in current master. But I have added a comment " |
| Comment by Artem Blagodarenko (Inactive) [ 29/Dec/12 ] |
|
Can we close this issue as "Won't Fix"? |
| Comment by Andreas Dilger [ 02/Jan/13 ] |
|
I think the comment you added in the I'm going to close this as Cannot Reproduce, since the problem never existed in any of the public Lustre releases. |