[LU-122] Revert bug 21122 since it causes deadlock Created: 10/Mar/11 Updated: 28/Jun/11 Resolved: 25/Apr/11 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.0.0 |
| Fix Version/s: | Lustre 2.1.0 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Jinshan Xiong (Inactive) | Assignee: | Jinshan Xiong (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Severity: | 3 |
| Bugzilla ID: | 21122 |
| Rank (Obsolete): | 5060 |
| Description |
|
Recently I hit a deadlock while running one of my tests. After analysing the log, I realized the deadlock was introduced by the patch for bug 21122, so I had to rethink that patch and figure out the root cause, and finally came up with a new fix. Let me describe the deadlock a little bit (in the code before this patch): Now let's go back to the root cause of bug 21122: in cl_lock_at_page(env, lock->cll_descr.cld_obj, the last two parameters were set to 0, which means CANCELPEND locks are not matched. Unfortunately, lock A is marked CANCELPEND because it blocks another lock. This causes the faulting page to be truncated. Then another page fault happens and a vmpage with the same offset is created. This is why duplicate cl_pages were created and the assertion was hit. So in the new fix, I simply revert the patch for bug 21122 and change the parameters of cl_lock_at_page to (..., 1, 0). Hopefully this will save our lives. Is this tricky? |
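As a rough illustration of the parameter change described above, here is a minimal C sketch of a lock lookup whose second-to-last argument decides whether cancel-pending locks may match. It is a toy model, not Lustre's code: the toy_lock structure, the TOY_* flags, the function names and the page ranges are all hypothetical, and the meaning given to the last argument (matching already-cancelled locks) is an assumption that the description does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT Lustre's cl_lock definitions. */
enum toy_lock_flags {
    TOY_CANCELPEND = 1 << 0,  /* cancellation is pending */
    TOY_CANCELLED  = 1 << 1   /* lock has already been cancelled (assumed) */
};

struct toy_lock {
    unsigned long start;  /* first page index covered */
    unsigned long end;    /* last page index covered (inclusive) */
    unsigned int  flags;
};

/*
 * Return the first lock covering page `index`, or NULL if none matches.
 * pending != 0: locks marked TOY_CANCELPEND may still match.
 * canceld != 0: locks marked TOY_CANCELLED may still match (assumed meaning).
 */
static const struct toy_lock *
toy_lock_at_page(const struct toy_lock *locks, size_t n,
                 unsigned long index, int pending, int canceld)
{
    size_t i;

    assert(locks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        const struct toy_lock *lk = &locks[i];

        if (index < lk->start || index > lk->end)
            continue;  /* lock does not cover this page */
        if (!pending && (lk->flags & TOY_CANCELPEND))
            continue;  /* caller asked to skip cancel-pending locks */
        if (!canceld && (lk->flags & TOY_CANCELLED))
            continue;  /* caller asked to skip cancelled locks */
        return lk;
    }
    return NULL;
}

/*
 * Scenario from the description: "lock A" covers the faulting page but is
 * marked cancel-pending because it blocks another lock.
 * Returns 1 if the page is seen as covered by a lock, 0 otherwise.
 */
static int toy_page_is_covered(int pending, int canceld)
{
    const struct toy_lock lock_a = { 0, 15, TOY_CANCELPEND };

    return toy_lock_at_page(&lock_a, 1, 7, pending, canceld) != NULL;
}
```

In this model, toy_page_is_covered(0, 0) finds no lock, so the page looks uncovered, which corresponds to the case where the faulting page gets truncated. toy_page_is_covered(1, 0) still matches lock A, mirroring the (..., 1, 0) arguments used in the new fix.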
| Comments |
| Comment by Build Master (Inactive) [ 11/Mar/11 ] |
|
Integrated in Jinshan Xiong : e2d57e76eaba3a975043a3e5b9eb920e8d9cec77
|
| Comment by Build Master (Inactive) [ 12/Mar/11 ] |
|
Integrated in Jinshan Xiong : e2d57e76eaba3a975043a3e5b9eb920e8d9cec77
|
| Comment by Peter Jones [ 15/Mar/11 ] |
|
Cliff, can you please add this patch to the queue to test on Hyperion? Thanks, Peter |
| Comment by Build Master (Inactive) [ 16/Mar/11 ] |
|
Integrated in Jinshan Xiong : 398fbf1a08b45a2292322a4e8396af5b623fbe31
|
| Comment by Build Master (Inactive) [ 16/Mar/11 ] |
|
Integrated in Jinshan Xiong : 026964d4ccae351e7aa5561fae976f6fe3fc2c55
|
| Comment by Cliff White (Inactive) [ 17/Mar/11 ] |
|
I will be testing builds #490 and #210 (client) on Hyperion; they should be running today. |
| Comment by Oleg Drokin [ 18/Mar/11 ] |
|
It would have been great if you had added your findings about the bug here as well, not just in the patch description in Gerrit. I wonder whether your issue is RHEL5-specific and clears up in RHEL6 all by itself? |
| Comment by Jinshan Xiong (Inactive) [ 18/Mar/11 ] |
|
I think the deadlock would happen on both RHEL5 and RHEL6. WRT the page locking, I think we have to return an unlocked page in vvp_io_fault_start anyway, otherwise it would cause a deadlock. But we could make vvp_io_kernel_fault return a locked page (we would still have to do this tricky check in the filemap_nopage case, since it returns an unlocked page) and then unlock it in vvp_io_fault_start. That is acceptable to me if you think it would be much better. |
| Comment by Build Master (Inactive) [ 25/Mar/11 ] |
|
Integrated in Jinshan Xiong : cd180a0ef35d87cd4e64d71db8f52d3916b7afae
|
| Comment by Build Master (Inactive) [ 01/Apr/11 ] |
|
Integrated in Jinshan Xiong : a91a2a4fdd7550f08ae3b00f58f9eeec3ac3777b
|
| Comment by Peter Jones [ 07/Apr/11 ] |
|
Update from Bull: "Fix delivered, no new occurrence of the bug so far!" |
| Comment by Peter Jones [ 21/Apr/11 ] |
|
Still no recurrences at CEA running with the patch. |
| Comment by Build Master (Inactive) [ 25/Apr/11 ] |
|
Integrated in Oleg Drokin : 32b2ddf168b846ccf8c83329728905f6c5c8bbcb
|
| Comment by Peter Jones [ 25/Apr/11 ] |
|
Patch landed for 2.1. Please reopen if any further work is needed. |
| Comment by Build Master (Inactive) [ 27/Apr/11 ] |
|
Integrated in Oleg Drokin : 32b2ddf168b846ccf8c83329728905f6c5c8bbcb
|
| Comment by Build Master (Inactive) [ 28/Jun/11 ] |
|
Integrated in john :
|