[LU-3936] ldlm_cancel_stale_locks()) ASSERTION( count > 0 ) failed Created: 12/Sep/13 Updated: 20/Nov/13 Resolved: 20/Nov/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.6.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Andriy Skulysh | Assignee: | Dmitry Eremin (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | patch | ||
| Severity: | 3 |
| Rank (Obsolete): | 10409 |
| Description |
|
Aug 17 18:18:49 snx11003n003 kernel: [873893.844231] LustreError: 80225:0:(ldlm_lock.c:1792:ldlm_cancel_stale_locks()) ASSERTION( count > 0 ) failed: |
| Comments |
| Comment by Andriy Skulysh [ 12/Sep/13 ] |
| Comment by Dmitry Eremin (Inactive) [ 24/Oct/13 ] |
|
I'm not sure this issue is related to 2.5 code. There is no ldlm_cancel_stale_locks() function at all. Could you specify the real version of Lustre which got this assertion please? |
| Comment by Andriy Skulysh [ 24/Oct/13 ] |
|
It was caught on Lustre 2.1, but the version doesn't matter, because ldlm_pool_shrink() and other functions are called with a negative number of locks to cancel. |
| Comment by Dmitry Eremin (Inactive) [ 24/Oct/13 ] |
|
Hmm. Could you provide a reproducer, please? I agree that the expression "1 + nr_locks * nr / total" can potentially overflow int32, but I am trying to understand why this causes the crash you are referring to. |
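
For illustration, here is a minimal C sketch of how the expression "1 + nr_locks * nr / total" could yield a negative count when evaluated in 32 bits. The function names (share_int32, share_int64) and the sample values are hypothetical and are not taken from the Lustre source. The 32-bit wraparound is emulated through a 64-bit intermediate so the sketch itself avoids undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of evaluating 1 + nr_locks * nr / total in 32 bits.
 * The product is computed in 64 bits and then reduced modulo 2^32 to
 * emulate two's-complement wraparound without invoking signed overflow. */
static int32_t share_int32(int32_t nr_locks, int32_t nr, int32_t total)
{
    assert(total > 0);
    int64_t full = (int64_t)nr_locks * nr;
    uint32_t low = (uint32_t)full;              /* keep the low 32 bits */
    int64_t wrapped = (low >= 0x80000000u)
                          ? (int64_t)low - 0x100000000LL
                          : (int64_t)low;       /* reinterpret as signed */
    return (int32_t)(1 + wrapped / total);
}

/* Same expression with the multiplication widened to 64 bits.
 * Assumes the final quotient fits in int32_t, i.e. nr <= total. */
static int32_t share_int64(int32_t nr_locks, int32_t nr, int32_t total)
{
    assert(total > 0);
    int64_t full = (int64_t)nr_locks * nr;
    return (int32_t)(1 + full / total);
}
```

With the assumed values nr_locks = 100000, nr = 30000 and total = 200000, the product is 3,000,000,000, which exceeds INT32_MAX. The 32-bit model returns a negative count, while the widened version returns 15001. A negative count passed down to a cancel routine would be consistent with an ASSERTION( count > 0 ) failure, though the exact call path is not shown in this ticket.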
| Comment by Andreas Dilger [ 25/Oct/13 ] |
|
Andriy, Secondly, there is no ldlm_cancel_stale_locks() that I can find in either master or in 2.1, nor could I find the above LASSERT(count > 0) in some other function. Could you please tell me which specific version of Lustre this is in, or is this in some patch in Gerrit that is not landed yet? I think I was incorrect in approving the original patch for this problem, because I didn't actually look closely enough at this bug when inspecting the code. I can't see how that patch actually fixes any problem. |
| Comment by Andreas Dilger [ 25/Oct/13 ] |
|
My bad. I see that there is an integer overflow if "nr" is large, so the original patch is not useless. I'm not yet sure what Dmitry's patch http://review.whamcloud.com/8075 is doing, but we shouldn't close this bug while that patch is still open. |
| Comment by Dmitry Eremin (Inactive) [ 20/Nov/13 ] |
|
There are no more concerns, so I am closing the ticket. |