[LU-5266] LBUG on Failover -ldlm_process_extent_lock()) ASSERTION( lock->l_granted_mode != lock->l_req_mode ) Created: 27/Jun/14 Updated: 22/Oct/14 Resolved: 11/Jul/14 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.6.0 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.5.4 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Cliff White (Inactive) | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB |
| Environment: | Hyperion - 2.5.60 build 2538 |
| Severity: | 3 |
| Rank (Obsolete): | 14694 |
| Description |
|
After a hard failover of devices to server iws19, the server wedged, then hit an LBUG.
Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
The server reported being CPU-bound prior to the failure. |
| Comments |
| Comment by Oleg Drokin [ 30/Jun/14 ] |
|
We need the backtrace for the crash please. |
| Comment by Andreas Dilger [ 30/Jun/14 ] |
|
Cliff, |
| Comment by Cliff White (Inactive) [ 30/Jun/14 ] |
|
Yes, we had multiple failures |
| Comment by Jodi Levi (Inactive) [ 30/Jun/14 ] |
|
HongChao, |
| Comment by Vitaly Fertman [ 30/Jun/14 ] |
|
Not sure if this failure is the same as the one already fixed, but since it is caught by the same assertion, I am putting it here rather than creating another ticket: http://review.whamcloud.com/10903 |
| Comment by Hongchao Zhang [ 02/Jul/14 ] |
|
Yes, the issue could be triggered by the resent lock request. |
| Comment by Andreas Dilger [ 04/Jul/14 ] |
|
Was this introduced by http://review.whamcloud.com/5978 ? |
| Comment by Cliff White (Inactive) [ 09/Jul/14 ] |
|
I tested the patch on Hyperion - no more LBUG, but did have a few evictions. lustre-logs and console logs from one run attached. |
| Comment by Peter Jones [ 11/Jul/14 ] |
|
Landed for 2.6 |
| Comment by Cliff White (Inactive) [ 16/Jul/14 ] |
|
While testing this patch, still have some client evictions and log dumps. Message log and lustre log from latest attached. |
| Comment by Andreas Dilger [ 18/Jul/14 ] |
|
This should probably go into a new bug, since the LASSERT is fixed. |
| Comment by Vitaly Fertman [ 15/Aug/14 ] |
|
the previous fix was not exactly correct: http://review.whamcloud.com/11469 |
| Comment by Peter Jones [ 15/Aug/14 ] |
|
Vitaly |