Lustre / LU-5266

LBUG on failover: ldlm_process_extent_lock() ASSERTION( lock->l_granted_mode != lock->l_req_mode )

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Blocker
    • Fix Version/s: Lustre 2.6.0, Lustre 2.5.4
    • Affects Version/s: Lustre 2.6.0
    • Environment: Hyperion - 2.5.60 build 2538
    • 3
    • 14694

    Description

      After a hard failover of devices to server iws19, the server wedged, then hit an LBUG.
      Services never complete recovery, and appear to restart the recovery timer:

      Jun 27 10:08:53 iws19 kernel: Lustre: lustre-OST000c: Will be in recovery for at least 2:30, or until 316 clients reconnect
      Jun 27 10:11:53 iws19 kernel: Lustre: lustre-OST000c: recovery is timed out, evict stale exports
      Jun 27 10:18:30 iws19 kernel: Lustre: lustre-OST000c: Client c52d4856-d1df-b87b-911c-f1bfbc23a24d (at 192.168.124.182@o2ib) reconnecting, waiting for 316 clients in recovery for 2:27
      

      The server reports being CPU-bound prior to the failure.
      Console log attached - unfortunately the crash dump after the LBUG failed.

      Attachments

        1. iws19.crash.txt
          24 kB
        2. iws23.dmesg
          29 kB
        3. iws29.lustre-log.1405540543.8118.txt
          0.3 kB
        4. iws29.lustre-log.1405540562.8181.txt
          316 kB
        5. iws29.messages.txt
          37 kB
        6. lustre-log.1404916388.10827.txt
          0.2 kB
        7. lustre-log.1404916402.10701.txt
          145 kB
        8. lustre-log.1404916421.10764.txt
          210 kB

        Issue Links

          Activity

            [LU-5266] LBUG on failover: ldlm_process_extent_lock() ASSERTION( lock->l_granted_mode != lock->l_req_mode )

            hongchao.zhang Hongchao Zhang added a comment - Yes, the issue could be triggered by the resent lock request.
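            The resend scenario above can be sketched as follows. This is a minimal, hypothetical C model of the invariant, not actual Lustre code: the lock struct, mode values, and handler are simplified stand-ins, showing why a resent enqueue that reaches the grant path for an already-granted lock trips the assertion, and how short-circuiting the resend avoids it.

            ```c
            #include <assert.h>
            #include <stdio.h>

            /* Illustrative sketch only -- NOT actual Lustre code. Models the
             * invariant checked in ldlm_process_extent_lock(): a lock entering
             * the grant path must not already be granted, i.e.
             * l_granted_mode != l_req_mode must hold. */

            enum ldlm_mode { LCK_NONE = 0, LCK_PW = 1 };

            struct lock {
                enum ldlm_mode l_req_mode;      /* mode the client requested */
                enum ldlm_mode l_granted_mode;  /* mode granted so far */
            };

            /* Returns 1 if the lock may enter the grant path, 0 if doing so
             * would violate the invariant (the condition that LBUGs). */
            static int may_process(const struct lock *lk)
            {
                return lk->l_granted_mode != lk->l_req_mode;
            }

            /* Hypothetical enqueue handler: a resent request that finds the
             * lock already granted from the first attempt must reply with the
             * existing grant instead of re-running the grant path. */
            static int handle_enqueue(struct lock *lk, int is_resend)
            {
                if (is_resend && lk->l_granted_mode == lk->l_req_mode)
                    return 0;   /* already granted: short-circuit the reply */
                if (!may_process(lk))
                    return -1;  /* would have tripped the ASSERTION */
                lk->l_granted_mode = lk->l_req_mode;  /* grant the lock */
                return 0;
            }

            int main(void)
            {
                struct lock lk = { .l_req_mode = LCK_PW,
                                   .l_granted_mode = LCK_NONE };

                assert(handle_enqueue(&lk, 0) == 0);  /* first enqueue: granted */
                assert(lk.l_granted_mode == LCK_PW);
                assert(may_process(&lk) == 0);        /* reprocessing would LBUG */
                assert(handle_enqueue(&lk, 1) == 0);  /* resend handled safely */
                printf("ok\n");
                return 0;
            }
            ```

            In this model the crash corresponds to a resent enqueue reaching the grant path without the short-circuit check, so may_process() fails on a lock that was already granted before the resend arrived.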

            vitaly_fertman Vitaly Fertman added a comment - Not sure if this failure is the same as the fixed one, but since it is caught by the same assertion, rather than create another ticket I put it here: http://review.whamcloud.com/10903

            jlevi Jodi Levi (Inactive) added a comment - Hongchao, could you please look into this one? Thank you!

            cliffw Cliff White (Inactive) added a comment - Yes, we had multiple failures.

            adilger Andreas Dilger added a comment - Cliff, did you try to reboot the server again and/or try more failovers after this one?
            green Oleg Drokin added a comment - We need the backtrace for the crash, please.

            People

              Assignee: hongchao.zhang Hongchao Zhang
              Reporter: cliffw Cliff White (Inactive)
              Votes: 0
              Watchers: 8

              Dates

                Created:
                Updated:
                Resolved: