[LU-15645] gap in recovery llog should not be a fatal error - Whamcloud Community JIRA

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: Lustre 2.15.0, Lustre 2.12.10
Affects Version/s: Lustre 2.14.0
Labels:
None

Severity:
3

Description

A gap of unknown origin in the MDT recovery llog was hit during recovery. The console log reported:

log_process_thread()) lfs02-MDT001e-osp-MDT0000: [0x3:0x1b70:0x4] Invalid record: index 16123 but expected 16122

and this was later confirmed with llog_reader:

rec #15221 type=106a0000 len=1160 offset 17231040
rec #16097 type=106a0000 len=1160 offset 18220168
rec #16098 type=106a0000 len=1160 offset 18221328
rec #16099 type=106a0000 len=1160 offset 18222488
rec #16100 type=106a0000 len=1160 offset 18223648
Previous index is 16121, current 16123, offset 18249168
rec #18718 type=106a0000 len=1160 offset 21180888
rec #20278 type=106a0000 len=1160 offset 22943400
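
To locate such a gap without reading the whole dump by eye, the llog_reader output can be scanned for the "Previous index is ..." line. The following is a minimal sketch, assuming the message format shown above; the function name `find_reported_gaps` and the `sample` list are hypothetical and only for illustration. It deliberately does not compare the `rec #` numbers of printed records, since those are not consecutive in the output above even where no gap is reported.

```python
import re

# Assumed format, copied from the llog_reader output quoted above.
GAP_RE = re.compile(r"Previous index is (\d+), current (\d+), offset (\d+)")


def find_reported_gaps(lines):
    """Return (previous, current, offset, missing_count) for each reported gap."""
    gaps = []
    for line in lines:
        match = GAP_RE.search(line)
        if match:
            prev, cur, offset = (int(group) for group in match.groups())
            # Indices strictly between prev and cur are the missing records.
            gaps.append((prev, cur, offset, cur - prev - 1))
    return gaps


# Hypothetical input: a few lines taken from the output in this ticket.
sample = [
    "rec #16100 type=106a0000 len=1160 offset 18223648",
    "Previous index is 16121, current 16123, offset 18249168",
    "rec #18718 type=106a0000 len=1160 offset 21180888",
]

for prev, cur, offset, missing in find_reported_gaps(sample):
    print(f"gap after index {prev}: {missing} record(s) missing before {cur} "
          f"(offset {offset})")
```

For the lines in this ticket, the single reported gap corresponds to one missing record, index 16122, which matches the "index 16123 but expected 16122" console message.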

This caused MDT recovery to fail, and all of the clients were evicted from that MDT. It isn't clear whether the global eviction is necessary or whether this could be handled more gracefully. Other MDTs likely have a copy of that operation for replay; if they do not, the operation would be lost.

More problematic is that this recovery llog error is persistent: the same problem happens on every recovery for that MDT. If the clients (and MDTs?) are evicted from recovery, the llog records should at a minimum be cancelled, or the llog file should be cleared. Better yet would be to not treat this gap as a fatal error, since I don't think anything can be done about the gap at this point.
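
The difference between the two behaviours can be illustrated with a toy model. This is only a sketch of the idea proposed above, not the Lustre llog code or the patch that resolved this ticket; `process_records` and its `tolerate_gaps` parameter are hypothetical names. In strict mode an unexpected index aborts processing, as in the failure described here; in tolerant mode a forward gap is recorded and processing continues.

```python
def process_records(indices, start, tolerate_gaps):
    """Walk record indices in order; return (processed, skipped) index lists.

    Toy model only: a real llog record carries much more than an index.
    """
    expected = start
    processed, skipped = [], []
    for idx in indices:
        if idx != expected:
            # An index going backwards is still treated as an error here;
            # only a forward gap is tolerated.
            if not tolerate_gaps or idx < expected:
                raise ValueError(
                    f"Invalid record: index {idx} but expected {expected}")
            skipped.extend(range(expected, idx))
        processed.append(idx)
        expected = idx + 1
    return processed, skipped


records = [16120, 16121, 16123, 16124]  # index 16122 is missing

# Tolerant mode: the gap is noted and the remaining records are processed.
done, lost = process_records(records, start=16120, tolerate_gaps=True)
print("processed:", done, "skipped:", lost)

# Strict mode: the same input aborts at the gap.
try:
    process_records(records, start=16120, tolerate_gaps=False)
except ValueError as err:
    print("strict mode failed:", err)
```

In the tolerant case the skipped operation is still lost unless another target holds a copy for replay, which is the trade-off raised in the description.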

Issue Links

is related to

LU-15646 fix DOSTID printing of llog_id FIDs

Resolved

LU-15938 MDT recovery did not finish due to corrupt llog record

Resolved

is related to

LU-15761 cannot finish MDS recovery

Resolved

LU-15644 failed llog cancel should not generate an error

Resolved

People

Assignee: Alex Zhuravlev

Reporter: Andreas Dilger

Votes: 0

Watchers: 8

Dates

Created: 13/Mar/22 12:04 AM

Updated: 08/Dec/22 12:02 AM

Resolved: 05/May/22 7:04 PM