[LU-7340] ChangeLogs catalog full condition should be handled more gracefully - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Fixed
Priority: Critical
Fix Version/s: Lustre 2.11.0, Lustre 2.10.3
Affects Version/s: Lustre 2.8.0, Lustre 2.5.4
Labels: None

Description

Presently, when an LLOG catalog wraps and its latest assigned index collides with its oldest index that is still in use, ENOSPC is returned and the caller simply ignores the fact that the LLOG record could not be written.
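To make the failure mode concrete, here is a minimal sketch of a wrapping catalog. It is a toy model, not Lustre code: the `Catalog` class, its slot layout, and the `write_or_report` helper are all hypothetical names introduced for illustration. It only shows how a wrapped index landing on a still-used slot produces ENOSPC, and how a caller could report that instead of ignoring it.

```python
import errno


class Catalog:
    """Toy model of a wrapping LLOG catalog (hypothetical, not Lustre code)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size  # None means the slot is free
        self.next_idx = 0           # index the next record will be assigned

    def add(self, record):
        # The catalog has wrapped onto a slot that is still in use: it is full.
        if self.slots[self.next_idx] is not None:
            raise OSError(errno.ENOSPC, "no free catalog slots")
        idx = self.next_idx
        self.slots[idx] = record
        self.next_idx = (idx + 1) % self.size
        return idx

    def cancel(self, idx):
        # Canceling a record frees its slot for reuse after the next wrap.
        self.slots[idx] = None


def write_or_report(cat, record):
    """Return the assigned index, or None when the catalog is full.

    The point is that ENOSPC is surfaced to the caller rather than dropped.
    """
    try:
        return cat.add(record)
    except OSError as e:
        if e.errno != errno.ENOSPC:
            raise
        return None
```

In this model, space only comes back when the oldest in-use slot is canceled, which is why the ideas below focus on releasing records held by ChangeLog watchers.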

For the ChangeLogs-specific usage, some actions could be attempted to recover space/records. Some ideas have already been detailed in LU-6556, but it seems better to address them in this separate ticket.

Input from Andreas:
I think the other thing that is needed here is to automatically unregister ChangeLog watcher(s) if the changelog is full or the MDS runs out of space (by default), or to block all MDS operations until the ChangeLog can be written (if a /proc tunable is set to make ChangeLog updates mandatory). It should unregister starting with the oldest watcher, on the assumption that the older watcher was forgotten and newer ones are still running, and that this will release the most space. The unregistration should cancel records up to the next watcher, or all remaining records if no other watchers are left.
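The unregistration rule described above can be sketched as follows. This is an illustrative model only: the function name `unregister_oldest`, the dictionary of watchers, and the convention that a lower watcher id means an earlier registration are assumptions made for the example, not the actual MDS data structures.

```python
def unregister_oldest(watchers, records):
    """Drop the oldest watcher and cancel the records only it was holding.

    watchers: dict mapping watcher id -> index of the last record it cleared;
              a lower id is assumed to mean an earlier registration.
    records:  sorted list of pending record indices (mutated in place).
    Returns the number of records canceled.
    """
    if not watchers:
        return 0
    oldest = min(watchers)
    del watchers[oldest]
    if watchers:
        # Cancel up to the next watcher: keep everything that some
        # remaining watcher has not cleared yet.
        keep_from = min(watchers.values()) + 1
        kept = [r for r in records if r >= keep_from]
    else:
        # No watchers left: all remaining records can be canceled.
        kept = []
    canceled = len(records) - len(kept)
    records[:] = kept
    return canceled
```

For example, with watchers `{1: 10, 2: 50}` and pending records 11 through 100, dropping watcher 1 cancels records 11 through 50, and dropping watcher 2 afterwards cancels the rest.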

Input from Robert:
I suggest going a step further and proactively removing stale watchers after a configurable period, or when hitting a max watermark, to try to avoid running out of space. Also, being unregistered is a reasonable notification to the application that it has lost its changelog feed and needs to resync.
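A possible shape for that proactive policy is sketched below. Everything here is hypothetical: the function `stale_watchers`, its parameters (`max_idle`, `watermark`), and the 0.9 default are placeholders for illustration and are not taken from Lustre tunables.

```python
def stale_watchers(last_seen, now, max_idle, used, capacity, watermark=0.9):
    """Return the watcher ids to unregister, most idle first.

    last_seen: dict mapping watcher id -> time of its last clear (seconds).
    max_idle:  idle period after which a watcher is considered stale.
    used, capacity: current and maximum number of catalog slots.
    """
    by_idle = sorted(last_seen, key=lambda w: last_seen[w])
    # Rule 1: any watcher idle for longer than the configured period.
    stale = [w for w in by_idle if now - last_seen[w] > max_idle]
    # Rule 2: above the watermark, evict the most idle watcher even if
    # it has not reached the idle limit yet.
    if not stale and by_idle and used / capacity >= watermark:
        stale = by_idle[:1]
    return stale
```

An application that finds itself unregistered would then treat that as the signal to re-register and resync, as suggested above.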

Issue Links

is duplicated by

LU-1586 no free catalog slots for log (Resolved)

is related to

LU-8856 ZFS-MDT 100% full. Cannot delete files. (Resolved)

LU-10527 LustreError: 7830:0:(llog_cat.c:313:llog_cat_current_log()) ASSERTION( llh ) (Resolved)

LU-10680 MDT becoming unresponsive in 2.10.3 (Resolved)

LU-12871 enable changelog garbage collection by default (Resolved)

is related to

LU-9055 MDS crash due to changelog being full (Open)

People

Assignee: Bruno Faccini (Inactive)

Reporter: Bruno Faccini (Inactive)

Votes: 0

Watchers: 12

Dates

Created: 26/Oct/15 5:52 PM

Updated: 14/May/21 5:52 PM

Resolved: 17/Dec/17 3:54 PM