[LU-18492] MDS should delete objects aggressively as filesystem gets full - Whamcloud Community JIRA

Details

Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: Lustre 2.14.0, Lustre 2.16.1
Labels:
- medium

Severity:
3
Rank (Obsolete):
9223372036854775807

Description

There are a number of tests that depend on "wait_delete_completed" in order to avoid running out of space when the filesystem is getting full. However, it would be better if the MDS(es) would more aggressively destroy OST objects to reclaim space when the OSTs are getting full.

The MDSes could potentially throttle object creation in this case so that new file creation matches the rate of object destroys, and the free space is not totally consumed while there are still objects in the queue to be destroyed.
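As a rough illustration of the throttling idea, the sketch below caps the create rate at the destroy rate once free space is low and destroys are still queued. Everything in it is an assumption made for illustration: `create_rate_limit()`, its parameters, and the `low_pct` watermark are hypothetical names, not existing Lustre code or tunables.

```c
#include <stdint.h>

/*
 * Hypothetical policy sketch, not Lustre code.
 *
 * requested       - creates per interval the MDS would like to issue
 * destroy_rate    - destroys per interval currently being completed
 * queued_destroys - objects unlinked but not yet destroyed on the OSTs
 * free_pct        - percentage of OST space currently free
 * low_pct         - assumed low-space watermark (percent)
 */
static uint64_t create_rate_limit(uint64_t requested, uint64_t destroy_rate,
				  uint64_t queued_destroys,
				  unsigned int free_pct, unsigned int low_pct)
{
	/* plenty of space, or nothing left to reclaim: do not throttle */
	if (free_pct >= low_pct || queued_destroys == 0)
		return requested;

	/* otherwise create no faster than objects are being destroyed */
	return requested < destroy_rate ? requested : destroy_rate;
}
```

With these assumptions, a request for 100 creates while 20 destroys complete per interval would be limited to 20 once free space falls below the watermark, and left at 100 otherwise.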

Attachments

Activity


Alex Zhuravlev added a comment - 02/Apr/25 3:42 AM

So, there are two major "limits" besides uncommitted MDS_UNLINK: the number of OST_DESTROY RPCs in flight to the OST and the number of uncommitted OST_DESTROY operations. We could probably improve this using some sort of batching.


Andreas Dilger added a comment - 01/Apr/25 9:19 PM

Sure, but if we hit "ENOSPC" on the OSTs it is already too late. We should have some tracking like "deleted objects consume more than 50% of remaining space" to trigger faster OST object destroys.
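A minimal sketch of such a trigger, assuming the MDS can estimate the space held by not-yet-destroyed objects. The helper `destroy_is_urgent()` and its parameters are hypothetical and do not exist in Lustre; the 50% figure would simply be passed as `threshold_pct`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Lustre: decide whether pending
 * destroys should be expedited.
 *
 * pending_bytes - space held by objects unlinked but not yet destroyed
 * free_bytes    - free space the OST currently reports
 * threshold_pct - trigger point as a percentage of free_bytes (0..100)
 */
static bool destroy_is_urgent(uint64_t pending_bytes, uint64_t free_bytes,
			      unsigned int threshold_pct)
{
	assert(threshold_pct <= 100);

	if (pending_bytes == 0)
		return false;

	/* divide before multiplying to avoid 64-bit overflow */
	return pending_bytes > free_bytes / 100 * threshold_pct;
}
```

The check is cheap enough to run whenever free-space statistics are refreshed, well before any create fails with -ENOSPC.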


Alex Zhuravlev added a comment - 01/Apr/25 4:38 PM

Isn't there already code doing something similar?

		/*
		 * all precreated objects have been used and no-space
		 * status leave us no chance to succeed very soon
		 * but if there is destroy in progress, then we should
		 * wait till that is done - some space might be released
		 */
		if (unlikely(rc == -ENOSPC)) {
			if (atomic_read(&d->opd_sync_changes) && synced == 0) {
				/* force local commit to release space */
				dt_commit_async(env, d->opd_storage);
				osp_sync_check_for_work(d);
				synced = 1;
			}


People

Assignee: WC Triage

Reporter: Andreas Dilger

Votes: 0

Watchers: 3

Dates

Created: 27/Nov/24 1:48 AM

Updated: 02/Apr/25 3:42 AM