MDS should delete objects aggressively as filesystem gets full

Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.14.0, Lustre 2.16.1
    • 3

    Description

      There are a number of tests that depend on "wait_delete_completed" in order to avoid running out of space when the filesystem is getting full. However, it would be better if the MDS(es) would more aggressively destroy OST objects to reclaim space when the OSTs are getting full.

      The MDSes could potentially slow down object creation in this case to moderate new file creation to match the rate of object destroys, so that the free space is not totally consumed when there are still objects in the queue to be destroyed.
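      The throttling idea above could be sketched roughly as follows. This is only an illustration, not existing Lustre code: the function `creates_allowed()`, its parameters, and the per-interval accounting are all assumptions made for the example. When free space is below a low-water mark and destroys are still queued, the number of new objects allowed in the next interval is capped at the number of destroys completed in the previous one.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not an existing Lustre function.
 * Returns how many new OST objects may be created in the next interval.
 *
 * free_bytes                  - free space currently reported for the OST
 * low_space_bytes             - assumed low-water mark that enables throttling
 * queued_destroys             - objects still waiting to be destroyed
 * destroys_done_last_interval - destroys completed in the previous interval
 * max_creates                 - normal (unthrottled) per-interval limit
 */
static uint32_t creates_allowed(uint64_t free_bytes,
				uint64_t low_space_bytes,
				uint32_t queued_destroys,
				uint32_t destroys_done_last_interval,
				uint32_t max_creates)
{
	/* Plenty of space left: no reason to slow down creation. */
	if (free_bytes >= low_space_bytes)
		return max_creates;

	/* Nothing queued for destroy: throttling would not reclaim space. */
	if (queued_destroys == 0)
		return max_creates;

	/* Low on space with a destroy backlog: match the destroy rate. */
	return destroys_done_last_interval < max_creates ?
	       destroys_done_last_interval : max_creates;
}
```

      With this policy, creation stalls completely in a low-space interval where no destroys completed, which may or may not be desirable; a real implementation would probably want a small minimum allowance.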

        Activity

          [LU-18492] MDS should delete objects aggressively as filesystem gets full

          bzzz Alex Zhuravlev added a comment - so, there are two major "limits" besides uncommitted MDS_UNLINK: the number of OST_DESTROY RPCs in flight to an OST and the number of uncommitted OST_DESTROY operations. We could probably improve this using some sort of batching.
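          To make the interaction of the two limits and batching concrete, here is a minimal sketch. Everything in it is hypothetical: the struct `destroy_limits`, the function `destroys_dispatchable()`, and the idea that one RPC could carry `batch_size` objects are assumptions for illustration, not the current OSP behaviour. It computes how many queued destroys could be sent right now, given the free RPC slots and the remaining room for uncommitted destroys.

```c
#include <stdint.h>

/* Hypothetical limits; the names and values are not taken from Lustre. */
struct destroy_limits {
	uint32_t max_rpcs_in_flight;	/* cap on OST_DESTROY RPCs in flight */
	uint32_t max_uncommitted;	/* cap on uncommitted destroys (per object) */
};

static uint32_t min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

/*
 * Number of queued objects that can be dispatched now.
 * batch_size is the assumed number of objects carried by one RPC;
 * batch_size == 1 models one object per RPC.
 */
static uint32_t destroys_dispatchable(const struct destroy_limits *lim,
				      uint32_t queued,
				      uint32_t rpcs_in_flight,
				      uint32_t uncommitted,
				      uint32_t batch_size)
{
	uint32_t rpc_slots = 0;
	uint32_t commit_room = 0;

	if (rpcs_in_flight < lim->max_rpcs_in_flight)
		rpc_slots = lim->max_rpcs_in_flight - rpcs_in_flight;
	if (uncommitted < lim->max_uncommitted)
		commit_room = lim->max_uncommitted - uncommitted;

	/* Each free RPC slot can carry up to batch_size objects. */
	return min_u32(queued, min_u32(rpc_slots * batch_size, commit_room));
}
```

          With two free RPC slots, one object per RPC allows only 2 destroys to be sent, while a batch size of 16 allows 32, until the uncommitted limit becomes the bottleneck.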

          adilger Andreas Dilger added a comment - Sure, but if we hit "ENOSPC" on the OSTs it is already too late. We should have some tracking like "deleted objects consume more than 50% of remaining space" to trigger faster OST object destroys.
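          A check along those lines might look like the sketch below. The struct `ost_space_stats` and the function `destroy_backlog_is_urgent()` are hypothetical names invented for this example, and it assumes the MDS can track how much space the objects queued for destroy occupy, which is not something the ticket confirms is available today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-OST accounting; these fields do not exist in Lustre. */
struct ost_space_stats {
	uint64_t free_bytes;		/* free space reported by the OST */
	uint64_t pending_destroy_bytes;	/* space held by objects queued for destroy */
};

/*
 * Returns true when objects waiting to be destroyed hold more than
 * pct percent (0..100) of the remaining free space.
 *
 * free_bytes is divided by 100 first so the multiplication cannot
 * overflow; the price is a rounding error of less than 100 bytes.
 */
static bool destroy_backlog_is_urgent(const struct ost_space_stats *s,
				      unsigned int pct)
{
	return s->pending_destroy_bytes > (s->free_bytes / 100) * pct;
}
```

          A caller could evaluate this with pct set to 50 whenever OST statfs data is refreshed, and switch to faster destroy processing while it returns true, well before any create fails with ENOSPC.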

          bzzz Alex Zhuravlev added a comment - isn't there already code doing something similar?

          		/*
          		 * all precreated objects have been used and no-space
          		 * status leave us no chance to succeed very soon
          		 * but if there is destroy in progress, then we should
          		 * wait till that is done - some space might be released
          		 */
          		if (unlikely(rc == -ENOSPC)) {
          			if (atomic_read(&d->opd_sync_changes) && synced == 0) {
          				/* force local commit to release space */
          				dt_commit_async(env, d->opd_storage);
          				osp_sync_check_for_work(d);
          				synced = 1;
          			}

          People

            wc-triage WC Triage
            adilger Andreas Dilger