[LU-5716] Improve error handling on llog process Created: 08/Oct/14  Updated: 30/Jan/22

Status: Open
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.15.0
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Niu Yawei (Inactive) Assignee: WC Triage
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Related
is related to LU-7010 "Local llog found corrupted" during D... Resolved
Severity: 3
Rank (Obsolete): 16032

 Description   

The current error handling in llog_process_thread() is dangerous:

if (unlikely(rc == -EIO && loghandle->lgh_obj != NULL)) {
	/* something bad happened to the processing of a local
	 * llog file, probably I/O error or the log got corrupted..
	 * to be able to finally release the log we discard any
	 * remaining bits in the header */
	CERROR("Local llog found corrupted\n");
	while (index <= last_index) {
		if (ext2_test_bit(index, llh->llh_bitmap) != 0)
			llog_cancel_rec(lpi->lpi_env, loghandle, index);
		index++;
	}
	rc = 0;
}

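This cancels all remaining records whenever rc is -EIO, even when the -EIO was returned by a process callback rather than by reading the llog itself. That should be improved somehow, or at the very least we should make sure that no callback can return -EIO.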
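
One possible direction, shown as a minimal standalone sketch rather than as a patch against Lustre: record where the error came from, and discard the remaining records only when the log's own read failed. Everything named sketch_* below (the log structure, the read helper, the callback type and the error-source enum) is hypothetical and exists only for illustration. None of it is the real llog API, and the real fix may look quite different.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_RECS 8

/* Hypothetical stand-in for a llog handle: one "live" flag per record. */
struct sketch_log {
	bool live[SKETCH_NR_RECS];
	int bad_index;	/* record whose read fails with -EIO, or -1 for none */
};

/* Hypothetical per-record callback type. */
typedef int (*sketch_cb_t)(int index, void *data);

/* Records which step produced the error seen by the processing loop. */
enum sketch_err_src {
	SKETCH_ERR_NONE,
	SKETCH_ERR_READ,	/* reading the log itself failed */
	SKETCH_ERR_CALLBACK,	/* the per-record callback failed */
};

/* Simulated record read: fails only at log->bad_index. */
static int sketch_read_rec(const struct sketch_log *log, int index)
{
	return index == log->bad_index ? -EIO : 0;
}

/* Example callback: returns -EIO at the index stored in *data. */
static int sketch_cb_fail_at(int index, void *data)
{
	return index == *(int *)data ? -EIO : 0;
}

/* Counts the records that are still live. */
static int sketch_count_live(const struct sketch_log *log)
{
	int i, n = 0;

	for (i = 0; i < SKETCH_NR_RECS; i++)
		if (log->live[i])
			n++;
	return n;
}

/*
 * Process every live record. Remaining records are discarded only when
 * the -EIO came from reading the log; a callback's -EIO is passed back
 * to the caller and the records are left untouched.
 */
static int sketch_process(struct sketch_log *log, sketch_cb_t cb,
			  void *data, enum sketch_err_src *src)
{
	int index, rc = 0;

	assert(log != NULL && cb != NULL && src != NULL);
	*src = SKETCH_ERR_NONE;

	for (index = 0; index < SKETCH_NR_RECS; index++) {
		if (!log->live[index])
			continue;
		rc = sketch_read_rec(log, index);
		if (rc != 0) {
			*src = SKETCH_ERR_READ;
			break;
		}
		rc = cb(index, data);
		if (rc != 0) {
			*src = SKETCH_ERR_CALLBACK;
			break;
		}
	}

	if (rc == -EIO && *src == SKETCH_ERR_READ) {
		/* the log itself is unreadable: drop what is left of it */
		for (; index < SKETCH_NR_RECS; index++)
			log->live[index] = false;
		rc = 0;
	}
	return rc;
}
```

With this split, a callback that returns -EIO leaves every record in place and the caller sees the error, while a failed read of the log still takes the existing discard path. The alternative mentioned above, guaranteeing that callbacks never return -EIO, would avoid the extra state but relies on every callback author following that convention.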

