Details
-
Bug
-
Resolution: Fixed
-
Minor
-
Lustre 2.13.0, Lustre 2.12.3, Lustre 2.14.0, Lustre 2.12.4, Lustre 2.12.5
-
None
-
VMs + ldiskfs
-
3
-
9223372036854775807
Description
The following source code introduces a regression when "changelog_catalog" reaches the end of the llog catalog:
static int chlg_load(void *args)
{
	.....

	crs->crs_last_catidx = -1;	<-----
	crs->crs_last_idx = 0;
	.....
	rc = llog_cat_process(NULL, llh, chlg_read_cat_process_cb, crs,
			      crs->crs_last_catidx, crs->crs_last_idx);
The value -1 (LLOG_CAT_FIRST) is a special startcat value that scans the llog catalog from index 0 to lgh_last_idx. If the catalog has wrapped around, the records at the end of the catalog (from llh_cat_idx to the last catalog index) are not processed. See llog_cat_process_or_fork for more information.
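The difference between the two scan ranges can be illustrated with a small stand-alone model. This is only a sketch under simplifying assumptions, not Lustre code: struct toy_catalog and the toy_* functions are hypothetical names, the catalog is reduced to a ring of slots, and first_idx is assumed to be the oldest live slot (standing in for the role of llh_cat_idx), while last_idx stands in for lgh_last_idx.

```c
#include <assert.h>

/* Hypothetical, simplified model of a wrapped catalog (not a Lustre struct). */
struct toy_catalog {
	int size;      /* total number of slots in the ring */
	int first_idx; /* oldest live slot (assumed stand-in for llh_cat_idx) */
	int last_idx;  /* newest live slot (stand-in for lgh_last_idx) */
};

/* Return 1 if slot idx currently holds a live entry, 0 otherwise. */
static int toy_is_live(const struct toy_catalog *cat, int idx)
{
	if (cat->first_idx <= cat->last_idx)	/* not wrapped */
		return idx >= cat->first_idx && idx <= cat->last_idx;
	/* wrapped: live entries are at the tail and at the head */
	return idx >= cat->first_idx || idx <= cat->last_idx;
}

/*
 * Scan only slots 0..last_idx, as described for startcat == -1.
 * Stores the visited live slots in out[] and returns how many there are.
 */
int toy_scan_from_zero(const struct toy_catalog *cat, int *out)
{
	int n = 0;

	assert(cat->size > 0 && cat->last_idx < cat->size);
	for (int idx = 0; idx <= cat->last_idx; idx++)
		if (toy_is_live(cat, idx))
			out[n++] = idx;
	return n;
}

/*
 * Wrap-aware scan: start at the oldest live slot, run to the end of the
 * ring if needed, then continue from slot 0 up to last_idx.
 */
int toy_scan_wrap_aware(const struct toy_catalog *cat, int *out)
{
	int n = 0;
	int idx = cat->first_idx;

	assert(cat->size > 0 && cat->first_idx < cat->size &&
	       cat->last_idx < cat->size);
	for (;;) {
		out[n++] = idx;
		if (idx == cat->last_idx)
			break;
		idx = (idx + 1) % cat->size;
	}
	return n;
}
```

With a toy catalog of 8 slots where first_idx is 5 and last_idx is 2, toy_scan_from_zero visits slots 0, 1, 2 only, while toy_scan_wrap_aware visits 5, 6, 7, 0, 1, 2. The three slots skipped by the first scan are the oldest entries, which matches the records this ticket reports as not processed.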
So when the message "llog_cat_process_or_fork: catlog "DFID" crosses index zero" is triggered by a changelog client, that client "loses" all the old changelogs that were not yet synchronized.
Polling mode is affected only on its first iteration; later iterations are not concerned by this issue.
I reproduced this bug on my VMs. I am attaching to this ticket the changelog files that I used for my tests.