Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
None
-
3
-
9223372036854775807
Description
[admin@snx11253n002 ~]$ lctl get_param mdd.snx11253-MDT0000.changelog_users;
mdd.snx11253-MDT0000.changelog_users=
current index: 8936329
ID index
cl2 7049068
cl31347 3348651
cl31349 3348651
cl31351 3348651
cl33628 6330946
cl33632 6335474
cl2 7049068
cl33962 7379382
cl33963 7379382
cl33964 7379382
TEST PROCEDURE:
For each rank:
(I) CONSUME_LOGS (STARTREC=<arg> or 1)
[1] Set endrec = startrec + params.chunksize
[2] Read CL in parallel with N threads with range (startrec, endrec), and extract
actual last record read (lastrec) .
a] If no new files are found ...
i] Retry until allowed (by refresh timeout, refresh retries).
Otherwise quit.
b] If new files are found ...
i] If clearing is enabled, clear logs up to lastrec.
ii] Increment record ranges
(startrec = lastrec, endrec = startrec + chunksize)
d] repeat step [2]
TEST EXECUTION:
The test was executed with the follow arguments:
$ changelog-consumer.py -v --read-clear --clusers cl2 cl2 cl2 cl2 --mdt snx11253-MDT0000 --chunksize 9999 --read-threads 8 --update-retries 6 --update-after 30
Basically this is reading 9999 Changelogs in 8 threads, then clearing 9999 Changelogs in 1 thread (implied, not show). These actions are being duplicated across 4 separate nodes, and against the "cl2" user.
The root cause is parallel processing llog with modification.
The llog has llog_handle which include
lgh_cur_idx; /* used during llog_process */
lgh_cur_offset; /* used during llog_process */
They both are modified by llog_process_thread every time we process record from llog.
Those fields are used at llog_osd_write_rec for modification/rewrite llog record. The race exist when two or more threads are processing the same llog and at least one of them do modification.
1) llog_process_thread lgh_cur_idx=10 lgh_cur_offset=2000,
2) llog_process_thread lgh_cur_idx=5 lgh_cur_offset=1000,
1) log_process_thread >lpi_cb>mdd_changelog_user_purge_cb->llog_write... llog_osd_write_rec and then
if (idx != loghandle->lgh_cur_idx) { CERROR("%s: modify index mismatch %d %d\n", o->do_lu.lo_dev->ld_obd->obd_name, idx, loghandle->lgh_cur_idx); RETURN(-EFAULT); }
From logs
May 4 19:55:29 snx11253n002 kernel: LustreError: 29737:0:(llog_osd.c:441:llog_osd_write_rec()) snx11253-MDT0000-osd: modify index mismatch 2 34132
This was the first case.
(2) Lets imagine that second thread modify lgh_cur_idx/lgh_cur_offset right after this check
if (idx != loghandle->lgh_cur_idx)
then
lgi->lgi_off = loghandle->lgh_cur_offset;
lgi->lgi_off became 1000 instead of 2000. And the wrong modification will happen.
Alex
The functional change tracked here has landed to master. Do you still intend to land the accompanying test? If so, please could you rebase it so that it can complete testing and reviews?
Thanks
Peter