Details
-
Bug
-
Resolution: Fixed
-
Critical
-
None
-
None
-
3
-
9223372036854775807
Description
[admin@snx11253n002 ~]$ lctl get_param mdd.snx11253-MDT0000.changelog_users;
mdd.snx11253-MDT0000.changelog_users=
current index: 8936329
ID index
cl2 7049068
cl31347 3348651
cl31349 3348651
cl31351 3348651
cl33628 6330946
cl33632 6335474
cl2 7049068
cl33962 7379382
cl33963 7379382
cl33964 7379382
TEST PROCEDURE:
For each rank:
(I) CONSUME_LOGS (STARTREC=<arg> or 1)
[1] Set endrec = startrec + params.chunksize
[2] Read CL in parallel with N threads with range (startrec, endrec), and extract
actual last record read (lastrec) .
a] If no new files are found ...
i] Retry until allowed (by refresh timeout, refresh retries).
Otherwise quit.
b] If new files are found ...
i] If clearing is enabled, clear logs up to lastrec.
ii] Increment record ranges
(startrec = lastrec, endrec = startrec + chunksize)
d] repeat step [2]
TEST EXECUTION:
The test was executed with the follow arguments:
$ changelog-consumer.py -v --read-clear --clusers cl2 cl2 cl2 cl2 --mdt snx11253-MDT0000 --chunksize 9999 --read-threads 8 --update-retries 6 --update-after 30
Basically this is reading 9999 Changelogs in 8 threads, then clearing 9999 Changelogs in 1 thread (implied, not show). These actions are being duplicated across 4 separate nodes, and against the "cl2" user.
The root cause is parallel processing llog with modification.
The llog has llog_handle which include
lgh_cur_idx; /* used during llog_process */
lgh_cur_offset; /* used during llog_process */
They both are modified by llog_process_thread every time we process record from llog.
Those fields are used at llog_osd_write_rec for modification/rewrite llog record. The race exist when two or more threads are processing the same llog and at least one of them do modification.
1) llog_process_thread lgh_cur_idx=10 lgh_cur_offset=2000,
2) llog_process_thread lgh_cur_idx=5 lgh_cur_offset=1000,
1) log_process_thread >lpi_cb>mdd_changelog_user_purge_cb->llog_write... llog_osd_write_rec and then
if (idx != loghandle->lgh_cur_idx) { CERROR("%s: modify index mismatch %d %d\n", o->do_lu.lo_dev->ld_obd->obd_name, idx, loghandle->lgh_cur_idx); RETURN(-EFAULT); }
From logs
May 4 19:55:29 snx11253n002 kernel: LustreError: 29737:0:(llog_osd.c:441:llog_osd_write_rec()) snx11253-MDT0000-osd: modify index mismatch 2 34132
This was the first case.
(2) Lets imagine that second thread modify lgh_cur_idx/lgh_cur_offset right after this check
if (idx != loghandle->lgh_cur_idx)
then
lgi->lgi_off = loghandle->lgh_cur_offset;
lgi->lgi_off became 1000 instead of 2000. And the wrong modification will happen.