[LU-9714] Changelog consumer test reports 'Local llog found corrupted' Created: 27/Jun/17 Updated: 29/Jan/22 Resolved: 29/Jan/22 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Alexander Boyko | Assignee: | Alexander Boyko |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||
| Epic/Theme: | patch | ||||
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
[admin@snx11253n002 ~]$ lctl get_param mdd.snx11253-MDT0000.changelog_users;
TEST PROCEDURE:
The root cause is parallel processing of an llog while it is being modified.
lgh_cur_idx; /* used during llog_process */
Both lgh_cur_idx and lgh_cur_offset are updated by llog_process_thread every time it processes a record from the llog. llog_osd_write_rec uses these fields to modify/rewrite an llog record. The race exists when two or more threads process the same llog and at least one of them modifies it.
1) llog_process_thread->lpi_cb->mdd_changelog_user_purge_cb->llog_write... llog_osd_write_rec and then
if (idx != loghandle->lgh_cur_idx) {
	CERROR("%s: modify index mismatch %d %d\n",
	       o->do_lu.lo_dev->ld_obd->obd_name, idx,
	       loghandle->lgh_cur_idx);
	RETURN(-EFAULT);
}
From logs
This was the first case.
2) Let's imagine that a second thread modifies lgh_cur_idx/lgh_cur_offset right after this check. Then lgi->lgi_off becomes 1000 instead of 2000, and the wrong modification happens. |
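The two cases in the description can be reproduced deterministically with a small stand-alone model. This is only a sketch of the check-then-use pattern, not Lustre code: `struct demo_handle`, `racy_write_offset` and `second_thread_moves_cursor` are hypothetical names, the index values 1 and 2 are assumptions, and only the offsets 1000 and 2000 come from the description. The interleaving is forced through a callback instead of real threads so the outcome does not depend on timing.

```c
#include <stddef.h>

/* Hypothetical stand-in for the llog handle; only the shared cursor matters. */
struct demo_handle {
	int  cur_idx;     /* models lgh_cur_idx */
	long cur_offset;  /* models lgh_cur_offset */
};

/* Hook invoked between the index check and the offset read,
 * standing in for a second thread that processes the same llog. */
typedef void (*interleave_fn)(struct demo_handle *h);

/*
 * Models the racy pattern: validate idx against the shared cursor,
 * then read the shared offset to decide where to write.
 * Returns the offset that would be used, or -1 on index mismatch
 * (the description shows -EFAULT being returned at that point).
 */
long racy_write_offset(struct demo_handle *h, int idx, interleave_fn other)
{
	if (idx != h->cur_idx)
		return -1;           /* case 1: mismatch is detected */

	if (other != NULL)
		other(h);            /* second thread runs right after the check */

	return h->cur_offset;        /* case 2: may already belong to another record */
}

/* Second thread moving the shared cursor to a different record. */
void second_thread_moves_cursor(struct demo_handle *h)
{
	h->cur_idx = 1;
	h->cur_offset = 1000;
}
```

With the cursor at index 2 and offset 2000, calling `racy_write_offset(&h, 2, NULL)` yields 2000, while passing `second_thread_moves_cursor` as the hook yields 1000 even though the index check passed. If the cursor has already moved before the call, the check fails and the function returns -1, which corresponds to the "modify index mismatch" error.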
| Comments |
| Comment by Gerrit Updater [ 27/Jun/17 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: https://review.whamcloud.com/27838 |
| Comment by Mikhail Pershin [ 12/Jul/17 ] |
|
What kind of modification is used in that test? You've mentioned only 'clearing' of the changelog; is it done with llog_cancel, or are you using llog_write() to wipe these records entirely? |
| Comment by Alexander Boyko [ 12/Jul/17 ] |
|
It is a user-mode test, so it was lfs changelog_clear. |
| Comment by Gerrit Updater [ 18/Sep/17 ] |
|
Alexander Boyko (alexander.boyko@seagate.com) uploaded a new patch: https://review.whamcloud.com/29035 |
| Comment by Gerrit Updater [ 22/Dec/17 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/27838/ |
| Comment by Peter Jones [ 22/Dec/17 ] |
|
Alex,
The functional change tracked here has landed to master. Do you still intend to land the accompanying test? If so, could you please rebase it so that it can complete testing and reviews?
Thanks,
Peter |