[LU-7799] mdt.*.hsm.actions skips some records Created: 19/Feb/16 Updated: 26/Sep/16 Resolved: 14/Mar/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.5.0, Lustre 2.7.0, Lustre 2.8.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | John Hammond | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | hsm, llog | ||
| Issue Links: |
|
| Severity: | 3 | ||||
| Rank (Obsolete): | 9223372036854775807 | ||||
| Description |
|
Note that this is almost certainly responsible for a number of mysterious sanity-hsm test failures. Due to a bug in (or misuse of) llog_cat_process(), the HSM actions proc file skips some records when it is read.

    ~# # mount and setup HSM
    ~# killall lhsmtool_posix
    ~# wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
    0 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
    ~# cd /mnt/lustre
    lustre# for ((i = 0; i < 20; i++)); do
    > touch f$i
    > lfs hsm_archive f$i
    > done

Now there should be 20 records in the actions file, but there are only 19:

    lustre# wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
    19 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions

The missing record corresponds to f17:

    lustre# lfs path2fid f17
    [0x200000401:0x12:0x0]
    lustre# grep '0x200000401:0x12:0x0' /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
    lustre# lfs hsm_action f17
    f17: ARCHIVE waiting (from 0 to EOF)

The issue is with how the startidx parameter to llog_cat_process() is handled (see mdt_hsm_actions_proc_show() and hsm_actions_show_cb()). startidx is passed down as lpd_startidx and then as lpcd_first_idx, and llog_process_thread() skips the record at lpcd_first_idx rather than starting from it.
| Comments |
| Comment by Gerrit Updater [ 19/Feb/16 ] |
|
John L. Hammond (john.hammond@intel.com) uploaded a new patch: http://review.whamcloud.com/18525 |
| Comment by Gerrit Updater [ 14/Mar/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/18525/ |