Details
Type: Bug
Resolution: Fixed
Priority: Minor
Affects Version/s: Lustre 2.5.0, Lustre 2.7.0, Lustre 2.8.0
Severity: 3
Description
Note that this is surely responsible for a number of mysterious sanity-hsm test failures.
Due to a bug in (or misuse of) llog_cat_process(), the HSM actions proc file skips some records when it is read.
~# # mount and set up HSM
~# killall lhsmtool_posix
~# wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
0 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
~# cd /mnt/lustre
lustre# for ((i = 0; i < 20; i++)); do
> touch f$i
> lfs hsm_archive f$i
> done
Now there should be 20 records in the actions file, but there are only 19:
lustre# wc -l /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
19 /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
The missing record corresponds to f17:
lustre# lfs path2fid f17
[0x200000401:0x12:0x0]
lustre# grep '0x200000401:0x12:0x0' /proc/fs/lustre/mdt/lustre-MDT0000/hsm/actions
lustre# lfs hsm_action f17
f17: ARCHIVE waiting (from 0 to EOF)
The issue is in how the startidx parameter to llog_cat_process() is handled (see mdt_hsm_actions_proc_show() and hsm_actions_show_cb()). startidx becomes lpd_startidx and then lpcd_first_idx, and the record at lpcd_first_idx is actually skipped in llog_process_thread().