Details
Bug - Resolution: Fixed - Critical - Lustre 2.7.0 - 3 - 17159
Description
There is a race condition in LFSCK when a failure stub is injected for testing. For example:
764     if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_DELAY2) &&
765         cfs_fail_val > 0) {
766             struct l_wait_info lwi;
767
768             lwi = LWI_TIMEOUT(cfs_time_seconds(cfs_fail_val),
769                               NULL, NULL);
770             l_wait_event(thread->t_ctl_waitq,
771                          !thread_is_running(thread),
772                          &lwi);
773
774             if (unlikely(!thread_is_running(thread))) {
775                     CDEBUG(D_LFSCK, "%s: scan dir exit for engine "
776                            "stop, parent "DFID", cookie "LPX64"\n",
777                            lfsck_lfsck2name(lfsck),
778                            PFID(lfsck_dto2fid(dir)),
779                            lfsck->li_cookie_dir);
780                     RETURN(0);
781             }
782     }
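The value of "cfs_fail_val" may be changed by another thread after the check at line 765 but before it is used at line 768. If it is reset to 0 in that window, the timeout passed to LWI_TIMEOUT() is zero, which l_wait_event() treats as no timeout. The LFSCK engine then stays in the wait until someone runs "lfsck_stop".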
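
One common way to close this kind of check-then-use window is to read the shared value once into a local variable and use only that copy. The following is a minimal user-space sketch of that idea, not the actual patch for this ticket. The names fail_val, between_check_and_use, racy_timeout, safe_timeout and reset_fail_val are all hypothetical stand-ins; the hook only simulates another thread changing the value between the check and the use. The return value follows the convention assumed above: 0 means an unbounded wait, a positive number is a timed wait in seconds, and -1 means no delay is injected.

```c
/* Hypothetical stand-in for the global cfs_fail_val. */
static volatile unsigned int fail_val;

/* Hypothetical hook: simulates another thread running between
 * the check and the use of the shared value. */
static void (*between_check_and_use)(void);

/* Simulated concurrent writer: the test resets the value to 0. */
static void reset_fail_val(void)
{
        fail_val = 0;
}

/* Racy pattern: the shared value is read twice, once for the
 * check and once for the timeout. */
static int racy_timeout(void)
{
        if (fail_val > 0) {
                if (between_check_and_use)
                        between_check_and_use();
                /* Second read: may no longer match the check. */
                return (int)fail_val;
        }
        return -1;
}

/* Snapshot pattern: the shared value is read once; the check and
 * the timeout both use the same local copy. */
static int safe_timeout(void)
{
        unsigned int val = fail_val;

        if (val > 0) {
                if (between_check_and_use)
                        between_check_and_use();
                /* Later changes to fail_val cannot affect val. */
                return (int)val;
        }
        return -1;
}
```

With fail_val set to 5 and the hook pointing at reset_fail_val, racy_timeout() returns 0 (the unbounded wait described above), while safe_timeout() still returns 5.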