Details
-
Bug
-
Resolution: Fixed
-
Critical
-
Lustre 2.7.0
-
3
-
17159
Description
There is race condition in LFSCK when inject failure stub for test. For example:
764 if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_DELAY2) && 765 cfs_fail_val > 0) { 766 struct l_wait_info lwi; 767 768 lwi = LWI_TIMEOUT(cfs_time_seconds(cfs_fail_val), 769 NULL, NULL); 770 l_wait_event(thread->t_ctl_waitq, 771 !thread_is_running(thread), 772 &lwi); 773 774 if (unlikely(!thread_is_running(thread))) { 775 CDEBUG(D_LFSCK, "%s: scan dir exit for engine " 776 "stop, parent "DFID", cookie "LPX64"\n", 777 lfsck_lfsck2name(lfsck), 778 PFID(lfsck_dto2fid(dir)), 779 lfsck->li_cookie_dir); 780 RETURN(0); 781 } 782 }
The "cfs_fail_val" may be changed by others after the check at line 765 but before using it at line 768. Then the LFSCK engine will fall into "wait" until someone run "lfsck_stop".
Attachments
Activity
Resolution | New: Fixed [ 1 ] | |
Status | Original: In Progress [ 3 ] | New: Resolved [ 5 ] |
Labels | New: HB |
Affects Version/s | New: Lustre 2.7.0 [ 10631 ] |
Status | Original: Open [ 1 ] | New: In Progress [ 3 ] |
Landed for 2.7