[LU-6146] LFSCK fall into wait for ever because of race condition when check/set cfs_fail_val Created: 21/Jan/15 Updated: 25/Jan/15 Resolved: 25/Jan/15 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.7.0 |
| Fix Version/s: | Lustre 2.7.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | nasf (Inactive) | Assignee: | nasf (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | HB | ||
| Severity: | 3 |
| Rank (Obsolete): | 17159 |
| Description |
|
There is a race condition in LFSCK when a failure stub is injected for testing. For example:

764         if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_DELAY2) &&
765             cfs_fail_val > 0) {
766                 struct l_wait_info lwi;
767
768                 lwi = LWI_TIMEOUT(cfs_time_seconds(cfs_fail_val),
769                                   NULL, NULL);
770                 l_wait_event(thread->t_ctl_waitq,
771                              !thread_is_running(thread),
772                              &lwi);
773
774                 if (unlikely(!thread_is_running(thread))) {
775                         CDEBUG(D_LFSCK, "%s: scan dir exit for engine "
776                                "stop, parent "DFID", cookie "LPX64"\n",
777                                lfsck_lfsck2name(lfsck),
778                                PFID(lfsck_dto2fid(dir)),
779                                lfsck->li_cookie_dir);
780                         RETURN(0);
781                 }
782         }

The value of "cfs_fail_val" may be changed by another thread after the check at line 765 but before it is used at line 768. For example, if it is reset to 0 in that window, the wait is set up with a zero timeout, i.e. with no timeout at all. The LFSCK engine then stays in the wait until someone runs "lfsck_stop". |
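One common way to close this kind of check-then-use window is to read the shared value once into a local variable and use only that snapshot afterwards. The sketch below is a user-space illustration, not the patch from this ticket (the ticket does not show the patch contents). The names `fail_val`, `struct wait_info`, `racy_delay`, `snapshot_delay` and `reset_fail_val` are hypothetical stand-ins, and the `between` callback only simulates another thread changing the value inside the window.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global cfs_fail_val (not the Lustre definition). */
static atomic_uint fail_val;

/* Hypothetical stand-in for the wait descriptor; a timeout of 0 means "no timeout". */
struct wait_info {
	unsigned int timeout_sec;
};

/* Simulates another thread resetting the value inside the race window. */
static void reset_fail_val(void)
{
	atomic_store(&fail_val, 0);
}

/* Racy pattern: the global is checked, then read a second time for the timeout. */
static bool racy_delay(struct wait_info *wi, void (*between)(void))
{
	if (atomic_load(&fail_val) > 0) {
		if (between != NULL)
			between();	/* the value may change here */
		wi->timeout_sec = atomic_load(&fail_val);
		return true;
	}
	return false;
}

/* Snapshot pattern: the global is read once; check and use see the same value. */
static bool snapshot_delay(struct wait_info *wi, void (*between)(void))
{
	unsigned int val = atomic_load(&fail_val);

	if (val > 0) {
		if (between != NULL)
			between();	/* a later change no longer affects val */
		wi->timeout_sec = val;
		return true;
	}
	return false;
}
```

With `fail_val` set to 5 and `reset_fail_val` passed as the callback, `racy_delay` ends up with a zero timeout while `snapshot_delay` keeps the value it checked.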
| Comments |
| Comment by Gerrit Updater [ 21/Jan/15 ] |
|
Fan Yong (fan.yong@intel.com) uploaded a new patch: http://review.whamcloud.com/13481 |
| Comment by nasf (Inactive) [ 21/Jan/15 ] |
|
This issue may cause many sanity-lfsck test failures, so we have to resolve it before Lustre 2.7 is released. |
| Comment by Andreas Dilger [ 21/Jan/15 ] |
|
I don't see any tests in Maloo that have been marked with this bug. I do see a large number of test failures due to |
| Comment by nasf (Inactive) [ 23/Jan/15 ] |
|
Recently there have been many failures of sanity-lfsck test_4; some of them are because of |
| Comment by Gerrit Updater [ 25/Jan/15 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/13481/ |
| Comment by Peter Jones [ 25/Jan/15 ] |
|
Landed for 2.7 |