[LU-11827] Race between llog_cat_declare_add_rec and llog_cat_current_log Created: 24/Dec/18 Updated: 01/Apr/19 Resolved: 27/Feb/19 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.13.0, Lustre 2.12.1 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vladimir Saveliev | Assignee: | Lai Siyao |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Issue Links: |
|
| Severity: | 3 |
| Rank (Obsolete): | 9223372036854775807 |
| Description |
|
llog_cat_declare_add_rec() operates on &cathandle->u.chd.chd_next_log without having it protected:

    int llog_cat_declare_add_rec(const struct lu_env *env,
    ...
            rc = llog_cat_prep_log(env, cathandle,
                                   &cathandle->u.chd.chd_current_log, th);
    ...
            rc = llog_cat_prep_log(env, cathandle,
                                   &cathandle->u.chd.chd_next_log, th);

That races with llog_cat_current_log() when it switches to the next log and updates cathandle->u.chd.chd_next_log:

    static struct llog_handle *llog_cat_current_log(struct llog_handle *cathandle,
    ...
            down_write_nested(&cathandle->lgh_lock, LLOGH_CAT);
    ...
            CDEBUG(D_INODE, "use next log\n");

            loghandle = cathandle->u.chd.chd_next_log;
            cathandle->u.chd.chd_current_log = loghandle;
            cathandle->u.chd.chd_next_log = NULL;
            down_write_nested(&loghandle->lgh_lock, LLOGH_LOG);
    ...

The following trace has been observed:

    00000040:00000001:19.0:1545138333.143874:0:177713:0:(llog_cat.c:605:llog_cat_declare_add_rec()) Process entered
    00000040:00000001:19.0:1545138333.143875:0:177713:0:(llog.c:940:llog_exist()) Process leaving (rc=1 : 1 : 1)
    00000040:00000001:19.0:1545138333.143876:0:177713:0:(llog.c:940:llog_exist()) Process leaving (rc=0 : 0 : 0)

Process 99986 jumps in and sets the next-log pointer in the catalog handle to NULL:

    00000040:00000002:21.0:1545138333.143876:0:99986:0:(llog_cat.c:521:llog_cat_current_log()) use next log

Process 177713 continues through llog_cat_prep_log() -> llog_declare_create() -> llog_handle2ops(), finds that *ploghandle is now NULL, and fails in llog_handle2ops() with -22 (-EINVAL):

    00000040:00000001:19.0:1545138333.143877:0:177713:0:(llog.c:954:llog_declare_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea) |
| Comments |
| Comment by Gerrit Updater [ 24/Dec/18 ] |
|
Vladimir Saveliev (c17830@cray.com) uploaded a new patch: https://review.whamcloud.com/33914 |
| Comment by Oleg Drokin [ 25/Feb/19 ] |
|
Please see a somewhat related failure scenario in LU-12008, do you think this will help there too? |
| Comment by Gerrit Updater [ 27/Feb/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33914/ |
| Comment by Peter Jones [ 27/Feb/19 ] |
|
Landed for 2.13 |
| Comment by Patrick Farrell (Inactive) [ 28/Feb/19 ] |
|
Looks like this was hit on 2.12.0 by a customer: |
| Comment by Gerrit Updater [ 19/Mar/19 ] |
|
Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34455 |
| Comment by Gerrit Updater [ 01/Apr/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34455/ |