[LU-11827] Race between llog_cat_declare_add_rec and llog_cat_current_log Created: 24/Dec/18  Updated: 01/Apr/19  Resolved: 27/Feb/19

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.13.0, Lustre 2.12.1

Type: Bug Priority: Minor
Reporter: Vladimir Saveliev Assignee: Lai Siyao
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by LU-11984 Intermittent file create or rm fail w... Resolved
Related
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

llog_cat_declare_add_rec() operates on &cathandle->u.chd.chd_next_log without having it protected:

int llog_cat_declare_add_rec(const struct lu_env *env,
...
        rc = llog_cat_prep_log(env, cathandle,
                               &cathandle->u.chd.chd_current_log, th);
...
        rc = llog_cat_prep_log(env, cathandle,
                               &cathandle->u.chd.chd_next_log, th);

That races with llog_cat_current_log(), which takes cathandle->lgh_lock for writing, switches to the next log and resets cathandle->u.chd.chd_next_log to NULL:

static struct llog_handle *llog_cat_current_log(struct llog_handle *cathandle,
...
        down_write_nested(&cathandle->lgh_lock, LLOGH_CAT);
...
        CDEBUG(D_INODE, "use next log\n");
 
        loghandle = cathandle->u.chd.chd_next_log;
        cathandle->u.chd.chd_current_log = loghandle;
        cathandle->u.chd.chd_next_log = NULL;
        down_write_nested(&loghandle->lgh_lock, LLOGH_LOG);
...
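
The interaction can be modelled in user space. The sketch below is only an illustration under stated assumptions, not the patch that landed: struct cat_handle, struct log_handle, open_log(), prep_log(), declare_create(), declare_add_rec() and switch_to_next_log() are hypothetical stand-ins, and a plain pthread mutex replaces the kernel rw_semaphore lgh_lock. It shows the declare path and the switch path taking the same lock, so the pointer cannot become NULL between the check and the use.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the Lustre structures. */
struct log_handle {
    int id;
};

struct cat_handle {
    pthread_mutex_t lock;            /* plays the role of lgh_lock */
    struct log_handle *current_log;  /* plays the role of chd_current_log */
    struct log_handle *next_log;     /* plays the role of chd_next_log */
};

/* Tiny pool standing in for opening a new log handle (assumption). */
static struct log_handle pool[8];
static size_t pool_used;

static struct log_handle *open_log(void)
{
    if (pool_used == sizeof(pool) / sizeof(pool[0]))
        return NULL;
    pool[pool_used].id = (int)pool_used;
    return &pool[pool_used++];
}

/* Rejects a NULL handle with -EINVAL, like the rc=-22 in the trace. */
static int declare_create(struct log_handle *loghandle)
{
    return loghandle == NULL ? -EINVAL : 0;
}

/* Caller must hold cat->lock, so *ploghandle cannot change between
 * the NULL check and the later use of the pointer. */
static int prep_log(struct log_handle **ploghandle)
{
    if (*ploghandle == NULL) {
        *ploghandle = open_log();
        if (*ploghandle == NULL)
            return -ENOMEM;
    }
    return declare_create(*ploghandle);
}

/* Declare path: both handles are prepared under the catalog lock. */
static int declare_add_rec(struct cat_handle *cat)
{
    int rc;

    assert(cat != NULL);
    pthread_mutex_lock(&cat->lock);
    rc = prep_log(&cat->current_log);
    if (rc == 0)
        rc = prep_log(&cat->next_log);
    pthread_mutex_unlock(&cat->lock);
    return rc;
}

/* Switch path: promote the next log and clear the slot, under the
 * same lock as the declare path. */
static struct log_handle *switch_to_next_log(struct cat_handle *cat)
{
    struct log_handle *loghandle;

    assert(cat != NULL);
    pthread_mutex_lock(&cat->lock);
    loghandle = cat->next_log;
    cat->current_log = loghandle;
    cat->next_log = NULL;
    pthread_mutex_unlock(&cat->lock);
    return loghandle;
}
```

With both paths serialized, a declare call that runs after a switch finds next_log empty, fills it again and succeeds instead of passing NULL down to declare_create(). The real code distinguishes readers from writers on lgh_lock; the single mutex here trades that concurrency for brevity.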

The following trace has been observed:
Process 177713 enters llog_cat_declare_add_rec():

00000040:00000001:19.0:1545138333.143874:0:177713:0:(llog_cat.c:605:llog_cat_declare_add_rec()) Process entered
00000040:00000001:19.0:1545138333.143875:0:177713:0:(llog.c:940:llog_exist()) Process leaving (rc=1 : 1 : 1)
00000040:00000001:19.0:1545138333.143876:0:177713:0:(llog.c:940:llog_exist()) Process leaving (rc=0 : 0 : 0)

Process 99986 jumps in and sets the next-log pointer in the catalog handle to NULL:

00000040:00000002:21.0:1545138333.143876:0:99986:0:(llog_cat.c:521:llog_cat_current_log()) use next log

Process 177713 continues: llog_cat_prep_log() -> llog_declare_create() -> llog_handle2ops(). Because *ploghandle is now NULL, llog_handle2ops() fails with -22 (-EINVAL):

00000040:00000001:19.0:1545138333.143877:0:177713:0:(llog.c:954:llog_declare_create()) Process leaving (rc=18446744073709551594 : -22 : ffffffffffffffea)
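
The three values in "rc=18446744073709551594 : -22 : ffffffffffffffea" are the same 64-bit return code shown as unsigned decimal, signed decimal and hex. A minimal sketch of that decoding follows; decode_rc() is a hypothetical helper written for this note, not a Lustre function.

```c
#include <stdint.h>

/* Reinterpret a 64-bit value printed as unsigned as a signed return
 * code, using two's-complement arithmetic and no
 * implementation-defined conversion. */
static int64_t decode_rc(uint64_t raw)
{
    if (raw > (uint64_t)INT64_MAX)
        return -(int64_t)(~raw) - 1;
    return (int64_t)raw;
}
```

Passing 18446744073709551594 (0xffffffffffffffea) gives -22, which is -EINVAL on Linux.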


 Comments   
Comment by Gerrit Updater [ 24/Dec/18 ]

Vladimir Saveliev (c17830@cray.com) uploaded a new patch: https://review.whamcloud.com/33914
Subject: LU-11827 llog: protect cathandle in llog_cat_declare_add_rec
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 052bfbd11bad6a04821a99ba60094b55cab38ab6

Comment by Oleg Drokin [ 25/Feb/19 ]

Please see a somewhat related failure scenario in LU-12008, do you think this will help there too?

Comment by Gerrit Updater [ 27/Feb/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33914/
Subject: LU-11827 llog: protect cathandle in llog_cat_declare_add_rec
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 59a62ada2e18174e5611730e8bcf5ba3165ca2b9

Comment by Peter Jones [ 27/Feb/19 ]

Landed for 2.13

Comment by Patrick Farrell (Inactive) [ 28/Feb/19 ]

Looks like this was hit on 2.12.0 by a customer:
https://jira.whamcloud.com/browse/LU-11984

Comment by Gerrit Updater [ 19/Mar/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34455
Subject: LU-11827 llog: protect cathandle in llog_cat_declare_add_rec
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: 8fe07060d1dce0813ca92178d2e2541e43e99f20

Comment by Gerrit Updater [ 01/Apr/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34455/
Subject: LU-11827 llog: protect cathandle in llog_cat_declare_add_rec
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 834d54138b250ddedc06f478e9cdfdbf0c352bda
