[LU-2127] Confusing messages from osc_announce_cached() caused by unchecked signed/unsigned comparison Created: 09/Oct/12 Updated: 25/Feb/13 Resolved: 25/Feb/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Alexandre Louvet | Assignee: | Keith Mannthey (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | mq213 | ||
| Severity: | 3 |
| Rank (Obsolete): | 5129 |
| Description |
|
Some examples, taken for convenience from Feb 28 09:37:28 compute-01-32 kernel: LustreError: 2567:0:(osc_request.c:797:osc_announce_cached()) dirty 2028 - 2028 > system dirty_max 647168 Feb 28 09:37:40 compute-01-32 kernel: LustreError: 2568:0:(osc_request.c:797:osc_announce_cached()) dirty 2040 - 2040 > system dirty_max 647168 Feb 28 09:37:46 compute-01-32 kernel: LustreError: 2565:0:(osc_request.c:797:osc_announce_cached()) dirty 1954 - 1954 > system dirty_max 647168 Feb 28 09:37:47 compute-01-32 kernel: LustreError: 2561:0:(osc_request.c:797:osc_announce_cached()) dirty 1975 - 1976 > system dirty_max 647168 Feb 28 09:37:50 compute-01-32 kernel: LustreError: 2566:0:(osc_request.c:797:osc_announce_cached()) dirty 2001 - 2002 > system dirty_max 647168 Feb 28 09:38:09 compute-01-32 kernel: LustreError: 2567:0:(osc_request.c:797:osc_announce_cached()) dirty 1946 - 1946 > system dirty_max 647168 Feb 28 09:39:20 compute-01-32 kernel: LustreError: 2615:0:(osc_request.c:797:osc_announce_cached()) dirty 2017 - 2017 > system dirty_max 647168 Feb 28 09:39:47 compute-01-32 kernel: LustreError: 2615:0:(osc_request.c:797:osc_announce_cached()) dirty 1847 - 1848 > system dirty_max 647168 ================================================================================================= this comes from the following piece of code : . } else if (cfs_atomic_read(&obd_dirty_pages) - cfs_atomic_read(&obd_dirty_transit_pages) > obd_max_dirty_pages + 1){ /* The cfs_atomic_read() allowing the cfs_atomic_inc() are * not covered by a lock thus they may safely race and trip * this CERROR() unless we add in a small fudge factor (+1). */ CERROR("dirty %d - %d > system dirty_max %d\n", cfs_atomic_read(&obd_dirty_pages), cfs_atomic_read(&obd_dirty_transit_pages), obd_max_dirty_pages); ============================================= where both obd_dirty_pages/obd_dirty_transit_pages are "likely to change" signed ints and obd_max_dirty_pages unsigned which an easily explain the wrong error msgs ... This is still true in current master and, if I am right and there is no tricky stuff I may have missed there ..., the following patch should fix : diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c index af60e4b..5f75329 100644 --- a/lustre/osc/osc_request.c +++ b/lustre/osc/osc_request.c @@ -788,6 +788,7 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, long writing_bytes) { obd_flag bits = OBD_MD_FLBLOCKS|OBD_MD_FLGRANT; + int diff; LASSERT(!(oa->o_valid & bits)); @@ -798,9 +799,9 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa, CERROR("dirty %lu - %lu > dirty_max %lu\n", cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max); oa->o_undirty = 0; - } else if (cfs_atomic_read(&obd_dirty_pages) - - cfs_atomic_read(&obd_dirty_transit_pages) > - obd_max_dirty_pages + 1){ + } else if (((diff = (cfs_atomic_read(&obd_dirty_pages) - + cfs_atomic_read(&obd_dirty_transit_pages))) > + obd_max_dirty_pages + 1) && (diff > 0)){ /* The cfs_atomic_read() allowing the cfs_atomic_inc() are * not covered by a lock thus they may safely race and trip * this CERROR() unless we add in a small fudge factor (+1). */ ================================================================================================================================================== |
| Comments |
| Comment by Bruno Faccini (Inactive) [ 09/Oct/12 ] |
|
Oops, sorry for the wrong formating, don't know what happen exactly but seems the starting +/- characters of each patch line did put the tool/inputs into some trouble ... Anyway, I can push my patch to master but I just want to get some feedback before ... |
| Comment by Andreas Dilger [ 09/Oct/12 ] |
|
Bruno, The Lustre coding style discourages the use of assignments inside conditionals, because it can lead to hard-to-find bugs in the code. Since we always expect the "diff" calculation to be needed, except in the very rare case that the first "if" condition is true, then it would be fine to compute "diff" before all of the checks. I think it would also be possible to avoid this error my casting (long)obd_max_dirty_pages so that it is doing a signed comparison. This avoids adding a second check into the code and an extra variable on the stack and I think makes it more clear. Also, I think while this code is being changed that a few more cleanups could be done here:
|
| Comment by Peter Jones [ 09/Oct/12 ] |
|
Bruno Thanks again for the suggested change. Could you please submit the next version of the code into gerrit so we can test and land it? Keith Could you please help see this through to landing? Thanks Peter |
| Comment by Bruno Faccini (Inactive) [ 10/Oct/12 ] |
|
Thank's for these hints Andreas !! My 1st patch version was definitelly to request feedback on the best/preferred way to fix In fact, the real question I had in mind was use the (cast) for obd_max_dirty_pages or But anyway, patch has been pushed to master as http://review.whamcloud.com/4246 I asked Andreas to review my patch for sure, but since it seems I need 2 reviewers, Keith can I ask you to review ?? |
| Comment by Patrick Farrell (Inactive) [ 25/Feb/13 ] |
|
I see this is in 2.4. This also applies to earlier 2.x versions - It might be worth pushing the patch to those as well. |
| Comment by Peter Jones [ 25/Feb/13 ] |
|
Landed for 2.4 and yes this is a candidate for inclusion in an upcoming 2.1.x release |