Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-2127

Confusing messages from osc_announce_cached() caused by unchecked signed/unsigned comparison

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Minor
    • Lustre 2.4.0
    • None
    • 3
    • 5129

    Description

      Some examples, taken for convenience from LU-1153 logs but which are very frequent on T100 site :
      =================================================================================================

      Feb 28 09:37:28 compute-01-32 kernel: LustreError: 2567:0:(osc_request.c:797:osc_announce_cached()) dirty 2028 - 2028 > system dirty_max 647168
      Feb 28 09:37:40 compute-01-32 kernel: LustreError: 2568:0:(osc_request.c:797:osc_announce_cached()) dirty 2040 - 2040 > system dirty_max 647168
      Feb 28 09:37:46 compute-01-32 kernel: LustreError: 2565:0:(osc_request.c:797:osc_announce_cached()) dirty 1954 - 1954 > system dirty_max 647168
      Feb 28 09:37:47 compute-01-32 kernel: LustreError: 2561:0:(osc_request.c:797:osc_announce_cached()) dirty 1975 - 1976 > system dirty_max 647168
      Feb 28 09:37:50 compute-01-32 kernel: LustreError: 2566:0:(osc_request.c:797:osc_announce_cached()) dirty 2001 - 2002 > system dirty_max 647168
      Feb 28 09:38:09 compute-01-32 kernel: LustreError: 2567:0:(osc_request.c:797:osc_announce_cached()) dirty 1946 - 1946 > system dirty_max 647168
      Feb 28 09:39:20 compute-01-32 kernel: LustreError: 2615:0:(osc_request.c:797:osc_announce_cached()) dirty 2017 - 2017 > system dirty_max 647168
      Feb 28 09:39:47 compute-01-32 kernel: LustreError: 2615:0:(osc_request.c:797:osc_announce_cached()) dirty 1847 - 1848 > system dirty_max 647168
      

      =================================================================================================

      this comes from the following piece of code :
      =============================================

      .       } else if (cfs_atomic_read(&obd_dirty_pages) -
                         cfs_atomic_read(&obd_dirty_transit_pages) >
                         obd_max_dirty_pages + 1){
                      /* The cfs_atomic_read() allowing the cfs_atomic_inc() are
                       * not covered by a lock thus they may safely race and trip
                       * this CERROR() unless we add in a small fudge factor (+1). */
                      CERROR("dirty %d - %d > system dirty_max %d\n",
                             cfs_atomic_read(&obd_dirty_pages),
                             cfs_atomic_read(&obd_dirty_transit_pages),
                             obd_max_dirty_pages);
      

      =============================================

      where both obd_dirty_pages/obd_dirty_transit_pages are "likely to change" signed ints and obd_max_dirty_pages unsigned which an easily explain the wrong error msgs ...

      This is still true in current master and, if I am right and there is no tricky stuff I may have missed there ..., the following patch should fix :
      ==================================================================================================================================================

      diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c
      index af60e4b..5f75329 100644
      --- a/lustre/osc/osc_request.c
      +++ b/lustre/osc/osc_request.c
      @@ -788,6 +788,7 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
                                       long writing_bytes)
       {
               obd_flag bits = OBD_MD_FLBLOCKS|OBD_MD_FLGRANT;
      +        int diff;
       
               LASSERT(!(oa->o_valid & bits));
       
      @@ -798,9 +799,9 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
                       CERROR("dirty %lu - %lu > dirty_max %lu\n",
                              cli->cl_dirty, cli->cl_dirty_transit, cli->cl_dirty_max);
                       oa->o_undirty = 0;
      -        } else if (cfs_atomic_read(&obd_dirty_pages) -
      -                   cfs_atomic_read(&obd_dirty_transit_pages) >
      -                   obd_max_dirty_pages + 1){
      +        } else if (((diff = (cfs_atomic_read(&obd_dirty_pages) -
      +                   cfs_atomic_read(&obd_dirty_transit_pages))) >
      +                   obd_max_dirty_pages + 1) && (diff > 0)){
                       /* The cfs_atomic_read() allowing the cfs_atomic_inc() are
                        * not covered by a lock thus they may safely race and trip
                        * this CERROR() unless we add in a small fudge factor (+1). */
      

      ==================================================================================================================================================

      Attachments

        Activity

          People

            keith Keith Mannthey (Inactive)
            louveta Alexandre Louvet (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: