Lustre / LU-5812

osc_request.c:853:osc_announce_cached() dirty 129051 + 1573 - 0 > system dirty_max 130608

Details

    • Bug
    • Resolution: Won't Fix
    • Major
    • None
    • Lustre 2.5.3
    • IBM BG/Q system vulcan
      Lustre client Build Version: 2.5.3-0.13morrone-0.13morrone--PRISTINE-2.6.32-431.1.1.bgq.3blueos.V1R2M2.bl2.2_4.ppc64
      Lustre server Build Version: 2.4.2-17chaos-17chaos--PRISTINE-2.6.32-431.29.2.1chaos.ch5.2.x86_64
    • 3
    • 16298

    Description

      Intermittent occurrences of the following in console logs:

      2014-10-25 04:32:08.953847 {RMP22Oc150844422} [mmcs]{0}.7.0: LustreError: 3262:0:(osc_request.c:853:osc_announce_cached()) fsv-OST0045-osc-c0000003e09df1c0: dirty 129051 + 1573 - 0 > system dirty_max 130608
      2014-10-25 04:32:08.954403 {RMP22Oc150844422} [mmcs]{0}.7.0: LustreError: 3262:0:(osc_request.c:853:osc_announce_cached()) Skipped 152 previous similar messages
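
The arithmetic behind the first message can be checked by hand. The sketch below is only illustrative: it assumes the three numbers map to unstable, dirty, and in-transit pages in the order the CERROR() arguments appear in the osc_announce_cached() diff quoted in the Activity section, and that the client that logged the message used that same comparison, including the +1 fudge factor. The struct and function names are hypothetical and are not part of Lustre.

```c
#include <stdbool.h>

/* Hypothetical container for the four numbers printed in the message
 * "dirty A + B - C > system dirty_max D". */
struct dirty_report {
    long unstable;   /* A: assumed to be obd_unstable_pages */
    long dirty;      /* B: assumed to be obd_dirty_pages */
    long transit;    /* C: assumed to be obd_dirty_transit_pages */
    long max_dirty;  /* D: obd_max_dirty_pages */
};

/* Pages that the check counts against the system-wide limit. */
static long pages_counted(const struct dirty_report *r)
{
    return r->unstable + r->dirty - r->transit;
}

/* Same comparison as the condition in the quoted diff, with the +1
 * fudge factor that tolerates the unlocked counter reads. */
static bool limit_exceeded(const struct dirty_report *r)
{
    return pages_counted(r) > r->max_dirty + 1;
}
```

With the logged values {129051, 1573, 0, 130608}, pages_counted() gives 130624, which is above 130608 + 1, so under these assumptions the check fires.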
      

      Attachments

        1. R00-ID-J00.log.lustre
          29 kB
        2. R00-ID-J01.log.lustre
          4 kB
        3. R00-ID-J02.log.lustre
          32 kB
        4. R00-ID-J03.log.lustre
          5 kB

        Activity

          ieelusername Homer Li (Inactive) added a comment -

          Lustre 2.12.6 has the same issue.

          ofaaland Olaf Faaland added a comment -

          Although this was never resolved, the impact of this issue is low and we are moving our clients from Lustre 2.5 to 2.8. Closing.

          morrone Christopher Morrone (Inactive) added a comment -

          If the first one is not exact, it is likely because that is not the most recent version of the patch. You have no way to know that from the information presented online, unfortunately.

          ofaaland Olaf Faaland added a comment - edited

          Marc, Chris,

          It appears to me that some of the patches are not in our Lustre 2.5.5-3chaos stack. Details:

          http://review.whamcloud.com/12604       partial         * 9722ebf LU-2139 osc: Track and limit "unstable" pages
          http://review.whamcloud.com/12605       full            * 666430c LU-2139 osc: Track number of "unstable" pages per osc
          http://review.whamcloud.com/12606       full            * 534ef35 LU-2139 osc: Use SOFT_SYNC to urge server commit
          http://review.whamcloud.com/12612       full            * 003f186 LU-2139 ofd: Do async commit if SOFT_SYNC is seen
          http://review.whamcloud.com/12613       not found
          http://review.whamcloud.com/12615       not found
          

          where "partial" means some of the files changed in the patch shown on Gerrit were not changed by the patch in the 2.5.5-3chaos stack.

          -Olaf

          marc@llnl.gov D. Marc Stearman (Inactive) added a comment -

          Olaf, is this in our local releases?

          yujian Jian Yu added a comment -

          Thank you, Jinshan. The sixth patch http://review.whamcloud.com/12615 contains the above change.

          jay Jinshan Xiong (Inactive) added a comment -

          Yes, that looks good.

          yujian Jian Yu added a comment -

          The sixth patch, from http://review.whamcloud.com/10003, depends heavily on the patches for LU-3321, which are unlikely to be back-ported to the Lustre b2_5 branch.

          Hi Jinshan,

          With the first 5 patches applied on the Lustre b2_5 branch, may I just make the following change in place of the sixth one, so as to quiet the error message in this ticket?

          diff --git a/lustre/osc/osc_request.c b/lustre/osc/osc_request.c
          index 1c42033..dcfc660 100644
          --- a/lustre/osc/osc_request.c
          +++ b/lustre/osc/osc_request.c
          @@ -839,16 +839,14 @@ static void osc_announce_cached(struct client_obd *cli, struct obdo *oa,
                                 cli->cl_dirty_pages, cli->cl_dirty_transit,
                                 cli->cl_dirty_max_pages);
                          oa->o_undirty = 0;
          -       } else if (unlikely(cfs_atomic_read(&obd_unstable_pages) +
          -                           cfs_atomic_read(&obd_dirty_pages) -
          +       } else if (unlikely(cfs_atomic_read(&obd_dirty_pages) -
                                      cfs_atomic_read(&obd_dirty_transit_pages) >
                                      (long)(obd_max_dirty_pages + 1))) {
                          /* The cfs_atomic_read() allowing the cfs_atomic_inc() are
                           * not covered by a lock thus they may safely race and trip
                           * this CERROR() unless we add in a small fudge factor (+1). */
          -               CERROR("%s: dirty %d + %d - %d > system dirty_max %d\n",
          +               CERROR("%s: dirty %d - %d > system dirty_max %d\n",
                                 cli->cl_import->imp_obd->obd_name,
          -                      cfs_atomic_read(&obd_unstable_pages),
                                 cfs_atomic_read(&obd_dirty_pages),
                                 cfs_atomic_read(&obd_dirty_transit_pages),
                                 obd_max_dirty_pages);
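
The effect of the proposed change on the values logged in this ticket can be sketched as follows. This is an illustration only, under the assumption that the logged numbers 129051, 1573 and 0 are the unstable, dirty and in-transit page counts in the order of the CERROR() arguments above; the two function names are hypothetical and simply restate the conditions before and after the diff.

```c
#include <stdbool.h>

/* Condition before the change: unstable pages are added to the dirty
 * pages before comparing against the limit plus the +1 fudge factor. */
static bool trips_before_patch(long unstable, long dirty, long transit,
                               long max_dirty)
{
    return unstable + dirty - transit > max_dirty + 1;
}

/* Condition after the change: unstable pages are no longer counted. */
static bool trips_after_patch(long dirty, long transit, long max_dirty)
{
    return dirty - transit > max_dirty + 1;
}
```

Under these assumptions, trips_before_patch(129051, 1573, 0, 130608) is true while trips_after_patch(1573, 0, 130608) is false, which is consistent with the stated goal of quieting the message.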
          
          yujian Jian Yu added a comment -

          The first 5 patches are ready to land. The sixth patch needs to be re-implemented on Lustre b2_5 branch.

          yujian Jian Yu added a comment -

          Here are the back-ported patches for Lustre b2_5 branch:

          http://review.whamcloud.com/12604 (from http://review.whamcloud.com/6284)
          http://review.whamcloud.com/12605 (from http://review.whamcloud.com/4374)
          http://review.whamcloud.com/12606 (from http://review.whamcloud.com/4375)
          http://review.whamcloud.com/12612 (from http://review.whamcloud.com/5935)
          http://review.whamcloud.com/12613 (from http://review.whamcloud.com/8215)
          http://review.whamcloud.com/12615 (from http://review.whamcloud.com/10003)

          With the above patches, the unstable pages tracking and counting issues were fixed.

          People

            yujian Jian Yu
            ofaaland Olaf Faaland