Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.4.0
    • Lustre 2.1.0, Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0
    • 3
    • 5913

    Description

      static int osc_shrink_grant(struct client_obd *cli)
      {
      long target = (cli->cl_max_rpcs_in_flight + 1) *
      cli->cl_max_pages_per_rpc;

      client_obd_list_lock(&cli->cl_loi_list_lock);
      if (cli->cl_avail_grant <= target)
      target = cli->cl_max_pages_per_rpc;
      client_obd_list_unlock(&cli->cl_loi_list_lock);

      return osc_shrink_grant_to_target(cli, target);
      }

      that function looks have a typo - cl_avail_grant in bytes, but target will be in pages so << CFS_PAGE_SHIFT looks lost.

      Attachments

        Activity

          [LU-2513] typo in shrink osc grants

          Shadow, you submitted the original patch on Jan 11 and I inspected it on Jan 17, then Jinshan inspected it again on feb 11. I wanted to make sure that at least the basic fix was landed, so a month after not hearing from you (Feb 20) I asked John to submit the basic fix without changing the whole grant protocol. In cases like this it is easier to separate the basic fix (incompatible units for grant variables) from a much more major change that is controversial. You didn't refresh the original the patch until March 20.

          I agree that John should have resubmitted his patch using the same Change-Id as your original patch, since this makes it a lot easier to reinspect and compare the two patches. That would also have made it more clear to you that there was a new version of your fix . Sorry for the confusion.

          adilger Andreas Dilger added a comment - Shadow, you submitted the original patch on Jan 11 and I inspected it on Jan 17, then Jinshan inspected it again on feb 11. I wanted to make sure that at least the basic fix was landed, so a month after not hearing from you (Feb 20) I asked John to submit the basic fix without changing the whole grant protocol. In cases like this it is easier to separate the basic fix (incompatible units for grant variables) from a much more major change that is controversial. You didn't refresh the original the patch until March 20. I agree that John should have resubmitted his patch using the same Change-Id as your original patch, since this makes it a lot easier to reinspect and compare the two patches. That would also have made it more clear to you that there was a new version of your fix . Sorry for the confusion.

          Andreas, my last version have dropped an limiting a number of rpc from a patch, and i don't understand a situation why it's don't inspected and new patch created?
          It's hard to inspect a existent patch instead of create own?

          I really don't understand...

          shadow Alexey Lyashkov added a comment - Andreas, my last version have dropped an limiting a number of rpc from a patch, and i don't understand a situation why it's don't inspected and new patch created? It's hard to inspect a existent patch instead of create own? I really don't understand...
          jhammond John Hammond added a comment -

          Patch landed.

          jhammond John Hammond added a comment - Patch landed.
          jhammond John Hammond added a comment - - edited
          jhammond John Hammond added a comment - - edited Please see http://review.whamcloud.com/5495 .

          John, can you please submit a patch that only fixes the bytes-vs-pages issue independent of the "number of rpcs in flight" change that is blocking the current patch from landing.

          adilger Andreas Dilger added a comment - John, can you please submit a patch that only fixes the bytes-vs-pages issue independent of the "number of rpcs in flight" change that is blocking the current patch from landing.

          if 2.1 fix need - i may submit it also.

          shadow Alexey Lyashkov added a comment - if 2.1 fix need - i may submit it also.

          remote: New Changes:
          remote: http://review.whamcloud.com/5017

          shadow Alexey Lyashkov added a comment - remote: New Changes: remote: http://review.whamcloud.com/5017

          2.1 need additional chunk

          diff --git a/lustre/obdfilter/filter_io.c b/lustre/obdfilter/filter_io.c
          index bb2b8d1..5096950 100644
          — a/lustre/obdfilter/filter_io.c
          +++ b/lustre/obdfilter/filter_io.c
          @@ -603,7 +603,7 @@ static int filter_grant_check(struct obd_export *exp, struct obdo *oa,
          ungranted, fed->fed_grant, fed->fed_dirty);

          /* Rough calc in case we don't refresh cached statfs data */

          • using = (used + ungranted + 1 ) >>
            + using = (used + ungranted + blocksize - 1) >>
            exp->exp_obd->u.obt.obt_sb->s_blocksize_bits;
            if (exp->exp_obd->obd_osfs.os_bavail > using)
            exp->exp_obd->obd_osfs.os_bavail -= using;
          shadow Alexey Lyashkov added a comment - 2.1 need additional chunk diff --git a/lustre/obdfilter/filter_io.c b/lustre/obdfilter/filter_io.c index bb2b8d1..5096950 100644 — a/lustre/obdfilter/filter_io.c +++ b/lustre/obdfilter/filter_io.c @@ -603,7 +603,7 @@ static int filter_grant_check(struct obd_export *exp, struct obdo *oa, ungranted, fed->fed_grant, fed->fed_dirty); /* Rough calc in case we don't refresh cached statfs data */ using = (used + ungranted + 1 ) >> + using = (used + ungranted + blocksize - 1) >> exp->exp_obd->u.obt.obt_sb->s_blocksize_bits; if (exp->exp_obd->obd_osfs.os_bavail > using) exp->exp_obd->obd_osfs.os_bavail -= using;

          Xyratex-Bug-Id: MRP-809.

          Peter,

          I have fix for it issue, and have plan to submit in short time.
          Fix touched client side only.

          shadow Alexey Lyashkov added a comment - Xyratex-Bug-Id: MRP-809. Peter, I have fix for it issue, and have plan to submit in short time. Fix touched client side only.

          few more bugs with grants.
          1) wrong rounding to block size (fixed in 2.4).
          2)
          [ 933.237133] LustreError: 29642:0:(ofd_grant.c:250:ofd_grant_space_left()) lustre-OST0000: cli lustre-OST0000_UUID/ffff8800a72809a0 left 62275584 < tot_grant 65467236 unstable 0 pending 0
          Dec 28 07:45:39 rhel6-64 kernel: [ 933.237133] LustreError: 29642:0:(ofd_grant.c:250:ofd_grant_space_left()) lustre-OST0000: cli lustre-OST0000_UUID/ffff8800a72809a0 left 62275584 < tot_grant 65467236 unstable 0 pending 0

          with simple test

          [root@rhel6-64 tests]# OSTCOUNT=1 DEBUG_SIZE=400 SUBSYSTEM="ost osc filter" PTLDEBUG=-1 sh llmount.sh 
          for i in {1..25000}; do
              dd if=/dev/zero of=lustre/$i bs=4096 count=2
              if [ $? -ne 0 ]; then
                  break;
              fi;
          done
          lctl dk > /tmp/grant.log
          

          3) send up to max rpc in flight grant requests in parallel /reproduced with 2.1 but exist in 2.4/

          82297.783585:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0
          82297.783601:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0
          82297.802358:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
          82297.802404:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
          82297.802757:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
          82297.829384:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 41869312 granting: 0
          

          with that example - client got a 2 extra grant updates.

          but we really don't need a have grants more then dirty cache size, because we have stick in waiting dirty space.
          limiting a grant request don't need to be to max rpc in flight, because that is more related to dirty cache when network transfer.

          shadow Alexey Lyashkov added a comment - few more bugs with grants. 1) wrong rounding to block size (fixed in 2.4). 2) [ 933.237133] LustreError: 29642:0:(ofd_grant.c:250:ofd_grant_space_left()) lustre-OST0000: cli lustre-OST0000_UUID/ffff8800a72809a0 left 62275584 < tot_grant 65467236 unstable 0 pending 0 Dec 28 07:45:39 rhel6-64 kernel: [ 933.237133] LustreError: 29642:0:(ofd_grant.c:250:ofd_grant_space_left()) lustre-OST0000: cli lustre-OST0000_UUID/ffff8800a72809a0 left 62275584 < tot_grant 65467236 unstable 0 pending 0 with simple test [root@rhel6-64 tests]# OSTCOUNT=1 DEBUG_SIZE=400 SUBSYSTEM= "ost osc filter" PTLDEBUG=-1 sh llmount.sh for i in {1..25000}; do dd if =/dev/zero of=lustre/$i bs=4096 count=2 if [ $? -ne 0 ]; then break ; fi; done lctl dk > /tmp/grant.log 3) send up to max rpc in flight grant requests in parallel /reproduced with 2.1 but exist in 2.4/ 82297.783585:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0 82297.783601:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0 82297.802358:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.802404:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.802757:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.829384:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 41869312 granting: 0 with that example - client got a 2 extra grant updates. but we really don't need a have grants more then dirty cache size, because we have stick in waiting dirty space. limiting a grant request don't need to be to max rpc in flight, because that is more related to dirty cache when network transfer.

          People

            jhammond John Hammond
            shadow Alexey Lyashkov
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: