
server grants clients more grant than the clients ask for


    Description

      When several requests are in flight, the server may grant a client more space than the client requested.

      In a real environment, it has been reported that clients were granted ~800 MB (805306368 bytes, i.e. 768 MiB) while max_dirty_mb was set to 256:

      $ cat compute_clients_cur_grant_bytes.txt  | grep OST019c | awk -F= 'BEGIN { grants=0 } { if (grants < $2) grants = $2 } END { print grants }'
      805306368
      
      

      In total, clients were granted ~970 GB:

      $ cat compute_clients_cur_grant_bytes.txt  | grep OST019c | awk -F= '{ grants += $2 } END { print grants }'
      971171270656
      
      

      That resulted in an ENOSPC condition while the OSTs still had about 1 TB of free space.
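
The over-granting described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Lustre implementation: `toy_server`, `toy_grant_alloc()`, `toy_inflight_burst()` and all numbers are hypothetical. It assumes that every in-flight request carries the same stale client-side grant value, that the server hands out at most one chunk per request, and it ignores the grant consumed by the writes themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and values are hypothetical, not Lustre code. */
struct toy_server {
	uint64_t granted; /* grant the server has recorded for one client */
	uint64_t chunk;   /* largest grant handed out per request */
};

/*
 * Same shape as the check discussed in this ticket: refuse when the
 * client-reported grant already covers `want`, or when it is at least one
 * chunk above the server's own record. Otherwise grant up to one chunk.
 */
static uint64_t toy_grant_alloc(struct toy_server *srv, uint64_t curgrant,
				uint64_t want)
{
	uint64_t grant;

	assert(srv != NULL);
	if (curgrant >= want || curgrant >= srv->granted + srv->chunk)
		return 0;

	grant = want < srv->chunk ? want : srv->chunk;
	srv->granted += grant;
	return grant;
}

/*
 * Process `nreq` requests that were all sent before any reply arrived, so
 * each one reports the same stale `curgrant`. Returns the server-side total.
 */
static uint64_t toy_inflight_burst(struct toy_server *srv, uint64_t curgrant,
				   uint64_t want, unsigned int nreq)
{
	unsigned int i;

	for (i = 0; i < nreq; i++)
		toy_grant_alloc(srv, curgrant, want);
	return srv->granted;
}
```

With a server record of 200 MiB, a 32 MiB chunk, `want` of 256 MiB and 8 requests in flight, the model ends at 456 MiB, well above what the client asked for, because the check only ever sees the stale client-side value.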

          Activity

            [LU-8895] server grants clients more grant than the clients ask for

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32907/
            Subject: LU-8895 target: limit grant allocation
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set:
            Commit: e800ee9409c4fae46e0d52f8d84a432a7f3ff428

            gerrit Gerrit Updater added a comment -

            Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32907
            Subject: LU-8895 target: limit grant allocation
            Project: fs/lustre-release
            Branch: b2_10
            Current Patch Set: 1
            Commit: f078f27f0de19c898e5eda2a45b6c33732a4e4ab

            pjones Peter Jones added a comment -

            Landed for 2.11


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24096/
            Subject: LU-8895 target: limit grant allocation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 82e494a36e9ea4f51ec163ab15beb9fdda7fa8d6

            niu Niu Yawei (Inactive) added a comment -

            I was not aware that grant shrink is disabled by default. Regardless of whether there is a shrinking mechanism, I agree that the grant shouldn't be far larger than the client's dirty_max. I just don't think it's necessary to restrict it to strictly less than dirty_max; exceeding dirty_max a little doesn't look like a problem to me.

            vsaveliev Vladimir Saveliev added a comment -

            "I don't think it's a problem that 'granted' > 'max_dirty_mb'."

            OK, however the code below from ofd_grant_alloc() tries not to grant more than want:

                    if (curgrant >= want || curgrant >= fed->fed_grant + chunk)
                            RETURN(0);

            Note that the want value which clients send with their RPCs is normally equal to max_dirty_mb (it is the larger of cl_dirty_max_pages and the pages needed for cl_max_rpcs_in_flight + 1 full RPCs). See osc_announce_cached():

                    nrpages = cli->cl_max_pages_per_rpc;
                    nrpages *= cli->cl_max_rpcs_in_flight + 1;
                    nrpages = max(nrpages, cli->cl_dirty_max_pages);
                    oa->o_undirty = nrpages << PAGE_SHIFT;

            "But that doesn't look like a very serious problem to me, because we do have a shrink mechanism to reclaim unused grant."

            Yes. As grant shrink is currently off (afaics), this quick patch limiting grants might delay the occurrence of the virtual ENOSPC condition. If grant shrink is going to be turned on soon, then this patch is not needed.
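
The want computation quoted from osc_announce_cached() can be worked through with concrete numbers. The following is a minimal sketch, not the Lustre code: `toy_client` and `toy_undirty_bytes()` are hypothetical stand-ins for the client fields in the snippet, and the 4 KiB page size, 256 pages per RPC and 8 RPCs in flight are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the client fields used in the quoted snippet. */
struct toy_client {
	uint64_t max_pages_per_rpc;
	uint64_t max_rpcs_in_flight;
	uint64_t dirty_max_pages;
};

/*
 * Bytes the client announces as "want": the larger of the dirty page limit
 * and the pages needed for (max_rpcs_in_flight + 1) full RPCs, converted
 * from pages to bytes.
 */
static uint64_t toy_undirty_bytes(const struct toy_client *cli,
				  unsigned int page_shift)
{
	uint64_t nrpages;

	assert(cli != NULL);
	nrpages = cli->max_pages_per_rpc;
	nrpages *= cli->max_rpcs_in_flight + 1;
	if (nrpages < cli->dirty_max_pages)
		nrpages = cli->dirty_max_pages;
	return nrpages << page_shift;
}
```

Under these assumptions, a dirty limit of 256 MiB is 65536 pages, which exceeds the 256 * 9 = 2304 pages needed for the RPCs, so the announced value is exactly 256 MiB. Only when the dirty limit drops below 2304 pages does the RPC term take over.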

            niu Niu Yawei (Inactive) added a comment -

            I don't think it's a problem that 'granted' > 'max_dirty_mb'. In my opinion, 'max_dirty_mb' limits the dirty pages on the client, while 'granted' reserves space on the server to make sure that a later flush of dirty pages won't fail with ENOSPC. It is fine as long as 'granted' >= 'dirty'. I don't see why that can result in an ENOSPC error.

            When a client is actively consuming grant, it's reasonable to grant more space to this client (to avoid the client running out of grant). Perhaps we need to throttle the granting speed to make sure we don't grant too much? But that doesn't look like a very serious problem to me, because we do have a shrink mechanism to reclaim unused grant.

            gerrit Gerrit Updater added a comment -

            Vladimir Saveliev (vladimir.saveliev@seagate.com) uploaded a new patch: https://review.whamcloud.com/24096
            Subject: LU-8895 ofd: do not grant more than asked
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: ad1a4cfae54b235c853951ed2ca50588a6fa5b40

            People

              vsaveliev Vladimir Saveliev
              vsaveliev Vladimir Saveliev