[LU-8895] server grants clients with more grants than the clients ask Created: 02/Dec/16  Updated: 21/Jan/19  Resolved: 04/Jan/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.11.0, Lustre 2.10.7

Type: Bug Priority: Minor
Reporter: Vladimir Saveliev Assignee: Vladimir Saveliev
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
Related
is related to LU-8708 Grant shrinking disabled all the time Resolved
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

When several requests are in flight, a server may give a client more grant than the client asked for.

In a production environment, clients were reported to hold ~800 MB of grant while max_dirty_mb was set to 256.

$ cat compute_clients_cur_grant_bytes.txt  | grep OST019c | awk -F= 'BEGIN { grants=0 } { if (grants < $2) grants = $2 } END { print grants }'
805306368
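
To put that number next to the max_dirty_mb setting, here is a minimal arithmetic sketch. The helper names are hypothetical and not part of Lustre, and it assumes max_dirty_mb is counted in units of 2^20 bytes.

```c
#include <stdint.h>

/* Hypothetical helper: convert a byte count to whole MiB (2^20 bytes), rounding down. */
static uint64_t example_bytes_to_mib(uint64_t bytes)
{
    return bytes >> 20;
}

/*
 * Hypothetical helper: how many times a grant (in bytes) exceeds a
 * max_dirty_mb limit, rounded down. Assumes max_dirty_mb is in MiB
 * and is non-zero.
 */
static uint64_t example_grant_over_limit(uint64_t grant_bytes, uint64_t max_dirty_mb)
{
    return grant_bytes / (max_dirty_mb << 20);
}
```

With the value reported above, example_bytes_to_mib(805306368) gives 768 and example_grant_over_limit(805306368, 256) gives 3, so the largest per-client grant was three times the max_dirty_mb setting.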

In total, clients were granted ~970 GB:

$ cat compute_clients_cur_grant_bytes.txt  | grep OST019c | awk -F= '{ grants += $2 } END { print grants }'
971171270656

That resulted in an ENOSPC condition while the OSTs still had about 1 TB of free space.



 Comments   
Comment by Gerrit Updater [ 02/Dec/16 ]

Vladimir Saveliev (vladimir.saveliev@seagate.com) uploaded a new patch: https://review.whamcloud.com/24096
Subject: LU-8895 ofd: do not grant more than asked
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ad1a4cfae54b235c853951ed2ca50588a6fa5b40

Comment by Niu Yawei (Inactive) [ 05/Dec/16 ]

I don't think it's a problem that 'granted' > 'max_dirty_mb'. In my opinion, 'max_dirty_mb' limits the dirty pages on the client, while 'granted' reserves space on the server to make sure that a later flush of dirty pages won't fail with ENOSPC. It is fine as long as 'granted' >= 'dirty'. I don't see why that can result in an ENOSPC error.

When a client is actively consuming grant, it's reasonable to grant more space to this client (to avoid the client running out of grant). Perhaps we need to throttle the granting speed to make sure we do not grant too much? But that does not look like a very serious problem to me, because we do have a shrink mechanism to reclaim unused grant.

Comment by Vladimir Saveliev [ 14/Dec/16 ]

I don't think it's a problem that 'granted' > 'max_dirty_mb'.

OK, however the code below from ofd_grant_alloc() tries not to grant more than want:

        if (curgrant >= want || curgrant >= fed->fed_grant + chunk)
                RETURN(0);
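
One way to express the "do not grant more than asked" idea, as a rough sketch only: clamp whatever the server proposes to hand out so that the client's total never exceeds want. The function and parameter names here are hypothetical, and this is not the code from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from Lustre: decide how much extra grant
 * to hand out so that curgrant + result never exceeds want.
 *
 *   curgrant - grant the client already holds, in bytes
 *   want     - grant the client asked for, in bytes
 *   proposed - amount the server would otherwise add, in bytes
 */
static uint64_t example_grant_to_add(uint64_t curgrant, uint64_t want,
                                     uint64_t proposed)
{
    uint64_t room;

    /* Same early exit as the first test in the excerpt above. */
    if (curgrant >= want)
        return 0;

    /* Never add more than the gap between what is held and what is wanted. */
    room = want - curgrant;
    return proposed < room ? proposed : room;
}
```

For example, a client holding 100 units that wants 256 and is offered 500 would receive 156 under this clamp, whatever the number of requests in flight.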


Note that the want value clients send with their RPCs is normally equal to max_dirty_mb (it is larger only when max_pages_per_rpc * (max_rpcs_in_flight + 1) exceeds it). See osc_announce_cached():

                nrpages = cli->cl_max_pages_per_rpc;
                nrpages *= cli->cl_max_rpcs_in_flight + 1;
                nrpages = max(nrpages, cli->cl_dirty_max_pages);
                oa->o_undirty = nrpages << PAGE_SHIFT;
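
As a worked example of that calculation, here is a standalone sketch. The struct and function names are hypothetical stand-ins for the client fields in the excerpt, and the page size is assumed to be 4 KiB.

```c
#include <stdint.h>

/* Assumption for this sketch: 4 KiB pages, so PAGE_SHIFT would be 12. */
#define EXAMPLE_PAGE_SHIFT 12

/* Hypothetical stand-in for the client fields used in the excerpt. */
struct example_client {
    uint64_t max_pages_per_rpc;   /* mirrors cl_max_pages_per_rpc */
    uint64_t max_rpcs_in_flight;  /* mirrors cl_max_rpcs_in_flight */
    uint64_t dirty_max_pages;     /* mirrors cl_dirty_max_pages */
};

/* Reproduces the arithmetic of the excerpt: the byte value sent as "want". */
static uint64_t example_undirty_bytes(const struct example_client *cli)
{
    uint64_t nrpages = cli->max_pages_per_rpc * (cli->max_rpcs_in_flight + 1);

    /* Take the larger of the RPC-based estimate and the dirty page limit. */
    if (nrpages < cli->dirty_max_pages)
        nrpages = cli->dirty_max_pages;

    return nrpages << EXAMPLE_PAGE_SHIFT;
}
```

With illustrative settings of 256 pages per RPC, 8 RPCs in flight, and 65536 dirty pages (256 MiB at 4 KiB per page), the RPC-based estimate is 2304 pages, so the dirty limit wins and the result is 268435456 bytes, that is, 256 MiB.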


 

But that does not look like a very serious problem to me, because we do have a shrink mechanism to reclaim unused grant.

Yes. As grant shrink is currently off (afaics), this quick patch limiting grants might delay the occurrence of a virtual ENOSPC condition. If grant shrink is going to be turned on soon, then this patch is not needed.

Comment by Niu Yawei (Inactive) [ 15/Dec/16 ]

I was not aware that grant shrink is disabled by default. Regardless of whether there is a shrinking mechanism, I do agree that the grant shouldn't be far larger than the client's dirty_max. I just don't think it's necessary to restrict it to strictly less than dirty_max; exceeding dirty_max a little bit does not look like a problem to me.

Comment by Gerrit Updater [ 04/Jan/18 ]

Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24096/
Subject: LU-8895 target: limit grant allocation
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 82e494a36e9ea4f51ec163ab15beb9fdda7fa8d6

Comment by Peter Jones [ 04/Jan/18 ]

Landed for 2.11

Comment by Gerrit Updater [ 31/Jul/18 ]

Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32907
Subject: LU-8895 target: limit grant allocation
Project: fs/lustre-release
Branch: b2_10
Current Patch Set: 1
Commit: f078f27f0de19c898e5eda2a45b6c33732a4e4ab

Comment by Gerrit Updater [ 19/Jan/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32907/
Subject: LU-8895 target: limit grant allocation
Project: fs/lustre-release
Branch: b2_10
Current Patch Set:
Commit: e800ee9409c4fae46e0d52f8d84a432a7f3ff428
