[LU-8895] server grants clients with more grants than the clients ask for Created: 02/Dec/16 Updated: 21/Jan/19 Resolved: 04/Jan/18 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.11.0, Lustre 2.10.7 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Vladimir Saveliev | Assignee: | Vladimir Saveliev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Severity: | 3 | ||||||||||||
| Rank (Obsolete): | 9223372036854775807 | ||||||||||||
| Description |
|
When several requests are in flight, a server may grant a client more grant than the client requests. In a real environment, clients were reported to have been granted ~800 MB while max_dirty_mb was set to 256:
$ cat compute_clients_cur_grant_bytes.txt | grep OST019c | awk -F= 'BEGIN { grants=0 } { if (grants < $2) grants = $2 } END { print grants }'
805306368
In total, clients were granted ~960 GB:
$ cat compute_clients_cur_grant_bytes.txt | grep OST019c | awk -F= '{ grants += $2 } END { print grants }'
971171270656
That resulted in an ENOSPC condition while the OSTs still had about 1 TB of free space. |
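To put the two byte counts above into the same units as max_dirty_mb, here is a minimal C sketch of the arithmetic. The constant and function names are hypothetical and are not part of Lustre; only the three numbers come from the description.

```c
#include <stdint.h>
#include <stdio.h>

/* Figures quoted in the description (names are illustrative only). */
#define MAX_DIRTY_MB     256ULL           /* client max_dirty_mb setting */
#define MAX_CLIENT_GRANT 805306368ULL     /* largest per-client grant, in bytes */
#define TOTAL_GRANT      971171270656ULL  /* sum of grants over all clients, in bytes */

/* Convert bytes to whole MiB (rounds down). */
uint64_t bytes_to_mib(uint64_t bytes)
{
	return bytes >> 20;
}

/* How many times larger than max_dirty_mb a grant is (rounds down). */
uint64_t grant_to_dirty_ratio(uint64_t grant_bytes, uint64_t max_dirty_mb)
{
	return bytes_to_mib(grant_bytes) / max_dirty_mb;
}

/* Print both figures in MiB/GiB; call this from main() in a scratch program. */
void print_grant_summary(void)
{
	printf("largest client grant: %llu MiB (%llux max_dirty_mb)\n",
	       (unsigned long long)bytes_to_mib(MAX_CLIENT_GRANT),
	       (unsigned long long)grant_to_dirty_ratio(MAX_CLIENT_GRANT,
							 MAX_DIRTY_MB));
	printf("total grant: %llu GiB\n",
	       (unsigned long long)(bytes_to_mib(TOTAL_GRANT) >> 10));
}
```

By this arithmetic the largest per-client grant is 768 MiB, exactly three times the 256 MiB dirty limit, and the total comes to roughly 904 GiB.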
| Comments |
| Comment by Gerrit Updater [ 02/Dec/16 ] |
|
Vladimir Saveliev (vladimir.saveliev@seagate.com) uploaded a new patch: https://review.whamcloud.com/24096 |
| Comment by Niu Yawei (Inactive) [ 05/Dec/16 ] |
|
I don't think it's a problem that 'granted' > 'max_dirty_mb'. In my opinion, 'max_dirty_mb' limits the dirty pages on the client, while 'granted' reserves space on the server to make sure a later dirty flush won't fail with ENOSPC; it is fine as long as 'granted' >= 'dirty'. I don't see why that can result in an ENOSPC error. When a client is actively consuming grant, it's reasonable to grant more space to that client (to keep it from running out of grant). Perhaps we need to throttle the granting speed to make sure we don't grant too much? But that doesn't look like a very serious problem to me, because we do have a shrink mechanism to reclaim unused grant. |
| Comment by Vladimir Saveliev [ 14/Dec/16 ] |
OK, however the code below from ofd_grant_alloc() tries not to grant more than want:
        if (curgrant >= want || curgrant >= fed->fed_grant + chunk)
                RETURN(0);
Note that the want value which clients send with their RPCs is normally equal to max_dirty_mb (it is the larger of cl_dirty_max_pages and the pages needed to fill all RPCs in flight plus one). See osc_announce_cached():
        nrpages = cli->cl_max_pages_per_rpc;
        nrpages *= cli->cl_max_rpcs_in_flight + 1;
        nrpages = max(nrpages, cli->cl_dirty_max_pages);
        oa->o_undirty = nrpages << PAGE_SHIFT;
Yes, as grant shrink is currently off (afaics), this quick patch limiting grants might help delay the occurrence of a virtual ENOSPC condition. If grant shrink is going to be turned on soon, then this patch is not needed. |
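The two excerpts quoted in the comment above can be tried out in isolation with a small user-space C sketch. It is a simplified model, not the Lustre source: struct client_cfg and both function names are hypothetical stand-ins, and the page shift is passed in as a parameter rather than taken from PAGE_SHIFT.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the client fields read by osc_announce_cached(). */
struct client_cfg {
	uint64_t max_pages_per_rpc;  /* models cl_max_pages_per_rpc */
	uint64_t max_rpcs_in_flight; /* models cl_max_rpcs_in_flight */
	uint64_t dirty_max_pages;    /* models cl_dirty_max_pages */
};

/* Bytes the client announces as o_undirty, i.e. the server's "want". */
uint64_t announced_undirty(const struct client_cfg *cli, unsigned int page_shift)
{
	uint64_t nrpages = cli->max_pages_per_rpc;

	nrpages *= cli->max_rpcs_in_flight + 1;
	if (nrpages < cli->dirty_max_pages)
		nrpages = cli->dirty_max_pages;
	return nrpages << page_shift;
}

/* Models the early-return test quoted from ofd_grant_alloc():
 * true means no additional grant is handed out for this request. */
bool grant_alloc_returns_early(uint64_t curgrant, uint64_t want,
			       uint64_t fed_grant, uint64_t chunk)
{
	return curgrant >= want || curgrant >= fed_grant + chunk;
}
```

Assuming 4 KiB pages (page_shift of 12), 256 pages per RPC, 8 RPCs in flight and a 256 MiB dirty limit (65536 pages), announced_undirty() returns 268435456, which is 256 MiB. These input values are illustrative assumptions and were not reported in this ticket.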
| Comment by Niu Yawei (Inactive) [ 15/Dec/16 ] |
|
I was not aware that grant shrink is disabled by default. Regardless of whether there is a shrinking mechanism, I do agree that grant shouldn't be much larger than the client's dirty_max. I just don't think it's necessary to restrict it to strictly less than dirty_max; exceeding dirty_max by a little bit doesn't look like a problem to me. |
| Comment by Gerrit Updater [ 04/Jan/18 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch https://review.whamcloud.com/24096/ |
| Comment by Peter Jones [ 04/Jan/18 ] |
|
Landed for 2.11 |
| Comment by Gerrit Updater [ 31/Jul/18 ] |
|
Andreas Dilger (adilger@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/32907 |
| Comment by Gerrit Updater [ 19/Jan/19 ] |
|
Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/32907/ |