[LU-2513] typo in shrink osc grants Created: 20/Dec/12 Updated: 22/Apr/13 Resolved: 22/Apr/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.1.0, Lustre 2.2.0, Lustre 2.3.0, Lustre 2.4.0 |
| Fix Version/s: | Lustre 2.4.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexey Lyashkov | Assignee: | John Hammond |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | LB | ||
| Severity: | 3 |
| Rank (Obsolete): | 5913 |
| Description |
|
static int osc_shrink_grant(struct client_obd *cli) client_obd_list_lock(&cli->cl_loi_list_lock); return osc_shrink_grant_to_target(cli, target); that function looks have a typo - cl_avail_grant in bytes, but target will be in pages so << CFS_PAGE_SHIFT looks lost. |
| Comments |
| Comment by Alexey Lyashkov [ 20/Dec/12 ] |
|
looks also in osc_shrink_grant_to_target client_obd_list_lock(&cli->cl_loi_list_lock);
/* Don't shrink if we are already above or below the desired limit
* We don't want to shrink below a single RPC, as that will negatively
* impact block allocation and long-term performance. */
if (target < cli->cl_max_pages_per_rpc)
target = cli->cl_max_pages_per_rpc;
if (target >= cli->cl_avail_grant) {
client_obd_list_unlock(&cli->cl_loi_list_lock);
RETURN(0);
}
client_obd_list_unlock(&cli->cl_loi_list_lock);
OBD_ALLOC_PTR(body);
if (!body)
RETURN(-ENOMEM);
osc_announce_cached(cli, &body->oa, 0);
client_obd_list_lock(&cli->cl_loi_list_lock);
body->oa.o_grant = cli->cl_avail_grant - target;
cli->cl_avail_grant = target;
but.. osc_cache.c: if (cli->cl_avail_grant >= bytes) {
osc_cache.c: cli->cl_avail_grant -= bytes;
so we really in bytes.. |
| Comment by Alexey Lyashkov [ 21/Dec/12 ] |
|
one more bug in same function int osc_shrink_grant_to_target(struct client_obd *cli, long target) { int rc = 0; struct ost_body *body; ENTRY; client_obd_list_lock(&cli->cl_loi_list_lock); /* Don't shrink if we are already above or below the desired limit * We don't want to shrink below a single RPC, as that will negatively * impact block allocation and long-term performance. */ if (target < cli->cl_max_pages_per_rpc) target = cli->cl_max_pages_per_rpc; if (target >= cli->cl_avail_grant) { client_obd_list_unlock(&cli->cl_loi_list_lock); RETURN(0); } client_obd_list_unlock(&cli->cl_loi_list_lock); OBD_ALLOC_PTR(body); if (!body) RETURN(-ENOMEM); osc_announce_cached(cli, &body->oa, 0); client_obd_list_lock(&cli->cl_loi_list_lock); body->oa.o_grant = cli->cl_avail_grant - target; so we have a release a lock and may sleep for any time and may have negative result of cl_avail_grant - target after it. second question - is this operation 'atomic' over client + ost. |
| Comment by Alexey Lyashkov [ 29/Dec/12 ] |
|
few more bugs with grants. with simple test [root@rhel6-64 tests]# OSTCOUNT=1 DEBUG_SIZE=400 SUBSYSTEM="ost osc filter" PTLDEBUG=-1 sh llmount.sh for i in {1..25000}; do dd if=/dev/zero of=lustre/$i bs=4096 count=2 if [ $? -ne 0 ]; then break; fi; done lctl dk > /tmp/grant.log 3) send up to max rpc in flight grant requests in parallel /reproduced with 2.1 but exist in 2.4/ 82297.783585:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0 82297.783601:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0 82297.802358:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.802404:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.802757:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576 82297.829384:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 41869312 granting: 0 with that example - client got a 2 extra grant updates. but we really don't need a have grants more then dirty cache size, because we have stick in waiting dirty space. |
| Comment by Alexey Lyashkov [ 06/Jan/13 ] |
|
Xyratex-Bug-Id: MRP-809. Peter, I have fix for it issue, and have plan to submit in short time. |
| Comment by Alexey Lyashkov [ 06/Jan/13 ] |
|
2.1 need additional chunk diff --git a/lustre/obdfilter/filter_io.c b/lustre/obdfilter/filter_io.c /* Rough calc in case we don't refresh cached statfs data */
|
| Comment by Alexey Lyashkov [ 14/Jan/13 ] |
|
remote: New Changes: |
| Comment by Alexey Lyashkov [ 14/Jan/13 ] |
|
if 2.1 fix need - i may submit it also. |
| Comment by Andreas Dilger [ 20/Feb/13 ] |
|
John, can you please submit a patch that only fixes the bytes-vs-pages issue independent of the "number of rpcs in flight" change that is blocking the current patch from landing. |
| Comment by John Hammond [ 20/Feb/13 ] |
|
Please see http://review.whamcloud.com/5495. |
| Comment by John Hammond [ 01/Apr/13 ] |
|
Patch landed. |
| Comment by Alexey Lyashkov [ 02/Apr/13 ] |
|
Andreas, my last version have dropped an limiting a number of rpc from a patch, and i don't understand a situation why it's don't inspected and new patch created? I really don't understand... |
| Comment by Andreas Dilger [ 03/Apr/13 ] |
|
Shadow, you submitted the original patch on Jan 11 and I inspected it on Jan 17, then Jinshan inspected it again on feb 11. I wanted to make sure that at least the basic fix was landed, so a month after not hearing from you (Feb 20) I asked John to submit the basic fix without changing the whole grant protocol. In cases like this it is easier to separate the basic fix (incompatible units for grant variables) from a much more major change that is controversial. You didn't refresh the original the patch until March 20. I agree that John should have resubmitted his patch using the same Change-Id as your original patch, since this makes it a lot easier to reinspect and compare the two patches. That would also have made it more clear to you that there was a new version of your fix . Sorry for the confusion. |