A few more bugs with grants:
1) wrong rounding to block size (fixed in 2.4).
2)
[ 933.237133] LustreError: 29642:0:(ofd_grant.c:250:ofd_grant_space_left()) lustre-OST0000: cli lustre-OST0000_UUID/ffff8800a72809a0 left 62275584 < tot_grant 65467236 unstable 0 pending 0
This is reproduced with a simple test:
[root@rhel6-64 tests]# OSTCOUNT=1 DEBUG_SIZE=400 SUBSYSTEM="ost osc filter" PTLDEBUG=-1 sh llmount.sh
for i in {1..25000}; do
dd if=/dev/zero of=lustre/$i bs=4096 count=2
if [ $? -ne 0 ]; then
break;
fi;
done
lctl dk > /tmp/grant.log
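To see how far the accounting is off, the LustreError line from bug 2 can be turned into a number. The following is only a sketch: the helper name `grant_deficit` is made up for illustration, and it assumes the message keeps the `left <bytes> < tot_grant <bytes>` layout shown above.

```shell
# Hypothetical helper: read console/syslog text on stdin and, for every
# ofd_grant_space_left() error, print how many bytes tot_grant exceeds "left".
grant_deficit() {
    awk '/ofd_grant_space_left\(\)/ {
        left = ""; tot = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "left")      left = $(i + 1)   # free space seen by the OST
            if ($i == "tot_grant") tot  = $(i + 1)   # sum of grants handed out
        }
        if (left != "" && tot != "" && left + 0 < tot + 0)
            printf "%d\n", tot - left
    }'
}
```

For example, `dmesg | grant_deficit` prints one deficit per matching error line.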
3) the client sends up to max rpcs in flight grant requests in parallel (reproduced with 2.1, but it also exists in 2.4):
82297.783585:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0
82297.783601:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33603584 granting: 0
82297.802358:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
82297.802404:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
82297.802757:0:5373:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 33546240 granting: 1048576
82297.829384:0:5381:0:(filter_io.c:249:filter_grant()) lustre-OST0000: cli 846bbe16-0fc2-273c-ff5f-ff1e24116f3f/ffff8800908e1438 wants: 33554432 current grant 41869312 granting: 0
In that example the client got 2 extra grant updates.
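A quick way to spot such bursts in a debug log is to total the `granting:` values per client. This is a minimal sketch, not an existing Lustre tool: `sum_grants` is a hypothetical name, and it assumes the `filter_grant()` message format shown in the log lines above.

```shell
# Hypothetical helper: read "lctl dk" output on stdin and print, per client,
# "<client> <number of grant requests> <total bytes granted>".
sum_grants() {
    awk '/filter_grant\(\)/ {
        cli = ""; grant = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "cli")       cli   = $(i + 1)  # export UUID/pointer
            if ($i == "granting:") grant = $(i + 1)  # bytes granted by this reply
        }
        if (cli != "" && grant != "") { n[cli]++; total[cli] += grant }
    }
    END { for (c in n) printf "%s %d %d\n", c, n[c], total[c] }'
}
```

For example, `sum_grants < /tmp/grant.log` gives one line per client; several nonzero grants for the same `wants:` value within a few milliseconds point at the parallel requests described here.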
But we really don't need grants larger than the dirty cache size, because the client will be stuck waiting for dirty space anyway.
The limit on a grant request doesn't need to be tied to max rpcs in flight, because that setting is more related to the dirty cache during network transfer.
Shadow, you submitted the original patch on Jan 11 and I inspected it on Jan 17, then Jinshan inspected it again on Feb 11. I wanted to make sure that at least the basic fix was landed, so after a month of not hearing from you (Feb 20) I asked John to submit the basic fix without changing the whole grant protocol. In cases like this it is easier to separate the basic fix (incompatible units for grant variables) from a much more major change that is controversial. You didn't refresh the original patch until March 20.
I agree that John should have resubmitted his patch using the same Change-Id as your original patch, since this makes it a lot easier to reinspect and compare the two patches. That would also have made it clearer to you that there was a new version of your fix. Sorry for the confusion.