What I noticed is that when I was over my softlimit using cp, all the I/O was 4KB RPCs. I could see this happen in the middle of my test: as I went over my softlimit, the RPCs dropped from 1MB to 4KB. Also, using IOR with buffered I/O, all the RPCs were 4KB. It seems that the smaller I/O sizes are the main issue.
For quota accuracy, when approaching (or over) the quota hardlimit (or softlimit), the client switches to sync write (see bug16642). As a consequence, the RPC size becomes the page size, 4KB: a page can't be cached on the client, it has to be synced out on write.
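To put the size difference in numbers, here is a rough back-of-the-envelope sketch. It only counts RPCs for the two sizes mentioned above (1MB and 4KB); the 1 GiB file size and the function name are arbitrary choices for illustration, not anything taken from the Lustre code.

```python
PAGE_RPC_BYTES = 4 * 1024        # 4KB, page-sized RPC on the sync write path
BULK_RPC_BYTES = 1024 * 1024     # 1MB RPC on the normal cached write path


def rpc_count(file_bytes: int, rpc_bytes: int) -> int:
    """Number of RPCs needed to write file_bytes at the given RPC size."""
    # Integer ceiling division, so a partial trailing RPC is counted.
    return (file_bytes + rpc_bytes - 1) // rpc_bytes


file_bytes = 1024 ** 3  # 1 GiB, an arbitrary example size

bulk = rpc_count(file_bytes, BULK_RPC_BYTES)
paged = rpc_count(file_bytes, PAGE_RPC_BYTES)
print(bulk, paged, paged // bulk)
```

For the same amount of data, the 4KB case needs 256 times as many RPCs as the 1MB case, and with sync write each of them has to complete before the page can be released.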
I think it is unintentional that over-softlimit I/O is done in 4KB chunks, even if the qunit is being acquired in 1MB chunks. Is it possible to avoid throttling the clients if there is a large gap between the soft and hard quota limits (i.e. treating over softlimit the same as under softlimit if there is still a large margin before the hardlimit)?
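To make the question concrete, a minimal sketch of the kind of check being asked about might look like the following. This is not how the Lustre quota code is written; the function name, the `margin` parameter, and the handling of a hardlimit of 0 as "no hardlimit set" are all assumptions made for illustration.

```python
def must_sync_write(usage: int, softlimit: int, hardlimit: int, margin: int) -> bool:
    """Decide whether a client should fall back to sync (page-sized) writes.

    Hypothetical policy: being over the softlimit alone does not force sync
    writes as long as more than `margin` bytes remain before the hardlimit.
    All values are in bytes; a limit of 0 is assumed to mean "not set".
    """
    if hardlimit > 0:
        # Only throttle when close to (or past) the hardlimit.
        return hardlimit - usage <= margin
    # No hardlimit to measure against: keep the behavior described in this
    # thread, i.e. sync writes once usage exceeds the softlimit.
    return softlimit > 0 and usage > softlimit


GB = 1024 ** 3
MB = 1024 ** 2

# Over a 10 GB softlimit but far from a 100 GB hardlimit: keep cached writes.
print(must_sync_write(usage=11 * GB, softlimit=10 * GB, hardlimit=100 * GB, margin=64 * MB))
# Within 64 MB of the hardlimit: switch to sync writes.
print(must_sync_write(usage=100 * GB - 32 * MB, softlimit=10 * GB, hardlimit=100 * GB, margin=64 * MB))
```

The 64 MB margin here is an arbitrary placeholder; a real threshold would presumably have to relate to the qunit size.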
As I described above, the page-size (4KB) I/O is caused by sync write on the client. To avoid sync write on the client once it is over the softlimit, I think we can probably tune the qunit size differently when over softlimit. I'll try to cook up a patch.
ok Mahmoud