[LU-4139] Significant performance issue when user over soft quota limit Created: 23/Oct/13 Updated: 04/Dec/19 Resolved: 03/Dec/13 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.4.1 |
| Fix Version/s: | Lustre 2.6.0, Lustre 2.4.2, Lustre 2.5.1 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Mahmoud Hanafi | Assignee: | Niu Yawei (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Attachments: |
| Issue Links: |
| Severity: | 4 |
| Rank (Obsolete): | 11230 |
| Description |
|
When a user goes over their softlimit there is a major performance hit. Testing showed a file copied in 3 seconds when under the softlimit and 7 minutes when over it. Can be reproduced simply by testing below and above the softlimit. See the attached trace from when the copy was slow. |
| Comments |
| Comment by Peter Jones [ 23/Oct/13 ] |
|
Niu, could you please comment on this one? Thanks, Peter |
| Comment by Niu Yawei (Inactive) [ 24/Oct/13 ] |
|
Hi Mahmoud, this is by design, not a bug. To achieve accurate grace time management, we have to shrink the qunit size down to the least qunit (1K); the quota slave can then only acquire 1K of quota limit from the master at a time, which definitely hurts performance. For good performance, a large qunit would have to be kept when approaching the softlimit, but then the grace timer would not be triggered (or stopped) exactly at the user's real disk usage, which I think would be a really bad user experience. We think performance when over the softlimit (or approaching the hardlimit) is less important compared with an accurate grace timer (or an accurate -EDQUOT), so we chose to sacrifice performance (in this particular case) for accuracy. This is actually the reason dynamic qunit was introduced. |
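Below is a minimal standalone sketch of the dynamic qunit behaviour described above: the qunit shrinks toward the least qunit (1024 blocks of 1K, i.e. 1MB) as usage approaches the softlimit, so the master always knows a slave's usage to within one qunit. The function name, constants, and halving policy are illustrative assumptions, not the actual Lustre qsd code.

#include <stdio.h>

/* Illustrative constants: qunit is counted in 1K blocks and is never
 * shrunk below the least qunit (1024 blocks == 1MB of space). The
 * upper bound is arbitrary for this sketch. */
#define LEAST_QUNIT   1024ULL
#define MAX_QUNIT     (1024ULL * 1024)

/* Pick a qunit for a slave given its usage and the softlimit. The
 * closer the slave gets to the limit, the smaller the qunit, so the
 * grace timer can be started close to the user's real disk usage.
 * This is the accuracy-vs-performance trade-off discussed above. */
static unsigned long long pick_qunit(unsigned long long usage,
                                     unsigned long long softlimit)
{
    unsigned long long headroom;
    unsigned long long qunit = MAX_QUNIT;

    if (softlimit == 0 || usage >= softlimit)
        return LEAST_QUNIT;          /* over (or no) limit: smallest unit */

    headroom = softlimit - usage;
    /* halve the qunit until it fits comfortably into the remaining room */
    while (qunit > LEAST_QUNIT && qunit * 2 > headroom)
        qunit /= 2;

    return qunit < LEAST_QUNIT ? LEAST_QUNIT : qunit;
}

int main(void)
{
    unsigned long long softlimit = 10ULL * 1024 * 1024; /* 10GB in 1K blocks */
    unsigned long long usage;

    for (usage = 0; usage <= softlimit; usage += softlimit / 8)
        printf("usage=%llu blocks, qunit=%llu blocks\n",
               usage, pick_qunit(usage, softlimit));
    return 0;
}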
| Comment by Andreas Dilger [ 24/Oct/13 ] |
|
Niu, when you write "1k qunit" is that really 1kB of data, or 1024 blocks? It doesn't make sense to only have 1kB of quota if the blocks are allocated in chunks of 4kB. |
| Comment by Mahmoud Hanafi [ 24/Oct/13 ] |
|
3 seconds to 7 minutes is not a very good trade-off. This behaviour makes the softlimit useless for us. Is there a way to tune the minimum qunit allocated when a user is over their softlimit? |
| Comment by Jay Lan (Inactive) [ 24/Oct/13 ] |
|
qsd_internal.h shows the meaning of (qunit == 1024) as: |
| Comment by Niu Yawei (Inactive) [ 25/Oct/13 ] |
Right, it is 1K blocks (1MB).
Once over the softlimit, the client switches to sync writes instead of writing to cache, and the quota slave only acquires the minimum quota limit needed to satisfy each incoming write. That means two RPC round-trip delays are added to each write operation, but I don't think the performance gap should be that big (3 seconds to 7 minutes). Could you verify that all data was really flushed back (for the 3-second copy)? Or could you retry the test with direct I/O to see the difference? Thanks. |
| Comment by Mahmoud Hanafi [ 25/Oct/13 ] |
|
Here is what my testing showed:
Using IOR with direct I/O, 3 threads writing to a single OST:
Using IOR with buffered I/O, 3 threads writing to a single OST:
Using 'cp' to copy a 10GB file:
What I noticed is that when I was over my softlimit using cp, all the I/O was 4KB RPCs. I could see this happen in the middle of my test: as I went over my softlimit, the RPCs dropped from 1MB to 4KB. Also, using IOR with buffered I/O, all the RPCs were 4KB. It seems the smaller I/O sizes are the main issue.
MY IOR COMMANDS
Direct IO |
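A rough back-of-the-envelope sketch of why the drop from 1MB to 4KB RPCs dominates the copy time; the 100us per-RPC overhead is an assumed figure for illustration only, not a measured value:

#include <stdio.h>

/* Compare the number of RPCs (and the resulting per-RPC overhead)
 * needed to move a 10GB file with 1MB RPCs versus 4KB RPCs, using an
 * assumed fixed cost per synchronous RPC. */
int main(void)
{
    const double file_bytes     = 10.0 * 1024 * 1024 * 1024;
    const double rpc_overhead_s = 100e-6;        /* assumed 100us per RPC */

    double rpcs_1m = file_bytes / (1024 * 1024);
    double rpcs_4k = file_bytes / 4096;

    printf("1MB RPCs: %.0f RPCs, ~%.1f s of per-RPC overhead\n",
           rpcs_1m, rpcs_1m * rpc_overhead_s);
    printf("4KB RPCs: %.0f RPCs, ~%.1f s of per-RPC overhead\n",
           rpcs_4k, rpcs_4k * rpc_overhead_s);
    return 0;
}

Even with a modest assumed per-RPC cost, the 4KB case needs roughly 2.6 million RPCs for a 10GB file, which is consistent with a copy time measured in minutes rather than seconds.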
| Comment by Andreas Dilger [ 26/Oct/13 ] |
|
Niu, |
| Comment by Niu Yawei (Inactive) [ 28/Oct/13 ] |
For quota accuracy, when approaching (or over) the quota hardlimit (or softlimit), the client switches to sync writes (see bug16642), and in consequence the RPC size will be the page size, 4K (pages can't be cached on the client; they have to be synced out on write).
As I described above, the page-size (4KB) I/O is due to sync writes on the client. To avoid sync writes on the client after going over the softlimit, I think we can probably tweak the qunit size differently when over the softlimit. I'll try to cook up a patch. |
| Comment by Jay Lan (Inactive) [ 28/Oct/13 ] |
|
While the servers run 2.4.1, the clients are 2.1.5. The client code has no knowledge of the new quota rules. Which variable/field enforces sync write, and how does the server tell clients to start using sync writes? I found where the qunit is adjusted, but I have not figured out how the sync write is enforced. |
| Comment by Niu Yawei (Inactive) [ 28/Oct/13 ] |
The new quota code didn't change the client protocol, so triggering sync writes when approaching the limit works the same as before. Please check the server code in qsd_op_begin0():
__u64 usage;
lqe_read_lock(lqe);
usage = lqe->lqe_usage;
usage += lqe->lqe_pending_write;
usage += lqe->lqe_waiting_write;
usage += qqi->qqi_qsd->qsd_sync_threshold;
/* if we should notify client to start sync write */
if (usage >= lqe->lqe_granted - lqe->lqe_pending_rel)
*flags |= LQUOTA_OVER_FL(qqi->qqi_qtype);
else
*flags &= ~LQUOTA_OVER_FL(qqi->qqi_qtype);
lqe_read_unlock(lqe);
On the client side, see osc_queue_async_io() -> osc_quota_chkdq(). |
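A simplified standalone model of that client-side path (the structure and function names below are hypothetical, not the real osc code): the over-quota flags returned by the server are recorded per quota type, and any set flag makes the client bypass its cache and issue sync, page-sized writes.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of the client-side check: the OST
 * replies carry per-type over-quota flags (the LQUOTA_OVER_FL bits set
 * by qsd_op_begin0() above), the client records them, and the
 * osc_quota_chkdq()-style check then decides whether a write may be
 * cached or must be synced out immediately. */

enum { QTYPE_USR = 0, QTYPE_GRP = 1, QTYPE_MAX };

struct client_quota_state {
    bool over[QTYPE_MAX];       /* set from the flags in server replies */
};

/* Record the over-quota flags returned by the server. */
static void record_reply_flags(struct client_quota_state *qs, unsigned flags)
{
    qs->over[QTYPE_USR] = (flags & (1U << QTYPE_USR)) != 0;
    qs->over[QTYPE_GRP] = (flags & (1U << QTYPE_GRP)) != 0;
}

/* Decide whether a write may go through the cache (async, 1MB RPCs)
 * or must be synced out immediately (page-sized, 4KB RPCs). */
static bool may_cache_write(const struct client_quota_state *qs)
{
    return !qs->over[QTYPE_USR] && !qs->over[QTYPE_GRP];
}

int main(void)
{
    struct client_quota_state qs = { { false, false } };

    record_reply_flags(&qs, 0);
    printf("under limit: cached write allowed = %d\n", may_cache_write(&qs));

    /* server flagged the user as over its (soft) limit */
    record_reply_flags(&qs, 1U << QTYPE_USR);
    printf("over limit:  cached write allowed = %d\n", may_cache_write(&qs));
    return 0;
}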
| Comment by Niu Yawei (Inactive) [ 28/Oct/13 ] |
|
Lose some grace time accuracy to improve write performance when over softlimit: http://review.whamcloud.com/8078 |
| Comment by Mahmoud Hanafi [ 28/Oct/13 ] |
|
How does this patch help with the 4KB I/O sizes? I think that is the real performance issue. |
| Comment by Niu Yawei (Inactive) [ 29/Oct/13 ] |
The 4KB I/O size is caused by the over-quota flag on the client. With this patch, the slave can acquire/pre-acquire a little more spare limit each time when over the softlimit, so the over-quota flag won't be set on the client anymore. |
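A rough sketch of that behaviour change (the helper name and the amount of spare room are assumptions for illustration, not the code in http://review.whamcloud.com/8078): when over the softlimit, the slave pre-acquires a few qunits of spare room instead of just enough for the incoming write, so the usage-vs-granted check in qsd_op_begin0() above no longer sets the over-quota flag and the client keeps doing cached, large writes.

#include <stdio.h>

#define LEAST_QUNIT 1024ULL     /* 1K blocks == 1MB, as discussed above */

/* Illustrative comparison of how much quota a slave asks for per write
 * when over the softlimit, before and after the change. The numbers
 * are arbitrary; the point is only that the patched slave keeps some
 * spare room so it is not permanently pinned to its granted limit. */
static unsigned long long acquire_amount(unsigned long long write_blocks,
                                         int patched)
{
    if (!patched)
        /* old behaviour: acquire just enough for this write, keeping
         * usage at the granted limit and the over-quota flag set */
        return write_blocks;

    /* sketch of the new behaviour: also pre-acquire a few qunits of
     * spare room ("4" is an arbitrary value chosen for this sketch) */
    return write_blocks + 4 * LEAST_QUNIT;
}

int main(void)
{
    unsigned long long write = 1024;    /* a 1MB write, in 1K blocks */

    printf("over softlimit, unpatched: acquire %llu blocks\n",
           acquire_amount(write, 0));
    printf("over softlimit, patched:   acquire %llu blocks\n",
           acquire_amount(write, 1));
    return 0;
}

The cost, as the patch description says, is some grace time accuracy: with spare room pre-acquired, the master no longer tracks the slave's usage to within the least qunit while over the softlimit.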
| Comment by Mahmoud Hanafi [ 30/Oct/13 ] |
|
New benchmark numbers with the patch:
Direct I/O:
Buffered I/O:
So it looks good! |
| Comment by Jian Yu [ 22/Nov/13 ] |
|
Patch http://review.whamcloud.com/8078 landed on master branch and was cherry-picked to Lustre b2_4 branch. |
| Comment by Mahmoud Hanafi [ 03/Dec/13 ] |
|
Please close this one. |
| Comment by Peter Jones [ 03/Dec/13 ] |
|
ok Mahmoud |
| Comment by Niu Yawei (Inactive) [ 17/Dec/13 ] |
|
backported to b2_5: http://review.whamcloud.com/8603 |