[LU-1057] low performance maybe related to quota Created: 31/Jan/12 Updated: 22/Dec/12 Resolved: 27/Sep/12 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | Lustre 2.3.0, Lustre 2.1.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Gregoire Pichon | Assignee: | Hongchao Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | paj |
| Environment: | Lustre 2.1 with Bull patches, bullxlinux6.1 x86_64 (based on Red Hat 6.1), server bullx S6010-4 |
| Attachments: | |
| Severity: | 3 |
| Rank (Obsolete): | 4513 |
| Description |
|
When running a performance test (sequential data I/O, 15 tasks each writing to its own file) on a Lustre file system installed with Lustre 2.1 plus a few Bull patches, I observe very low throughput compared to what I usually measure on the same hardware. Write bandwidth varies between 150 MB/s and 500 MB/s when running as a standard user. With the exact same parameters and configuration, but running as the root user, I get around 2000 MB/s write bandwidth, which is the value I usually observe.

Profiling of the Lustre client indicates that more than 50% of the time is spent in the osc_quota_chkdq() routine. This seems related to the quota subsystem and certainly explains why the root user is not affected. I will attach the profiling reports to this ticket.

The Lustre client is a bullx S6010-4, which has 128 cores and a large NUMIOA factor. The same performance measurement on a bullx S6010, which has only 32 cores and a smaller NUMIOA factor, gives around 3000 MB/s write bandwidth, so it is not affected by the problem.

I have recompiled the lquota module after removing the cfs_spin_lock()/cfs_spin_unlock() calls on qinfo_list_lock in the osc_quota_chkdq() routine, and performance is back to the expected level. Note that the qinfo_hash[] table on the Lustre client is empty, since quotas are disabled.

How many asynchronous I/O requests can be generated by only 15 writing tasks? Are there so many requests in parallel that qinfo_list_lock becomes a congestion point? Is there more latency in the spin_lock()/spin_unlock() routines when the NUMIOA factor is high? |
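For context, a minimal sketch of the code path described above. It is approximated from the report, not copied from the Lustre source: the routine, lock, and hash names come from the ticket, while QUOTA_OK, the hash size, and the struct fields are assumptions for illustration only.

#include <linux/spinlock.h>
#include <linux/list.h>

#define NR_QIHASH 32   /* assumed hash size, for illustration only */
#define QUOTA_OK   0   /* assumed "not over quota" return value */

struct client_obd;     /* opaque here; only passed through */

static DEFINE_SPINLOCK(qinfo_list_lock);        /* one lock shared by every client_obd */
static struct list_head qinfo_hash[NR_QIHASH];  /* stays empty when quotas are off */

/* Called on every bulk write RPC: even with quotas disabled, all 128
 * cores must bounce the same cache line to acquire this global lock. */
int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
{
        int rc = QUOTA_OK;

        spin_lock(&qinfo_list_lock);
        /* ... hash walk that finds nothing when quotas are disabled ... */
        spin_unlock(&qinfo_list_lock);

        return rc;
}

With 15 tasks generating many overlapping asynchronous brw requests, every request serializes on qinfo_list_lock, which matches the profile described above.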
| Comments |
| Comment by Johann Lombardi (Inactive) [ 31/Jan/12 ] |
|
To speed up the case where quota isn't enforced (as in this case), we could just record the number of osc_quota_info entries we have for each cli and skip the hash lookup, as well as the locking, entirely (sketched below). When quota is enforced, I think we should first have one hash per cli instead of a global hash and spinlock. |
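A minimal sketch of that fast-path idea, with hypothetical names (cl_quota_count, oqi_chkdq_slow() and QUOTA_OK are illustrative, not taken from any actual patch):

#include <linux/atomic.h>

#define QUOTA_OK 0   /* assumed "not over quota" return value */

struct client_obd {
        atomic_t cl_quota_count;  /* number of osc_quota_info entries for this cli */
        /* ... */
};

int oqi_chkdq_slow(struct client_obd *cli, const unsigned int qid[]);

int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
{
        /* Quota not enforced for this client: no entries, so skip the
         * hash lookup and the locking entirely. */
        if (atomic_read(&cli->cl_quota_count) == 0)
                return QUOTA_OK;

        /* Slow path: per-cli hash lookup under a per-cli lock
         * (hypothetical helper). */
        return oqi_chkdq_slow(cli, qid);
}

The counter only changes when an osc_quota_info entry is added or removed, so the disabled-quota case never touches anything but a read-mostly cache line.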
| Comment by Gregoire Pichon [ 31/Jan/12 ] |
|
Here are the oprofile reports for
|
| Comment by Peter Jones [ 31/Jan/12 ] |
|
Niu, could you please look into this one? Thanks, Peter |
| Comment by Johann Lombardi (Inactive) [ 31/Jan/12 ] |
|
Actually, we might be able to just use a radix tree with RCU. |
| Comment by Johann Lombardi (Inactive) [ 31/Jan/12 ] |
|
I have just pushed an (untested) patch using RCU & a radix tree: |
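In outline, such an approach makes the read side lockless. The sketch below is an approximation: the cl_quota_ids field name appears in the snippet quoted later in this ticket, while the return values, the per-client lock, and the helper are assumptions, not the actual patch.

#include <linux/radix-tree.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>
#include <linux/quota.h>   /* MAXQUOTAS */

#define QUOTA_OK 0   /* assumed return values */
#define NO_QUOTA 1

struct osc_quota_info;

struct client_obd {
        /* assumed initialized at client setup with
         * INIT_RADIX_TREE(..., GFP_ATOMIC), so insertion is safe
         * while holding a spinlock */
        struct radix_tree_root cl_quota_ids[MAXQUOTAS];
        spinlock_t             cl_quota_lock;  /* taken by writers only */
};

/* Read side: no lock at all, just an RCU critical section around the
 * radix tree lookups, removing the global spinlock from the I/O path. */
int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
{
        int type, rc = QUOTA_OK;

        rcu_read_lock();
        for (type = 0; type < MAXQUOTAS; type++) {
                if (radix_tree_lookup(&cli->cl_quota_ids[type], qid[type])) {
                        rc = NO_QUOTA;  /* this uid/gid is over quota */
                        break;
                }
        }
        rcu_read_unlock();

        return rc;
}

/* Write side: an id goes over (or back under) quota rarely, so
 * serializing updates with a per-client spinlock is cheap. */
int osc_quota_mark(struct client_obd *cli, int type, unsigned int id,
                   struct osc_quota_info *oqi)
{
        int rc;

        spin_lock(&cli->cl_quota_lock);
        rc = radix_tree_insert(&cli->cl_quota_ids[type], id, oqi);
        spin_unlock(&cli->cl_quota_lock);

        return rc;
}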
| Comment by Gregoire Pichon [ 02/Feb/12 ] |
|
Thank you, Johann. I have tested your patch (set 2) and the results are good. Performance is at the expected level, and the profiling report no longer shows significant time spent in the osc_quota_chkdq() routine (0.0170% of the profiling samples). Note that my configuration still has quotas disabled, and therefore there are no osc_quota_info entries. |
| Comment by Johann Lombardi (Inactive) [ 02/Feb/12 ] |
|
Thanks for testing this patch, Grégoire. I'm now waiting for autotest results to check whether the patch broke quota. |
| Comment by Johann Lombardi (Inactive) [ 04/Feb/12 ] |
|
Please note that there was a bug in the patch:

rc = radix_tree_insert(&cli->cl_quota_ids[type], qid[type], &oqi);
                                                            ^^^^ this should be oqi

I have pushed the corrected version. That said, the bug only shows up when you start using quota. |
| Comment by Gregoire Pichon [ 21/Jun/12 ] |
|
Hi Johann, what is the status of this ticket? Do you plan to provide a new version of the patch with the hash table implementation? This issue is going to become critical, as many of these bullx S6010-4 machines (with a large NUMA factor) are being installed in the June/July timeframe at the TGCC customer site. Thanks. |
| Comment by Ian Colle (Inactive) [ 04/Jul/12 ] |
|
The support team can pick up and refresh Johann's last patch. |
| Comment by Peter Jones [ 04/Jul/12 ] |
|
Yujian, could you please take care of this one? Thanks, Peter |
| Comment by Peter Jones [ 05/Jul/12 ] |
|
Reassign to Hongchao |
| Comment by Hongchao Zhang [ 09/Jul/12 ] |
|
Status update: the updated patch using cfs_hash_t is under test. |
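cfs_hash_t is Lustre's internal resizable hash. Without reproducing the patch itself, here is a rough illustration of the same per-client idea using the stock kernel hashtable API instead; all names here are hypothetical.

#include <linux/hashtable.h>
#include <linux/rcupdate.h>
#include <linux/spinlock.h>

#define OQI_HASH_BITS 5   /* 32 buckets, chosen arbitrarily */

struct osc_quota_info {
        struct hlist_node oqi_hash;  /* chain in the per-client table */
        unsigned int      oqi_id;    /* uid or gid that is over quota */
};

struct client_obd {
        DECLARE_HASHTABLE(cl_quota_hash, OQI_HASH_BITS);
        spinlock_t        cl_quota_lock;  /* serializes updates only */
};

/* Lockless read side: safe against concurrent hash_add_rcu() updates. */
static bool oqi_present(struct client_obd *cli, unsigned int id)
{
        struct osc_quota_info *oqi;
        bool found = false;

        rcu_read_lock();
        hash_for_each_possible_rcu(cli->cl_quota_hash, oqi, oqi_hash, id)
                if (oqi->oqi_id == id) {
                        found = true;
                        break;
                }
        rcu_read_unlock();

        return found;
}

The point in either implementation is the same as in the earlier sketches: the lookup state and its lock become per-client, so independent clients no longer contend, and readers can proceed without taking the lock at all.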
| Comment by Hongchao Zhang [ 05/Aug/12 ] |
|
The patch has been merged (commit 1b044fecb42c1f72ca2d2bc2bf80a4345b9ccf11). |
| Comment by Jodi Levi (Inactive) [ 27/Sep/12 ] |
|
Please let me know if there is outstanding work on this ticket. |
| Comment by Gregoire Pichon [ 04/Oct/12 ] |
|
I have backported the patch to b2_1: http://review.whamcloud.com/#change,4184. The tests show that the contention on quota (the osc_quota_chkdq() routine) has been fixed. Could this patch be reviewed? Thanks. |