Details
- Type: Bug
- Resolution: Fixed
- Priority: Major
- None
- Environment: Lustre 2.1 with Bull patches, bullxlinux6.1 x86_64 (based on Redhat 6.1), server bullx S6010-4
- 3
- 4513
Description
When running a performance test (sequential data I/O, 15 tasks each writing to its own file) on a Lustre file system installed with Lustre 2.1 plus a few Bull patches, I observe very low throughput compared to what I usually measure on the same hardware.
Write bandwidth varies between 150 MB/s and 500 MB/s when running as a standard user. With the exact same parameters and configuration, but running as the root user, I get around 2000 MB/s write bandwidth, which is what I usually observe.
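For reference, a stand-alone sketch of this kind of workload is below (the /mnt/lustre mount point, the 1 MiB block size and the 4 GiB per-task file size are placeholders of mine, not the exact test parameters): 15 forked tasks, each writing its own file sequentially, with the aggregate bandwidth printed at the end. Running it once as root and once as a standard user shows the gap.

    /* Hypothetical reproducer of the workload described above: 15 tasks,
     * each writing its own file sequentially. The mount point, block size
     * and per-task file size are placeholders, not the real test values. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NTASKS   15
    #define BLOCKSZ  (1 << 20)        /* 1 MiB per write() */
    #define NBLOCKS  4096             /* 4 GiB written per task */

    int main(void)
    {
        struct timeval t0, t1;
        double secs, mbytes;
        int i;

        gettimeofday(&t0, NULL);
        for (i = 0; i < NTASKS; i++) {
            if (fork() == 0) {
                char path[64], *buf = malloc(BLOCKSZ);
                int fd, j;

                if (buf == NULL)
                    _exit(1);
                memset(buf, 'x', BLOCKSZ);
                snprintf(path, sizeof(path), "/mnt/lustre/seqwrite.%d", i);
                fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
                if (fd < 0) {
                    perror("open");
                    _exit(1);
                }
                for (j = 0; j < NBLOCKS; j++)
                    if (write(fd, buf, BLOCKSZ) != BLOCKSZ) {
                        perror("write");
                        _exit(1);
                    }
                close(fd);
                _exit(0);
            }
        }
        for (i = 0; i < NTASKS; i++)
            wait(NULL);
        gettimeofday(&t1, NULL);

        secs   = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        mbytes = (double)NTASKS * NBLOCKS * BLOCKSZ / 1e6;
        printf("aggregate write bandwidth: %.0f MB/s\n", mbytes / secs);
        return 0;
    }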
With the root user, I suppose the OBD_BRW_NOQUOTA flag is set (although I have not been able to confirm that from the source code), which makes the request processing skip the lquota_chkdq() quota check in osc_queue_async_io().
Profiling of the Lustre client indicates that more than 50% of the time is spent in the osc_quota_chkdq() routine. So this seems related to the quota subsystem, and it certainly explains why the root user is not affected by the problem. I will attach the profiling reports to this ticket.
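To illustrate the pattern I suspect, here is a simplified user-space model (this is only an illustration, not the Lustre source; qinfo_list_lock and qinfo_hash are the names discussed in this ticket, while the bucket count, entry layout and hash function are invented): every queued asynchronous I/O request takes one global spin lock just to scan a hash table that never contains anything, because quotas are disabled.

    /* User-space model of the suspected hot path; not the Lustre source.
     * qinfo_list_lock and qinfo_hash mirror the names discussed in this
     * ticket; bucket count, entry layout and hash function are invented. */
    #include <pthread.h>

    #define NR_HASH 64

    struct oqi {                              /* modelled per-ID quota entry */
        struct oqi *next;
        unsigned int id;
        int type;                             /* 0 = user quota, 1 = group quota */
    };

    static pthread_spinlock_t qinfo_list_lock; /* one lock shared by all CPUs */
    static struct oqi *qinfo_hash[NR_HASH];    /* stays empty: quotas disabled */

    /* Model of the per-request check: runs once per queued async I/O request. */
    static int chkdq_model(unsigned int uid, unsigned int gid)
    {
        unsigned int ids[2] = { uid, gid };
        int type, over_quota = 0;

        pthread_spin_lock(&qinfo_list_lock);   /* global serialization point */
        for (type = 0; type < 2; type++) {
            struct oqi *q;

            for (q = qinfo_hash[ids[type] % NR_HASH]; q; q = q->next)
                if (q->id == ids[type] && q->type == type)
                    over_quota = 1;            /* ID known to be over quota */
        }
        pthread_spin_unlock(&qinfo_list_lock);
        return over_quota;
    }

    int main(void)
    {
        pthread_spin_init(&qinfo_list_lock, PTHREAD_PROCESS_PRIVATE);
        return chkdq_model(1000, 1000);        /* returns 0: hash is empty */
    }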
The Lustre client is a bullx S6010-4, which has 128 cores and a large NUMIOA factor. The same performance measurement on a bullx S6010, which has only 32 cores and a smaller NUMIOA factor, gives around 3000 MB/s write bandwidth, so it is not affected by this performance issue.
I have recompiled the lquota module after removing the cfs_spin_lock()/cfs_spin_unlock() calls on qinfo_list_lock in the osc_quota_chkdq() routine, and performance is back to the expected level. Note that the qinfo_hash[] table on the Lustre client is empty, since quotas are disabled.
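A less drastic variant of that experiment could be to skip the lock only while the hash is known to be empty. The sketch below is purely hypothetical (it assumes entries are only added when some ID actually goes over quota, and it ignores the memory-ordering details a real kernel patch would need):

    /* Hypothetical variant of the model above: skip the global lock entirely
     * while the quota hash is known to be empty. Not a proposed patch, just
     * an illustration. */
    #include <pthread.h>

    static pthread_spinlock_t qinfo_list_lock;
    static int qinfo_count;                /* number of entries in the hash */

    static int chkdq_fastpath(unsigned int uid, unsigned int gid)
    {
        (void)uid;                         /* only used by the elided hash walk */
        (void)gid;

        if (qinfo_count == 0)              /* quotas disabled: nothing to find */
            return 0;                      /* no lock taken, no cache-line traffic */

        pthread_spin_lock(&qinfo_list_lock);
        /* ... same hash walk as in the model above ... */
        pthread_spin_unlock(&qinfo_list_lock);
        return 0;
    }

    int main(void)
    {
        pthread_spin_init(&qinfo_list_lock, PTHREAD_PROCESS_PRIVATE);
        return chkdq_fastpath(1000, 1000);
    }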
How many asynchronous I/O requests can be generated by only 15 writing tasks? Are there so many requests in parallel that qinfo_list_lock becomes a contention point?
Is there more latency in the spin_lock()/spin_unlock() routines when the NUMIOA factor is high?
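One way to answer that last question would be a small micro-benchmark like the sketch below (pthread spin locks are only an approximation of the kernel's cfs_spin_lock, and the thread and iteration counts are arbitrary): N threads repeatedly take and release one shared spin lock, and the aggregate rate is printed. Comparing the numbers for the same thread count on the S6010 (32 cores) and the S6010-4 (128 cores) would show how much the NUMIOA factor alone costs. Build with 'gcc -O2 spinbench.c -o spinbench -lpthread' and run e.g. './spinbench 15'.

    /* Hypothetical micro-benchmark: N threads repeatedly take and release one
     * shared spin lock; compare ops/s on the 32-core and 128-core machines.
     * Thread count (argv[1]) and iteration count are arbitrary. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    #define ITERS 1000000

    static pthread_spinlock_t lock;
    static volatile int dummy;

    static void *worker(void *arg)
    {
        long i;

        (void)arg;
        for (i = 0; i < ITERS; i++) {
            pthread_spin_lock(&lock);
            dummy++;                     /* tiny critical section */
            pthread_spin_unlock(&lock);
        }
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int nthreads = argc > 1 ? atoi(argv[1]) : 15;
        pthread_t *tids = malloc(nthreads * sizeof(*tids));
        struct timeval t0, t1;
        double secs;
        int i;

        pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
        gettimeofday(&t0, NULL);
        for (i = 0; i < nthreads; i++)
            pthread_create(&tids[i], NULL, worker, NULL);
        for (i = 0; i < nthreads; i++)
            pthread_join(tids[i], NULL);
        gettimeofday(&t1, NULL);

        secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d threads: %.1f M lock/unlock pairs per second\n",
               nthreads, nthreads * (double)ITERS / secs / 1e6);
        return 0;
    }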