LU-1057: low performance possibly related to quota


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.3.0, Lustre 2.1.4
    • Labels: None
    • Environment: Lustre 2.1 with Bull patches, bullxlinux6.1 x86_64 (based on Red Hat 6.1), server bullx S6010-4
    • Severity: 3

    Description

      When running a performance test (sequential data I/Os, 15 tasks each writing to its own file) on a Lustre file system installed with Lustre 2.1 plus a few Bull patches, I observe very low throughput compared to what I usually measure on the same hardware.

      Write bandwidth varies between 150 MB/s and 500 MB/s when running as a standard user. With the exact same parameters and configuration, but running as the root user, I get around 2000 MB/s write bandwidth, which is the value I usually observe.
      For the root user, I suppose the OBD_BRW_NOQUOTA flag is set (although I have not been able to confirm that from the source code), which makes the request processing skip the lquota_chkdq() quota check in osc_queue_async_io().
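
      For context, this is how I read the gating in osc_queue_async_io() (a paraphrased sketch, not verbatim 2.1 source; brw_flags, cli, qid and the error handling are my shorthand):

          /* paraphrase of the call site in osc_queue_async_io() */
          if (!(brw_flags & OBD_BRW_NOQUOTA)) {
                  /* lquota_chkdq() ends up in osc_quota_chkdq() on the client */
                  if (lquota_chkdq(quota_interface, cli, qid) == NO_QUOTA)
                          rc = -EDQUOT;
          }
          /* root I/O carries OBD_BRW_NOQUOTA, so it never reaches the quota
           * check and never touches qinfo_list_lock */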

      Profiling the Lustre client shows that more than 50% of the time is spent in the osc_quota_chkdq() routine. This points to the quota subsystem and most likely explains why the root user is not affected by the problem. I will attach the profiling reports to this ticket.
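
      The routine itself is small; what stands out is the single global lock taken around the hash lookup. My paraphrase of the 2.1 code, trimmed (details may differ from the actual source):

          int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
          {
                  int cnt, rc = QUOTA_OK;

                  cfs_spin_lock(&qinfo_list_lock);     /* one lock for all CPUs */
                  for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
                          /* taken by every queued page of every task, only to
                           * search qinfo_hash[], which is empty with quota off */
                          if (find_qinfo(cli, qid[cnt], cnt)) {
                                  rc = NO_QUOTA;
                                  break;
                          }
                  }
                  cfs_spin_unlock(&qinfo_list_lock);
                  return rc;
          }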

      The Lustre client is a bullx S6010-4, which has 128 cores and a large NUMIOA factor. The same measurement on a bullx S6010, which has only 32 cores and a smaller NUMIOA factor, gives around 3000 MB/s write bandwidth, so that machine is not affected by this issue.

      I recompiled the lquota module after removing the cfs_spin_lock()/cfs_spin_unlock() calls on qinfo_list_lock in the osc_quota_chkdq() routine, and performance returned to the expected level. Note that the qinfo_hash[] table on the Lustre client is empty, since quotas are disabled.
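
      Removing the locking entirely is of course only safe while quota stays disabled. A less invasive variant might keep an atomic count of hash entries and skip the lock when the table is empty; a hypothetical sketch (qinfo_count is my invention and would have to be maintained wherever qinfo_hash[] entries are inserted or removed):

          static cfs_atomic_t qinfo_count = CFS_ATOMIC_INIT(0);

          int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
          {
                  int cnt, rc = QUOTA_OK;

                  /* fast path: with quota disabled the table stays empty, so
                   * no lock is taken and no lock cacheline bounces between
                   * NUMA nodes */
                  if (cfs_atomic_read(&qinfo_count) == 0)
                          return QUOTA_OK;

                  cfs_spin_lock(&qinfo_list_lock);
                  for (cnt = 0; cnt < MAXQUOTAS; cnt++) {
                          if (find_qinfo(cli, qid[cnt], cnt)) {
                                  rc = NO_QUOTA;
                                  break;
                          }
                  }
                  cfs_spin_unlock(&qinfo_list_lock);
                  return rc;
          }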

      How many asynchronous I/O requests can be generated by only 15 writing tasks? Are there so many requests in parallel that qinfo_list_lock becomes a point of contention?

      Is there more latency in the spin_lock()/spin_unlock() routines when the NUMIOA factor is high?

      Attachments


          People

            Assignee: Hongchao Zhang (hongchao.zhang)
            Reporter: Gregoire Pichon (pichong)
            Votes: 0
            Watchers: 8
