
low performance maybe related to quota

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: Lustre 2.3.0, Lustre 2.1.4
    • None
    • Environment: Lustre 2.1 with Bull patches, bullxlinux6.1 x86_64 (based on Red Hat 6.1), server bullx S6010-4
    • 3
    • 4513

    Description

      When running a performance test (sequential data IOs, 15 tasks each writing to its own file) on a Lustre file system installed with Lustre 2.1 plus a few Bull patches, I observe very low throughput compared to what I usually measure on the same hardware.

      Write bandwidth varies between 150 MB/s and 500 MB/s when running as a standard user. With the exact same parameters and configuration, but running as root, I get around 2000 MB/s write bandwidth, which is what I usually observe.
      With the root user, I suppose the flag OBD_BRW_NOQUOTA is set (though I have not been able to confirm this from the source code), which makes the request processing skip the lquota_chkdq() quota check in osc_queue_async_io().

      The profiling of the Lustre client indicates that more than 50% of the time is spent in the osc_quota_chkdq() routine. So this seems related to the quota subsystem and likely explains why the root user is not affected by the problem. I will attach the profiling reports to this ticket.

      The Lustre client is a bullx S6010-4, which has 128 cores and a large NUMIOA factor. The same performance measurement on a bullx S6010, which has only 32 cores and a smaller NUMIOA factor, gives around 3000 MB/s write bandwidth, so it is not affected by the performance issue.

      I have recompiled the lquota module after removing the cfs_spin_lock()/cfs_spin_unlock() calls on qinfo_list_lock in the osc_quota_chkdq() routine, and performance returns to the expected level. Note that the qinfo_hash[] table on the Lustre client is empty since quotas are disabled.
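      The workaround above amounts to skipping the shared lock when the quota table is known to be empty. A minimal sketch of that idea, outside of Lustre and with purely hypothetical names (oqi_count, chkdq_fast, qinfo_lock are not Lustre symbols):

```c
#include <pthread.h>
#include <stdatomic.h>

/* Sketch only, under assumptions: not the Lustre code. When no quota
 * entries are tracked (quotas disabled, the hash empty), the check can
 * return early without touching the shared lock, so the hot write path
 * no longer serializes on it. */

static atomic_int      oqi_count;   /* number of tracked quota entries */
static pthread_mutex_t qinfo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 when the write may proceed without a real quota lookup. */
int chkdq_fast(unsigned int id)
{
    /* Lockless fast path: an empty table means nothing can match. */
    if (atomic_load(&oqi_count) == 0)
        return 1;

    /* Slow path: take the lock and do the real lookup. */
    pthread_mutex_lock(&qinfo_lock);
    /* ... hash lookup on id would go here ... */
    pthread_mutex_unlock(&qinfo_lock);
    return 0;
}
```

      With quotas disabled every caller takes the fast path, so the lock's cache line is never written and cannot become a contention point.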

      How many asynchronous IO requests can be generated by only 15 writing tasks? Are there so many requests in parallel that the qinfo_list_lock becomes a point of contention?

      Is there higher latency in the spin_lock()/spin_unlock() routines when the NUMIOA factor is high?

      Attachments

        Activity


          I have backported the patch into b2_1: http://review.whamcloud.com/#change,4184.

          The tests show that the contention on quota (the osc_quota_chkdq() routine) has been fixed.

          Could this patch be reviewed?

          Thanks.

          pichong Gregoire Pichon added a comment

          Please let me know if there is outstanding work on this ticket.

          jlevi Jodi Levi (Inactive) added a comment

          The patch has been merged (commit 1b044fecb42c1f72ca2d2bc2bf80a4345b9ccf11).

          hongchao.zhang Hongchao Zhang added a comment

          Status update:

          The updated patch using cfs_hash_t is under test.

          hongchao.zhang Hongchao Zhang added a comment
          pjones Peter Jones added a comment -

          Reassign to Hongchao

          pjones Peter Jones added a comment -

          Yujian

          Could you please take care of this one?

          Thanks

          Peter


          The support team can pick up and refresh Johann's last patch.

          ian Ian Colle (Inactive) added a comment

          Hi Johann,

          What is the status of this ticket? Do you plan to provide a new version of the patch with the hash table implementation?

          This issue is going to become critical, as many of these bullx S6010-4 machines (with a large NUMA factor) are being installed in the June/July timeframe at the TGCC customer site.

          Thanks.

          pichong Gregoire Pichon added a comment

          Please note that there was a bug in the patch:

          rc = radix_tree_insert(&cli->cl_quota_ids[type], qid[type], &oqi);
                                                                      ^^^^ this should be oqi
          

          I have pushed the corrected version. That said, the bug only shows up when you start using quotas.
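          The bug class is worth spelling out. Since oqi is already a pointer to the tracked entry, the tree must store oqi itself; &oqi is the address of the local pointer variable, a stack address that dangles once the function returns. A minimal illustration with hypothetical names (not the Lustre code):

```c
#include <stddef.h>

/* `oqi` is a pointer to the entry we want to track. Storing `&oqi`
 * records the address of the local pointer variable instead of the
 * entry, leaving a dangling stack address in the container. */

struct oqi { unsigned int id; };

static void *slot;   /* stands in for the radix tree slot */

void insert_buggy(struct oqi *oqi) { slot = &oqi; }  /* stack address */
void insert_fixed(struct oqi *oqi) { slot = oqi;  }  /* the entry */
```

          With insert_fixed(), a later lookup through slot sees the live entry; with insert_buggy(), it reads freed stack memory.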

          johann Johann Lombardi (Inactive) added a comment

          Thanks for testing this patch, Grégoire. I'm now waiting for autotest results to check whether the patch broke quotas.

          johann Johann Lombardi (Inactive) added a comment

          People

            Assignee: hongchao.zhang Hongchao Zhang
            Reporter: pichong Gregoire Pichon
            Votes: 0
            Watchers: 8
