
Significant performance issue when user is over soft quota limit

Details

    • 4
    • 11230

    Description

      When a user goes over their softlimit, there is a major performance hit.

      In testing, copying a file took 3 sec when under the softlimit and 7 min when over it.

      This can be reproduced by running the same copy below and above the softlimit.

      See the trace for when the copy was slow.

          Activity

            [LU-4139] Significant performance issue when user is over soft quota limit
            niu Niu Yawei (Inactive) added a comment - backported to b2_5: http://review.whamcloud.com/8603
            pjones Peter Jones added a comment - ok Mahmoud

            mhanafi Mahmoud Hanafi added a comment - Please close this one.
            yujian Jian Yu added a comment (edited) - Patch http://review.whamcloud.com/8078 landed on the master branch and was cherry-picked to the Lustre b2_4 branch.

            mhanafi Mahmoud Hanafi added a comment -

            New benchmark numbers with the patch:

            Direct I/O
            UnderSoftlimit: 383MB/sec
            OverSoftlimit: 359MB/sec

            Buffered I/O
            UnderSoftlimit: 316MB/sec
            OverSoftlimit: 304MB/sec

            So it looks good!
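            The relative cost of being over the softlimit can be read off the four throughput figures in this benchmark. A minimal sketch in C of that arithmetic; the helper name slowdown_pct is hypothetical, and only the four MB/sec values come from the comment.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage of throughput lost when over the
 * softlimit, relative to the under-softlimit throughput (both in MB/sec).
 */
static double slowdown_pct(double under_mbs, double over_mbs)
{
        assert(under_mbs > 0.0);
        return (under_mbs - over_mbs) / under_mbs * 100.0;
}
```

            With the reported numbers, slowdown_pct(383.0, 359.0) is about 6.3 for direct I/O and slowdown_pct(316.0, 304.0) is about 3.8 for buffered I/O, compared with the 3 sec versus 7 min copy times in the original description.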

            niu Niu Yawei (Inactive) added a comment -

            "How does this patch help with the 4k io sizes? I think that is the real issue with the performance."

            The 4K I/O size is caused by the over-quota flag on the client. With this patch, the slave can acquire/pre-acquire a little more spare limit each time it is over the softlimit, so the over-quota flag is no longer set on the client.
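            Why the over-quota flag hurts so much can be illustrated with a rough request count. The sketch below is a simplified model, not Lustre code: it assumes that a client with the flag set writes one 4 KiB page per synchronous request, and that a client without it aggregates dirty pages into 1 MiB write requests. Both sizes and the function name write_requests are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096ULL             /* assumed client page size */
#define BULK_BYTES (1024ULL * 1024ULL) /* assumed aggregated async request size */

/*
 * Hypothetical model: number of write requests needed for `bytes` of data.
 * With the over-quota flag set, every page is sent on its own; otherwise
 * pages are batched into BULK_BYTES-sized requests.
 */
static uint64_t write_requests(uint64_t bytes, int over_quota_flag)
{
        uint64_t unit = over_quota_flag ? PAGE_BYTES : BULK_BYTES;

        assert(bytes > 0);
        return (bytes + unit - 1) / unit; /* round up to whole requests */
}
```

            Under these assumptions a 1 GiB file needs 1024 requests without the flag and 262144 with it, a factor of 256, and each of the small requests also waits for the server before the next one is issued.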

            mhanafi Mahmoud Hanafi added a comment - How does this patch help with the 4k io sizes? I think that is the real issue with the performance.

            niu Niu Yawei (Inactive) added a comment - Lose some grace time accuracy to improve write performance when over softlimit: http://review.whamcloud.com/8078

            niu Niu Yawei (Inactive) added a comment -

            "While the servers run 2.4.1, the clients are 2.1.5. The client code has no knowledge of the new quota rules. Which variable/field enforces sync write, and how does the server tell clients to start using sync write? I found where the qunit is adjusted, but I have not figured out how sync write is enforced."

            The new quota code did not change the client protocol, so sync write is triggered when approaching the limit in the same way as before. Please check the server code in qsd_op_begin0():

                    __u64   usage;

                    lqe_read_lock(lqe);
                    usage  = lqe->lqe_usage;
                    usage += lqe->lqe_pending_write;
                    usage += lqe->lqe_waiting_write;
                    usage += qqi->qqi_qsd->qsd_sync_threshold;

                    /* if we should notify client to start sync write */
                    if (usage >= lqe->lqe_granted - lqe->lqe_pending_rel)
                            *flags |= LQUOTA_OVER_FL(qqi->qqi_qtype);
                    else
                            *flags &= ~LQUOTA_OVER_FL(qqi->qqi_qtype);
                    lqe_read_unlock(lqe);

            On the client side, see osc_queue_async_io() -> osc_quota_chkdq().
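            The check quoted from qsd_op_begin0() can be tried in isolation with a small standalone sketch. The struct, the function name update_over_flag and the flag value OVER_FL below are simplified stand-ins introduced for illustration, not the real Lustre definitions, and locking is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the quota entry fields used in the check. */
struct quota_entry {
        uint64_t usage;         /* space already consumed */
        uint64_t pending_write; /* writes currently in flight */
        uint64_t waiting_write; /* writes waiting for quota */
        uint64_t granted;       /* quota granted to this slave */
        uint64_t pending_rel;   /* granted quota about to be released */
};

#define OVER_FL 0x1u /* illustrative bit, not the real LQUOTA_OVER_FL value */

/* Set or clear the over-quota flag using the same comparison as the quoted code. */
static unsigned int update_over_flag(const struct quota_entry *e,
                                     uint64_t sync_threshold,
                                     unsigned int flags)
{
        uint64_t usage;

        assert(e != NULL);
        assert(e->granted >= e->pending_rel);

        usage = e->usage + e->pending_write + e->waiting_write + sync_threshold;

        if (usage >= e->granted - e->pending_rel)
                flags |= OVER_FL;  /* client should start sync writes */
        else
                flags &= ~OVER_FL; /* client may keep writing asynchronously */
        return flags;
}
```

            In this model, raising granted while usage stays the same clears the flag again, which matches the explanation that letting the slave acquire a little more spare limit keeps the client out of sync-write mode.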

            jaylan Jay Lan (Inactive) added a comment - While the servers run 2.4.1, the clients are 2.1.5. The client code has no knowledge of the new quota rules. Which variable/field enforces sync write, and how does the server tell clients to start using sync write? I found where the qunit is adjusted, but I have not figured out how sync write is enforced.

            People

              niu Niu Yawei (Inactive)
              mhanafi Mahmoud Hanafi