
Grant space and reserved blocks percent parameters

Details

    • Question/Request
    • Resolution: Fixed
    • Minor
    • Lustre 2.8.0
    • Lustre 2.5.4
    • None
    • RHEL-6.6, lustre-2.5.4

    Description

      Space utilization on one of our systems is high, and while we work to prune some of this data, we're exploring other space tunings.

      One of our admins noted the "cur_grant_bytes" osc parameter. When we looked at a few clients, we saw that this value often exceeds max_dirty_mb, sometimes by an order of magnitude. We usually use 64MB of dirty cache per osc per client. Is there an upper limit to the cur_grant_bytes parameter? What are the side effects of setting it to some lower value (or 0)? Can we reduce this client grant while there is active I/O, and can we do this for all osc connections simultaneously (for a collective of millions of osc connections) on a system? Is this documented well anywhere?
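To make the comparison between cur_grant_bytes and max_dirty_mb concrete, here is a minimal sketch that flags osc devices whose grant exceeds the dirty-cache limit. It assumes input in the `name=value` form printed by `lctl get_param`; the device names and values in `SAMPLE` are hypothetical, as is the helper name `grant_over_dirty`.

```python
MIB = 1024 * 1024

# Hypothetical sample in "name=value" form; real device names and values will differ.
SAMPLE = """\
osc.testfs-OST0000-osc-ffff880000000000.cur_grant_bytes=671088640
osc.testfs-OST0000-osc-ffff880000000000.max_dirty_mb=64
osc.testfs-OST0001-osc-ffff880000000000.cur_grant_bytes=33554432
osc.testfs-OST0001-osc-ffff880000000000.max_dirty_mb=64
"""


def grant_over_dirty(text):
    """Return {device: ratio} for devices where cur_grant_bytes > max_dirty_mb."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not name=value
        device, _, param = name.rpartition(".")
        params.setdefault(device, {})[param] = int(value)

    over = {}
    for device, p in params.items():
        if "cur_grant_bytes" in p and "max_dirty_mb" in p:
            ratio = p["cur_grant_bytes"] / (p["max_dirty_mb"] * MIB)
            if ratio > 1:
                over[device] = ratio
    return over


for device, ratio in grant_over_dirty(SAMPLE).items():
    print(f"{device}: grant is {ratio:.1f}x max_dirty_mb")
```

In the sample, only the first device is reported, since its grant (640 MiB) is ten times its 64 MiB dirty limit.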

      Additionally, we are looking into tuning the reserved_blocks_percent parameter. The Lustre manual states that 5% is the minimum, but is that a sane value for all OST sizes?

      Thanks,

      Jesse

          Activity

            [LU-7015] Grant space and reserved blocks percent parameters

            jgmitter Joseph Gmitter (Inactive) added a comment - Landed for 2.8.0

            Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/16216/
            Subject: LU-7015 ofd: Fix wanted grant calculation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 091988499717c729f8870b331ab3774b249d5818

            gerrit Gerrit Updater added the comment above.

            Oleg Drokin (oleg.drokin@intel.com) uploaded a new patch: http://review.whamcloud.com/16216
            Subject: LU-7015 ofd: Fix wanted grant calculation
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 22c2ad105d9d420058f653f03030ce2e4a3f017b

            gerrit Gerrit Updater added the comment above.

            The upper limit for wanted grant is typically at least 2x max_dirty_mb, so setting max_dirty_mb=64M doubles the potential amount of grant per client. With 22,000 clients that results in a large amount of grant, as you have seen. Presumably the change to max_dirty_mb=64M was made to improve single-client write performance and/or increase writeback caching on the client without blocking I/O?
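As a rough worked example of that arithmetic: the sketch below treats the "typically at least 2x max_dirty_mb" bound as exactly 2x and assumes every one of the 22,000 clients holds that much grant on a single OST. Both are simplifying assumptions, and the function name is illustrative.

```python
MIB = 1024 * 1024


def potential_grant_bytes(num_clients, max_dirty_mb, factor=2):
    """Grant one OST could have outstanding if every client held
    factor * max_dirty_mb (an assumed bound, not a measured value)."""
    return num_clients * factor * max_dirty_mb * MIB


total = potential_grant_bytes(22_000, 64)
print(f"{total} bytes, about {total / 1024**4:.2f} TiB per OST")
```

Under these assumptions the outstanding grant comes to a little under 2.7 TiB per OST, which is space the OST cannot hand out to anyone else.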

            Unfortunately, without the grant shrinker active, writing to cur_grant_bytes will not permanently affect the amount of grant held by the client unless the filesystem is nearly out of space. Otherwise, the client will try to surrender the grant, but the server will reply that there is still grant available and return the same amount. Only when available space becomes constrained will the OST stop returning the grant. Even so, this can be done on clients as an emergency measure when available space is running short, with something like:

            pdsh -a <clients> lctl set_param osc.*.cur_grant_bytes=2M
            

            (or even 1MB if necessary) and then clients which do not need any grant will not get any more.

            The grant shrinking code had problems when it was first introduced (before 2.0 was released) and has never been fixed since then.

            adilger Andreas Dilger added the comment above.
            ezell Matt Ezell added a comment -

            I just ran a quick test on our TDS system. I took a newly mounted client and created 50 files striped across OST 0. I backgrounded 50 dd processes against those files and gathered logs with +cache enabled on the client and server.

            The first thing I noticed is that the server very quickly increased the grant to the client, maybe even before the client had a chance to realize it.

            00002000:00000020:4.0:1439910325.382841:0:36645:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.383027:0:31053:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.383615:0:36646:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.383775:0:36647:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.384272:0:36648:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.385007:0:36649:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:4.0:1439910325.385154:0:36650:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:6.0:1439910325.416668:0:36648:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 335872 granting: 8388608
            00002000:00000020:5.0:1439910325.417207:0:36649:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:6.0:1439910325.417262:0:36645:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 0 granting: 8388608
            00002000:00000020:6.0:1439910325.433766:0:31053:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 29917184 granting: 8388608
            00002000:00000020:4.0:1439910325.433773:0:36646:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 8187904 granting: 8388608
            00002000:00000020:5.0:1439910325.433789:0:31052:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 22822912 granting: 8388608
            00002000:00000020:6.0:1439910325.434528:0:31054:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 25923584 granting: 8388608
            00002000:00000020:4.0:1439910325.434534:0:36647:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 25845760 granting: 8388608
            00002000:00000020:5.0:1439910325.591676:0:36650:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 32403456 granting: 8388608
            00002000:00000020:4.0:1439910325.591852:0:36652:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 32382976 granting: 8388608
            00002000:00000020:5.0:1439910325.591860:0:36647:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 32608256 granting: 8388608
            00002000:00000020:6.0:1439910325.593790:0:31054:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 30371840 granting: 8388608
            00002000:00000020:5.0:1439910325.595378:0:36651:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 29700096 granting: 8388608
            00002000:00000020:4.0:1439910325.595384:0:31052:0:(ofd_grant.c:662:ofd_grant()) atlastds-OST0000: cli ce2fb5d0-e502-410d-675d-3b8d0dd26305/ffff880806fe3c00 wants: 33554432 current grant 29696000 granting: 8388608
            

            The server granted it 56MB before the client even reported having a grant.
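The 56MB figure can be checked against the log: the first seven ofd_grant() lines all report "current grant 0" while granting 8388608 bytes each. A minimal sketch of that tally follows, using only the message tails of the first eight quoted lines (prefixes trimmed for brevity); the helper name is illustrative.

```python
import re

MIB = 1024 * 1024

# Message tails of the first eight ofd_grant() log lines quoted above.
LOG_TAILS = ["wants: 33554432 current grant 0 granting: 8388608"] * 7 + [
    "wants: 33554432 current grant 335872 granting: 8388608",
]

PATTERN = re.compile(r"wants: (\d+) current grant (\d+) granting: (\d+)")


def granted_before_first_report(lines):
    """Sum 'granting' values until a line reports a non-zero current grant."""
    total = 0
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not an ofd_grant() message
        _want, current, granting = map(int, match.groups())
        if current != 0:
            break
        total += granting
    return total


print(granted_before_first_report(LOG_TAILS) // MIB, "MiB")
```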

            I haven't read all of the grant-related code, so take this analysis with a grain of salt.

            Is the want parameter supposed to be an absolute or relative value?

            lustre/ofd/ofd_grant.c:ofd_grant()
                    /* Grant some fraction of the client's requested grant space so that
                     * they are not always waiting for write credits (not all of it to
                     * avoid overgranting in face of multiple RPCs in flight).  This
                     * essentially will be able to control the OSC_MAX_RIF for a client.
                     *
                     * If we do have a large disparity between what the client thinks it
                     * has and what we think it has, don't grant very much and let the
                     * client consume its grant first.  Either it just has lots of RPCs
                     * in flight, or it was evicted and its grants will soon be used up. */
                    if (curgrant >= want || curgrant >= fed->fed_grant + grant_chunk)
                            RETURN(0);
            

            This looks like want is being used as an absolute value. Assuming want should be absolute, do we also need a check to ensure that fed->fed_grant isn't much larger than want?

            lustre/ofd/ofd_grant.c:ofd_grant()
            grant = min(want, left);
            ...
                    /* Limit to ofd_grant_chunk() if not reconnect/recovery */
                    if ((grant > grant_chunk) && conservative)
                            grant = grant_chunk;
            ...
                    ofd->ofd_tot_granted += grant;
                    fed->fed_grant += grant;
            

            This looks like want is a relative value.

            So the client repeatedly says "I want 32MB", and the server takes that request, lowers it to grant_chunk (8MB), and grants 8MB each time until the client reports that it has at least 32MB.
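The behaviour described here can be reproduced with a small model. This is a sketch that mirrors only the two quoted fragments of ofd_grant(), not the full function: `left` is a hypothetical stand-in for the free space the server computes, and the replay loop assumes every in-flight RPC carries the same stale client-reported grant.

```python
MIB = 1024 * 1024


def ofd_grant_model(curgrant, want, fed_grant, left, grant_chunk, conservative=True):
    """Model of the quoted checks: return the extra grant for one RPC."""
    # Early exit: client already reports enough, or reports far more than
    # the server believes it has handed out.
    if curgrant >= want or curgrant >= fed_grant + grant_chunk:
        return 0
    grant = min(want, left)
    # Limit to one chunk when not in reconnect/recovery.
    if grant > grant_chunk and conservative:
        grant = grant_chunk
    return grant


def replay(reported_curgrants, want=32 * MIB, grant_chunk=8 * MIB, left=1024 * MIB):
    """Apply the model to a sequence of client-reported grants; return fed_grant."""
    fed_grant = 0
    for curgrant in reported_curgrants:
        grant = ofd_grant_model(curgrant, want, fed_grant, left, grant_chunk)
        fed_grant += grant
        left -= grant  # assumed bookkeeping; the real code recomputes free space
    return fed_grant


# Seven RPCs arrive before the client has seen any reply, so each reports 0.
print(replay([0] * 7) // MIB, "MiB granted against a 32 MiB request")
```

Because the early-exit check compares `want` with the client-reported value rather than with the server's own running total, seven concurrent requests yield 56 MiB in this model, matching the log.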

            According to Andreas in LU-3859, OBD_CONNECT_GRANT_SHRINK isn't set, so this is never cleaned up automatically. Is there a reason this is disabled?

            ezell Matt Ezell added a comment -

            It looks like ofd_grant_space_left() uses ofd->ofd_osfs.os_bavail, so it appears to take the reserved space into account.

            ezell Matt Ezell added a comment -

            I guess until we get usage down or a patch for this, we will need to periodically shrink grants on clients to avoid ENOSPC.

            The source of the question about reserved space was to better understand when a user might get ENOSPC. Would it be when a client has exhausted its grant and (kbytesfree - (tot_granted/1024)) <= 0 or does it use (kbytesavail - (tot_granted/1024)) <= 0 ?
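To make the difference between the two candidate checks concrete, the sketch below evaluates both expressions on made-up numbers. The values are hypothetical (an OST with 5% of its free space reserved), and the code does not settle which expression the server actually applies; it only shows that the two can disagree.

```python
def headroom_kb(kbytes, tot_granted_bytes):
    """Evaluate (kbytes - tot_granted/1024) as written in the question."""
    return kbytes - tot_granted_bytes // 1024


# Hypothetical OST state: free space, free space minus reserved blocks,
# and total grant outstanding to clients.
kbytesfree = 1_000_000
kbytesavail = 950_000
tot_granted = 960_000 * 1024

by_free = headroom_kb(kbytesfree, tot_granted)
by_avail = headroom_kb(kbytesavail, tot_granted)

print("kbytesfree-based headroom:", by_free, "KB")
print("kbytesavail-based headroom:", by_avail, "KB")
print("checks disagree:", (by_free <= 0) != (by_avail <= 0))
```

With these numbers the kbytesfree form still shows 40,000 KB of headroom while the kbytesavail form is already negative, so the choice of counter decides whether a client that has exhausted its grant would see ENOSPC.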

            The Lustre Operations manual has a pretty strong warning about lowering the reserved space:

            Reducing the space reservation can cause severe performance degradation as the OST file system becomes more than 95% full, due to difficulty in locating large areas of contiguous free space. This performance degradation may persist even if the space usage drops below 95% again. It is recommended NOT to reduce the reserved disk space below 5%.

            But if that will give us a little headroom, we know grants will help keep us from getting too close to completely full.


            People

              green Oleg Drokin
              hanleyja Jesse Hanley