[LU-9704] ofd_grant_check() claims GRANT, real grant 0

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.12.8, Lustre 2.15.0
    • Affects Version/s: Lustre 2.9.0
    • Labels: None
    • Environment: 3.10.0-514.16.1.el7_lustre.x86_64, lustre-2.9.0_srcc7-1.el7.centos.x86_64
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      Hi,

      After about 2 months of production, our new Lustre 2.9-based system (Oak) started to show the following Lustre errors in the logs on our OSS servers, just a few days ago:

      Jun 22 10:45:06 oak-io1-s1 kernel: LustreError: 26807:0:(ofd_grant.c:641:ofd_grant_check()) oak-OST0014: cli 1b1c6319-5ec1-dbcc-f6e7-f85575c95c4c claims 131072 GRANT, real grant 0
      
      

      I am attaching an lquota subsystem debug trace from one of the OSSes and a Splunk graph showing that these messages are new. I first thought this could be related to LU-9671, but it might not be after all.

      Thanks,
      Stephane

          Activity

            [LU-9704] ofd_grant_check() claims GRANT, real grant 0
            pjones Peter Jones made changes -
            Link New: This issue is related to NEC-86 [ NEC-86 ]
            pjones Peter Jones made changes -
            Resolution New: Fixed [ 1 ]
            Status Original: Open [ 1 ] New: Resolved [ 5 ]
            pjones Peter Jones added a comment -

            Landed for 2.15


            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45371/
            Subject: LU-9704 grant: ignore grant info on read resend
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 38c78ac2e390b30106f3e185d8c4d92b8cb19c2b

            gerrit Gerrit Updater added the comment above.
            pjones Peter Jones made changes -
            Fix Version/s New: Lustre 2.15.0 [ 14791 ]
            Fix Version/s New: Lustre 2.12.8 [ 15093 ]

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/45474/
            Subject: LU-9704 grant: ignore grant info on read resend
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: e1d132acf58c3e6a90a527a0a09cdd0fff7fc392

            gerrit Gerrit Updater added the comment above.

            "Mike Pershin <mpershin@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/45474
            Subject: LU-9704 grant: ignore grant info on read resend
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 4162f0d61b762f70e5eb6d099291fad9f836de9d

            gerrit Gerrit Updater added the comment above.
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-14124 [ LU-14124 ]
            adilger Andreas Dilger made changes -
            Link New: This issue is related to LU-14125 [ LU-14125 ]

            The following scenario causes a message like "claims 28672 GRANT,
            real grant 0" to appear:

            1. The client owns X grants and sends RPCs to shrink part of them.
            2. The server fails over, so the shrink RPC has to be resent.
            3. On client reconnect, the server and the client sync on the initial
            amount of grant for the client.
            4. The shrink RPC is resent. If the server has enough disk space, the
            shrink does not happen, and the client adds the amount of grant it
            was going to shrink to its new initial amount of grant. The client
            now thinks it owns more grant than it does from the server's point
            of view.
            5. The client consumes grant and sends RPCs to the server. The server
            avoids allocating new grant for the client if the current amount of
            grant is big enough:
                static long tgt_grant_alloc(struct obd_export *exp, u64 curgrant,
                ...
                if (curgrant >= want || curgrant >= ted->ted_grant + chunk)
                        RETURN(0);
            6. The client keeps consuming grant, which eventually leads to
            complaints like "claims 28672 GRANT, real grant 0".

            vsaveliev Vladimir Saveliev added the comment above.
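
The accounting drift in the scenario above can be illustrated with a small toy model. This is only a sketch, not Lustre code: `struct grant_sim` and the `sim_*` functions are hypothetical names invented for illustration, grant is counted as a plain byte total, and the model assumes the server never hands out new grant after the reconnect (the worst case of step 5).

```c
#include <assert.h>

/* Hypothetical toy model: one client, one server, grant counted in bytes. */
struct grant_sim {
	long client_grant;	/* what the client believes it owns */
	long server_grant;	/* what the server has recorded for the client */
};

/* Step 3: on reconnect both sides agree on a fresh initial amount. */
static void sim_reconnect(struct grant_sim *s, long initial)
{
	s->client_grant = initial;
	s->server_grant = initial;
}

/*
 * Step 4: the shrink RPC is resent. The server has enough space, so it
 * does not shrink anything, but the client adds the amount it wanted to
 * shrink on top of its new initial grant.
 */
static void sim_resend_shrink(struct grant_sim *s, long shrink)
{
	s->client_grant += shrink;
}

/*
 * Steps 5-6: the client writes `bytes`, claiming that much grant.
 * Returns how many claimed bytes the server has no record of
 * (0 while both views are still consistent).
 */
static long sim_write(struct grant_sim *s, long bytes)
{
	long over = 0;

	assert(bytes > 0);
	if (bytes > s->client_grant)	/* client only spends what it thinks it has */
		bytes = s->client_grant;
	s->client_grant -= bytes;

	if (bytes > s->server_grant) {
		over = bytes - s->server_grant;	/* "claims N GRANT, real grant ..." */
		s->server_grant = 0;
	} else {
		s->server_grant -= bytes;
	}
	return over;
}

/*
 * Run steps 3-6. Returns the 1-based index of the first write that claims
 * grant the server does not have, or 0 if the client runs out first.
 */
static int sim_first_overclaim(long initial, long shrink, long io)
{
	struct grant_sim s;
	int n = 0;

	sim_reconnect(&s, initial);
	sim_resend_shrink(&s, shrink);
	while (s.client_grant > 0) {
		n++;
		if (sim_write(&s, io) > 0)
			return n;
	}
	return 0;
}
```

In this model, `sim_first_overclaim(1048576, 131072, 131072)` returns 9: eight writes drain the server-side record, and the ninth claims 131072 bytes against a real grant of 0. With a shrink amount of 0 the two views never diverge and the function returns 0.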

            People

              Assignee: tappro Mikhail Pershin
              Reporter: sthiell Stephane Thiell