[LU-12120] LustreError: 15069:0:(tgt_grant.c:561:tgt_grant_incoming()) LBUG

Details

    • Bug
    • Resolution: Fixed
    • Major
    • Lustre 2.13.0, Lustre 2.12.3
    • Lustre 2.10.5
    • None
    • 3
    • 9223372036854775807

    Description

      This is the first time we have seen this LBUG. We have a crash dump.

      [11168227.609396] Lustre: Skipped 4149 previous similar messages
      [11168546.559600] LustreError: 15069:0:(tgt_grant.c:559:tgt_grant_incoming()) nbp2-OST0121: cli 811dc111-6b22-b447-9144-bcd1173a571d/ffff881d57bff000 dirty -610055168 pend 0 grant -643806208
      [11168546.609562] LustreError: 15069:0:(tgt_grant.c:561:tgt_grant_incoming()) LBUG
      [11168546.631425] Pid: 15069, comm: ll_ost_io01_084 3.10.0-693.21.1.el7.20180508.x86_64.lustre2105 #1 SMP Mon Aug 27 23:04:41 UTC 2018
      [11168546.631428] Call Trace:
      [11168546.631439]  [<ffffffff8103a1f2>] save_stack_trace_tsk+0x22/0x40
      [11168546.650223]  [<ffffffffa08597cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
      [11168546.650228]  [<ffffffffa085987c>] lbug_with_loc+0x4c/0xa0 [libcfs]
      [11168546.650292]  [<ffffffffa0e8c270>] tgt_grant_prepare_read+0x0/0x3b0 [ptlrpc]
      [11168546.650325]  [<ffffffffa0e8c37b>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
      [11168546.650335]  [<ffffffffa0c86590>] ofd_preprw+0x450/0x1170 [ofd]
      [11168546.650365]  [<ffffffffa0e702a5>] tgt_brw_read+0x975/0x1860 [ptlrpc]
      [11168546.650394]  [<ffffffffa0e6deca>] tgt_request_handle+0x92a/0x1370 [ptlrpc]
      [11168546.650420]  [<ffffffffa0e164bb>] ptlrpc_server_handle_request+0x23b/0xaa0 [ptlrpc]
      [11168546.650445]  [<ffffffffa0e1a4a2>] ptlrpc_main+0xa92/0x1e40 [ptlrpc]
      [11168546.650449]  [<ffffffff810b1131>] kthread+0xd1/0xe0
      [11168546.650452]  [<ffffffff816a14f7>] ret_from_fork+0x77/0xb0
      [11168546.650468]  [<ffffffffffffffff>] 0xffffffffffffffff
      [11168546.650469] Kernel panic - not syncing: LBUG 
       

          Activity

            jaylan Jay Lan (Inactive) added a comment -

            The patch applied cleanly to nas-2.10.8. No backport is needed.

            sthiell Stephane Thiell added a comment -

            We used the patch from https://review.whamcloud.com/35084/, which applied against 2.10.8 with no change (or only a trivial one).

            mhanafi Mahmoud Hanafi added a comment -

            We need a 2.10 backport. Our 2.10.7 server hit this LBUG:

            [7955125.139926] LustreError: 23678:0:(tgt_grant.c:1068:tgt_grant_discard()) ASSERTION( tgd->tgd_tot_granted >= ted->ted_grant ) failed: nbp7-OST0017: tot_granted 7448219072 cli 4817e87d-d95a-1480-0cbd-835e2c3966b1/ffff880f68fdc400 ted_grant -49152
            [7955125.164200] LustreError: 23678:0:(tgt_grant.c:1068:tgt_grant_discard()) LBUG
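
The assertion message above deserves a second look: 7448219072 >= -49152 holds arithmetically, yet the check failed. One plausible explanation, assuming the total is an unsigned 64-bit counter and the per-client grant is a signed long (the field types are an assumption here, not quoted from tgt_grant.c), is that C converts the signed operand to unsigned before comparing. The sketch below uses hypothetical names to reproduce that effect with the two values from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real field types are assumed, not quoted. */
struct grant_counters {
	uint64_t tot_granted;	/* plays the role of tgd_tot_granted */
	long client_grant;	/* plays the role of ted_grant */
};

/*
 * Same shape as the failing assertion. The cast spells out the conversion
 * C applies implicitly when a uint64_t is compared with a long.
 */
static bool total_covers_client(const struct grant_counters *c)
{
	return c->tot_granted >= (uint64_t)c->client_grant;
}

/* Reproduces the comparison with the two values printed in the log. */
static void grant_compare_demo(void)
{
	struct grant_counters c = {
		.tot_granted = 7448219072ULL,
		.client_grant = -49152,
	};

	/* -49152 wraps to a value just below 2^64, so the check fails. */
	assert((uint64_t)c.client_grant > c.tot_granted);
	assert(!total_covers_client(&c));
}
```

Under these assumptions, a negative per-client grant trips the assertion regardless of how large the total is, which would match the pattern of both LBUGs in this ticket following a negative ted_grant.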

            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35084/
            Subject: LU-12120 grants: prevent negative ted_grant value
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: 83d43ecc947fa8f43bd5c7f792f3788cadd0d3ef

            gerrit Gerrit Updater added a comment -

            Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35084
            Subject: LU-12120 grants: prevent negative ted_grant value
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 82ad1cf4fd9298f95a8f233ee18feef363b10ebb

            pjones Peter Jones added a comment -

            Landed for 2.13


            gerrit Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/34996/
            Subject: LU-12120 grants: prevent negative ted_grant value
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 7e08317ef5cbed5cd587017cbe343eb4cc52822c

            gerrit Gerrit Updater added a comment -

            Mike Pershin (mpershin@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/34996
            Subject: LU-12120 grants: prevent negative ted_grant value
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: 296ac6b910880dc65c9cee8be2552bba759cea57

            bougetq Quentin Bouget (Inactive) added a comment -

            Ok, thanks Mikhail!

            From what I understand, clients and servers maintain separate views of grants, and this LU concerns (client, server) pairs whose views drift apart. At the next shrink request, the client then asks to give back more space than the server remembers granting.

            Does anyone know whether there is a valid reason why, in tgt_grant_incoming():

            static void tgt_grant_incoming(...)
            {
                    ...
                    CDEBUG(D_CACHE, "%s: cli %s/%p reports grant %llu dropped %u, local %lu\n",
                           obd->obd_name, exp->exp_client_uuid.uuid, exp, oa->o_grant,
                           oa->o_dropped, ted->ted_grant);
                    ...
            }

            the value printed after "reports grant" would not be the same as the value printed after "local"?
            Can this be used to track the desynchronization?
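
As a rough illustration of the comparison asked about above, the helper below classifies one (reported, local) pair in the order the CDEBUG line prints them. The names are hypothetical and this is not code from tgt_grant.c. It also assumes that a mismatch is worth flagging at all, which is exactly the open question, since the two values might differ for legitimate reasons.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of one client/server grant pair. */
enum grant_view {
	GRANT_VIEWS_MATCH,
	GRANT_CLIENT_HIGHER,	/* client reports more than the server recorded */
	GRANT_SERVER_HIGHER,	/* server recorded more than the client reports */
	GRANT_LOCAL_NEGATIVE,	/* server-side value is already negative */
};

/* reported: value sent by the client; local: value the server has on record. */
static enum grant_view compare_grant_views(uint64_t reported, long local)
{
	uint64_t local_u;

	/* Check the sign first so the unsigned comparison below is meaningful. */
	if (local < 0)
		return GRANT_LOCAL_NEGATIVE;

	local_u = (uint64_t)local;
	if (reported == local_u)
		return GRANT_VIEWS_MATCH;
	return reported > local_u ? GRANT_CLIENT_HIGHER : GRANT_SERVER_HIGHER;
}

/* Small usage example with made-up numbers. */
static void compare_grant_views_demo(void)
{
	assert(compare_grant_views(4096, 4096) == GRANT_VIEWS_MATCH);
	assert(compare_grant_views(8192, 4096) == GRANT_CLIENT_HIGHER);
	assert(compare_grant_views(4096, -49152) == GRANT_LOCAL_NEGATIVE);
}
```

The GRANT_CLIENT_HIGHER case corresponds to the scenario described in the comment: a later shrink request could then ask to return more than the server has on record.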
            tappro Mikhail Pershin added a comment - edited

            Quentin, I agree with your analysis. I also think that tgt_grant_shrink should not decrease ted_grant blindly but should perform sanity checks there. I will prepare a patch today.
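
One possible form such a sanity check could take is sketched below. It is a minimal illustration with hypothetical, simplified types and no locking, not the patch that landed, which may handle the mismatch differently (for example by ignoring the shrink request outright). The idea is simply to release no more than the server has on record for the client.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified counters; locking and real Lustre types omitted. */
struct export_grant { long granted; };
struct target_grant { uint64_t tot_granted; };

/*
 * Release at most what the server has on record for this client,
 * and never more than the target-wide total. Returns the amount released.
 */
static long shrink_grant_checked(struct target_grant *tgt,
				 struct export_grant *exp,
				 uint64_t requested)
{
	long release;

	if (exp->granted <= 0)
		return 0;	/* nothing on record: ignore the request */

	/* Clamp to the per-client record; the cast is safe after the clamp. */
	release = requested > (uint64_t)exp->granted ?
		  exp->granted : (long)requested;

	/* Clamp to the total so the unsigned counter cannot wrap. */
	if ((uint64_t)release > tgt->tot_granted)
		release = (long)tgt->tot_granted;

	exp->granted -= release;
	tgt->tot_granted -= (uint64_t)release;
	return release;
}

/* Usage example: the client tries to return more than is on record. */
static void shrink_grant_demo(void)
{
	struct target_grant tgt = { .tot_granted = 100 };
	struct export_grant exp = { .granted = 40 };

	assert(shrink_grant_checked(&tgt, &exp, 60) == 40);
	assert(exp.granted == 0);
	assert(tgt.tot_granted == 60);
}
```

With a blind subtraction, the same request would leave the per-client value at -20, which is the kind of negative grant both LBUG messages in this ticket report.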

            People

              pfarrell Patrick Farrell (Inactive)
              mhanafi Mahmoud Hanafi