[LU-11288] tgt_grant_sanity_check() LBUG Created: 28/Aug/18  Updated: 19/Apr/21  Resolved: 29/Oct/18

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.12.0
Fix Version/s: Lustre 2.12.0

Type: Bug
Priority: Major
Reporter: Oleg Drokin
Assignee: Alex Zhuravlev
Resolution: Fixed
Votes: 0
Labels: None

Attachments: PNG File 微信图片_20210419095801.png    
Issue Links:
Related
is related to LU-8708 Grant shrinking disabled all the time Resolved
Severity: 3

 Description   

This triggered in racer on current master-next, but it does not appear to be caused by any unlanded patch.

[ 2800.273242] LustreError: 20863:0:(tgt_grant.c:151:tgt_check_export_grants()) lustre-OST0002: cli e57bac33-ee31-2bdc-225e-2658736d80ff/ffff880251df1800 ted_grant(1142554624) + ted_pending(0) > maxsize(250609664)
[ 2800.312893] LustreError: 20863:0:(tgt_grant.c:223:tgt_grant_sanity_check()) LBUG
[ 2800.319055] Pid: 20863, comm: ll_ost_create07 3.10.0-7.5-debug #1 SMP Sun Jun 3 13:35:38 EDT 2018
[ 2800.321344] Call Trace:
[ 2800.322479]  [<ffffffffa01cd7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2800.327468]  [<ffffffffa01cd88c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2800.328911]  [<ffffffffa0bbbf7c>] tgt_grant_sanity_check+0x51c/0x550 [ptlrpc]
[ 2800.333566]  [<ffffffffa127dc14>] ofd_statfs+0x104/0x480 [ofd]
[ 2800.334982]  [<ffffffffa12712e0>] ofd_statfs_hdl+0x70/0x280 [ofd]
[ 2800.335938] LustreError: 3246:0:(tgt_grant.c:151:tgt_check_export_grants()) lustre-OST0002: cli e57bac33-ee31-2bdc-225e-2658736d80ff/ffff880251df1800 ted_grant(1142554624) + ted_pending(0) > maxsize(250609664)
[ 2800.335961] LustreError: 3246:0:(tgt_grant.c:223:tgt_grant_sanity_check()) LBUG
[ 2800.344690]  [<ffffffffa0ba0705>] tgt_request_handle+0xaf5/0x1590 [ptlrpc]
[ 2800.348268]  [<ffffffffa0b44e26>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
[ 2800.350975]  [<ffffffffa0b48c1e>] ptlrpc_main+0xabe/0x1f80 [ptlrpc]
[ 2800.352543]  [<ffffffff810ae864>] kthread+0xe4/0xf0
[ 2800.354434]  [<ffffffff81783777>] ret_from_fork_nospec_end+0x0/0x39
[ 2800.356861]  [<ffffffffffffffff>] 0xffffffffffffffff
[ 2800.360929] Kernel panic - not syncing: LBUG
[ 2800.360945] Pid: 3246, comm: ll_ost_create00 3.10.0-7.5-debug #1 SMP Sun Jun 3 13:35:38 EDT 2018
[ 2800.360945] Call Trace:
[ 2800.360972]  [<ffffffffa01cd7dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[ 2800.360977]  [<ffffffffa01cd88c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[ 2800.361073]  [<ffffffffa0bbbf7c>] tgt_grant_sanity_check+0x51c/0x550 [ptlrpc]
[ 2800.361090]  [<ffffffffa127dc14>] ofd_statfs+0x104/0x480 [ofd]
[ 2800.361093]  [<ffffffffa12712e0>] ofd_statfs_hdl+0x70/0x280 [ofd]
[ 2800.361125]  [<ffffffffa0ba0705>] tgt_request_handle+0xaf5/0x1590 [ptlrpc]
[ 2800.361152]  [<ffffffffa0b44e26>] ptlrpc_server_handle_request+0x256/0xad0 [ptlrpc]
[ 2800.361190]  [<ffffffffa0b48c1e>] ptlrpc_main+0xabe/0x1f80 [ptlrpc]
[ 2800.361196]  [<ffffffff810ae864>] kthread+0xe4/0xf0
[ 2800.361200]  [<ffffffff81783777>] ret_from_fork_nospec_end+0x0/0x39
[ 2800.361203]  [<ffffffffffffffff>] 0xffffffffffffffff
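
The first LustreError line in the log above reports that ted_grant + ted_pending exceeds maxsize for one client export. As a rough aid for reading such messages, the sketch below parses that line and computes by how many bytes the bound is exceeded. The helper names (parse_grant_violation, grant_excess) are hypothetical and are not part of Lustre; the regular expression assumes the exact message format shown in this log.

```python
import re

# Pattern for the message format seen in the console log above; the field
# names (ted_grant, ted_pending, maxsize) are taken from that log line.
GRANT_RE = re.compile(
    r"ted_grant\((?P<grant>\d+)\) \+ ted_pending\((?P<pending>\d+)\) "
    r"> maxsize\((?P<maxsize>\d+)\)"
)


def parse_grant_violation(line):
    """Return (grant, pending, maxsize) parsed from a log line, or None."""
    m = GRANT_RE.search(line)
    if m is None:
        return None
    return int(m["grant"]), int(m["pending"]), int(m["maxsize"])


def grant_excess(grant, pending, maxsize):
    """Bytes by which grant + pending exceeds maxsize (0 if within bounds)."""
    return max(0, grant + pending - maxsize)


# Log line copied from the description above.
line = (
    "LustreError: 20863:0:(tgt_grant.c:151:tgt_check_export_grants()) "
    "lustre-OST0002: cli e57bac33-ee31-2bdc-225e-2658736d80ff/ffff880251df1800 "
    "ted_grant(1142554624) + ted_pending(0) > maxsize(250609664)"
)
grant, pending, maxsize = parse_grant_violation(line)
print(grant_excess(grant, pending, maxsize))  # prints 891944960
```

For this log, the export's recorded grant exceeds the reported maxsize by 891944960 bytes, which is the condition that precedes the LBUG in tgt_grant_sanity_check().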


 Comments   
Comment by Oleg Drokin [ 28/Aug/18 ]

First recorded failure of this is on July 27th.

Comment by Peter Jones [ 30/Aug/18 ]

Bobijam

Is this related to your first LU-8708 patch? The timing seems to line up...

Peter

Comment by Zhenyu Xu [ 31/Aug/18 ]

LU-8708 enables grant shrink by default. I wonder whether there are issues with the grant shrinking algorithm that cause this.

Comment by Oleg Drokin [ 31/Aug/18 ]

OK, thanks. I'll try reverting that patch locally and see whether it makes any difference.

Comment by Peter Jones [ 04/Sep/18 ]

Oleg has confirmed that reverting the patch that enables grant shrink stops this failure from appearing in his testing. However, based on Bobijam's analysis, that means the bug is still there, just no longer being exposed.

Comment by Peter Jones [ 15/Sep/18 ]

Bobijam

Have you been able to make any progress on identifying the problem with the grant shrink algorithm?

Peter

Comment by Zhenyu Xu [ 17/Sep/18 ]

Not yet. This code path hasn't been used for a long time, and I'm not familiar with the grant code either.

Comment by Peter Jones [ 17/Sep/18 ]

Ok then let's have Alex handle this - thanks for your analysis so far!

Comment by Alex Zhuravlev [ 18/Sep/18 ]

Any details on how to reproduce that?

Comment by Gerrit Updater [ 24/Sep/18 ]

Alex Zhuravlev (bzzz@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/33226
Subject: LU-11288 osc: check target versus available grant
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: ec68ff8b3d389c86da6cf74fad49a513e4cfd255

Comment by Gerrit Updater [ 29/Oct/18 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/33226/
Subject: LU-11288 osc: re-check target versus available grant
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: fcbd8c981239dc2be4bcf55e9a3e40b72d939700

Comment by Peter Jones [ 29/Oct/18 ]

Landed for 2.12
