[LU-4482] OST grants bugs Created: 14/Jan/14  Updated: 25/Feb/14  Resolved: 25/Feb/14

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: Lustre 2.6.0

Type: Bug Priority: Blocker
Reporter: Alexey Lyashkov Assignee: Niu Yawei (Inactive)
Resolution: Fixed Votes: 0
Labels: MB

Issue Links:
Related
is related to LU-4664 sync write should consume grant on cl... Resolved
Severity: 3
Rank (Obsolete): 12273

 Description   

Lustre: DEBUG MARKER: == sanity test 63a: Verify oig_wait interruption does not crash ========= 13:06:48 (1389690408)
LustreError: 14188:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 34105472 unstable 0 pending 0
LustreError: 18856:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 27814016 unstable 8388608 pending 8388608
LustreError: 18856:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 26 previous similar messages
LustreError: 14178:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 36505728 unstable 0 pending 0
LustreError: 14178:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 33 previous similar messages
LustreError: 25596:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 3014656 < tot_grant 37619840 unstable 0 pending 0
LustreError: 25596:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 57 previous similar messages
LustreError: 14192:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 3063808 < tot_grant 43010176 unstable 0 pending 0
LustreError: 14192:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 65 previous similar messages



 Comments   
Comment by Oleg Drokin [ 15/Jan/14 ]

I just checked my logs and I frequently see this when an OST is full. It has probably been there for a while, since I see it all the way back to when my logs started.

We need to get to the root of this, because it can potentially lead to unexpected data loss on the client side.

Comment by Oleg Drokin [ 15/Jan/14 ]

Also, it seems to have started around May 25, 2013 in my test logs, as I now see.

Comment by Peter Jones [ 15/Jan/14 ]

Niu

Could you please look into this one?

Thanks

Peter

Comment by Niu Yawei (Inactive) [ 16/Jan/14 ]

It seems this was introduced by "LU-1030 osc: new IO engine implementation". It looks to me like sync write no longer consumes grant as of commit 9fe4b52ad2ffadf125d9b5c78bb2ff9a01725707. I think we should add it back.

Xiong, could you take a look at this? I think it's an unintentional change, right?

Comment by Jinshan Xiong (Inactive) [ 16/Jan/14 ]

It was changed that way on purpose: I think a sync write doesn't need to consume grant, because the application can see the errors directly.

If it can cause a grant issue, then the grant algorithm has bugs, because pages without the FROM_GRANT flag shouldn't consume reserved space.

Comment by Niu Yawei (Inactive) [ 16/Jan/14 ]

"If it can cause grant issue, then grant algorithm has BUGs because pages without FROM_GRANT flag shouldn't consume reserved space."

Any kind of write (including sync write or direct IO) should consume grant if the client has grant available (and the FROM_GRANT flag should be set on those pages). Otherwise, the OST could run out of space while the client still holds lots of grant.

Comment by Jinshan Xiong (Inactive) [ 16/Jan/14 ]

Obviously the issue here is not about ENOSPC. The reserved space is less than the granted bytes. Did I miss something?

Comment by Niu Yawei (Inactive) [ 16/Jan/14 ]

If sync write doesn't consume grant, the grant held by the client is not decreased by sync writes, but the free space on the OST is. Eventually the OST space will be used up by sync writes while the client still holds grant, and further cached data will be lost.

The error message shows that the available space is less than the total granted bytes (meaning the client has grant, but the OST doesn't have enough space to back it). That is because sync write consumes space without consuming grant.
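
The accounting drift described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Ost` class, its fields, and `run_sync_writes()` are hypothetical names for illustration and are not Lustre code. They only mirror the `left` and `tot_grant` values printed in the error message.

```python
from dataclasses import dataclass


@dataclass
class Ost:
    """Toy OST: tracks free space and total outstanding grant (hypothetical)."""
    left: int       # free bytes on the OST
    tot_grant: int  # bytes promised to clients as grant

    def write(self, nbytes: int, from_grant: bool) -> None:
        """Apply a write; charge it against grant only if from_grant is set."""
        if nbytes > self.left:
            raise OSError("ENOSPC")
        self.left -= nbytes
        if from_grant:
            self.tot_grant -= min(nbytes, self.tot_grant)

    def overcommitted(self) -> bool:
        """True when the OST has promised more grant than it has free space."""
        return self.left < self.tot_grant


def run_sync_writes(consume_grant: bool) -> Ost:
    """Issue seven 10-byte sync writes against a fresh toy OST."""
    ost = Ost(left=100, tot_grant=40)
    for _ in range(7):
        ost.write(10, from_grant=consume_grant)
    return ost


without_grant = run_sync_writes(consume_grant=False)
with_grant = run_sync_writes(consume_grant=True)
print(without_grant, without_grant.overcommitted())
print(with_grant, with_grant.overcommitted())
```

In this model, writes that skip grant leave `tot_grant` untouched while `left` shrinks, so `left < tot_grant` eventually holds. Charging the same writes against grant keeps the two values consistent.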

Comment by Niu Yawei (Inactive) [ 18/Jan/14 ]

Well, there are two problems:

  • client should consume grant for sync write (as I mentioned above);
  • osd_statfs() shouldn't cache statfs data because grant mechanism relies on dt_statfs() returning fresh data.
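
The second point can be sketched with a toy cache. This is an illustration only: `CachedStatfs`, `max_age`, and the `disk` dictionary are hypothetical and do not reflect how osd_statfs() is actually implemented. The sketch only shows why a grant decision based on a cached free-space value can overestimate what is really available.

```python
class CachedStatfs:
    """Hypothetical statfs wrapper that reuses a result for max_age seconds."""

    def __init__(self, read_free, max_age):
        self._read_free = read_free  # callable returning the real free bytes
        self._max_age = max_age      # cache lifetime in seconds
        self._cached = None
        self._stamp = None

    def free_bytes(self, now):
        """Return free bytes, re-reading only when the cache has expired."""
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._cached = self._read_free()
            self._stamp = now
        return self._cached


disk = {"free": 100}
statfs = CachedStatfs(lambda: disk["free"], max_age=5)

first = statfs.free_bytes(now=0)  # reads the real value
disk["free"] -= 80                # writes land on disk in the meantime
stale = statfs.free_bytes(now=1)  # still inside the cache window
fresh = statfs.free_bytes(now=5)  # cache expired, value is re-read
print(first, stale, fresh)
```

Between the write and the cache expiry, the caller sees the old free-space figure, so any grant computed from it would be backed by space that no longer exists.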

Comment by Niu Yawei (Inactive) [ 18/Jan/14 ]

http://review.whamcloud.com/8911

Comment by Niu Yawei (Inactive) [ 25/Feb/14 ]

Patch landed on master for 2.6.
