Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • Lustre 2.6.0
    • None
    • 3
    • 12273

    Description

      Lustre: DEBUG MARKER: == sanity test 63a: Verify oig_wait interruption does not crash ========= 13:06:48 (1389690408)
      LustreError: 14188:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 34105472 unstable 0 pending 0
      LustreError: 18856:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 27814016 unstable 8388608 pending 8388608
      LustreError: 18856:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 26 previous similar messages
      LustreError: 14178:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 0 < tot_grant 36505728 unstable 0 pending 0
      LustreError: 14178:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 33 previous similar messages
      LustreError: 25596:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 3014656 < tot_grant 37619840 unstable 0 pending 0
      LustreError: 25596:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 57 previous similar messages
      LustreError: 14192:0:(ofd_grant.c:255:ofd_grant_space_left()) lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 left 3063808 < tot_grant 43010176 unstable 0 pending 0
      LustreError: 14192:0:(ofd_grant.c:255:ofd_grant_space_left()) Skipped 65 previous similar messages
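
      When triaging logs like the ones above, it can help to pull the counters out of each message and see how far the total grant exceeds the space left. The following is a minimal sketch, not part of Lustre: the regular expression is inferred only from the messages quoted in this ticket, and the helper names parse_grant_message and grant_shortfall are hypothetical.

```python
import re

# Pattern inferred from the ofd_grant_space_left() messages quoted in this
# ticket; it is an assumption about the format, not taken from Lustre source.
GRANT_MSG = re.compile(
    r"left (?P<left>\d+) < tot_grant (?P<tot_grant>\d+) "
    r"unstable (?P<unstable>\d+) pending (?P<pending>\d+)"
)


def parse_grant_message(line):
    """Return the four counters as ints, or None if the line does not match."""
    match = GRANT_MSG.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def grant_shortfall(counters):
    """Bytes by which the total grant exceeds the space left (0 if none)."""
    return max(0, counters["tot_grant"] - counters["left"])


line = (
    "LustreError: 25596:0:(ofd_grant.c:255:ofd_grant_space_left()) "
    "lustre-OST0010: cli 17a94b5b-8940-b62d-589f-082653aa3e82/ffff88003789e6c8 "
    "left 3014656 < tot_grant 37619840 unstable 0 pending 0"
)
counters = parse_grant_message(line)
print(counters, grant_shortfall(counters))
```

      The "Skipped N previous similar messages" lines carry no counters, so the parser returns None for them.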

          Activity

            [LU-4482] OST grants bugs

            niu Niu Yawei (Inactive) added a comment - patch landed on master for 2.6
            niu Niu Yawei (Inactive) added a comment - http://review.whamcloud.com/8911

            niu Niu Yawei (Inactive) added a comment -

            Well, there are two problems:

            • The client should consume grant for sync writes (as I mentioned earlier).
            • osd_statfs() shouldn't cache statfs data, because the grant mechanism relies on dt_statfs() returning fresh data.
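
            The second problem can be illustrated with a toy model. This is only a sketch under stated assumptions: CachedStatfs and can_grant are hypothetical names, and the check "free space minus total grant must cover the request" is a simplification, not the actual osd_statfs() or ofd grant code.

```python
MIB = 1 << 20


class CachedStatfs:
    """Hypothetical statfs wrapper that reuses a result for max_age seconds."""

    def __init__(self, read_free_bytes, max_age):
        self._read = read_free_bytes
        self._max_age = max_age
        self._cached = None
        self._stamp = None

    def free_bytes(self, now):
        # Refresh only when there is no cached value or it has expired.
        if self._stamp is None or now - self._stamp >= self._max_age:
            self._cached = self._read()
            self._stamp = now
        return self._cached


def can_grant(free_bytes, tot_grant, request):
    """Simplified check: a new grant must be backed by ungranted free space."""
    return free_bytes - tot_grant >= request


disk = {"free": 10 * MIB}
statfs = CachedStatfs(lambda: disk["free"], max_age=1.0)

statfs.free_bytes(now=0.0)   # primes the cache with 10 MiB
disk["free"] = 1 * MIB       # writes land on disk right afterwards

# Within the cache lifetime the decision is made on the stale 10 MiB figure.
stale_ok = can_grant(statfs.free_bytes(now=0.5), tot_grant=0, request=8 * MIB)
fresh_ok = can_grant(disk["free"], tot_grant=0, request=8 * MIB)
print(stale_ok, fresh_ok)
```

            In this model the stale value lets an 8 MiB grant through even though only 1 MiB is really free, which is the kind of over-grant that fresh statfs data would prevent.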

            niu Niu Yawei (Inactive) added a comment -

            If a sync write doesn't consume grant, the grant held by the client does not decrease on a sync write, while free space on the OST does. Eventually the OST space is used up by sync writes while the client still holds grant, and further cached data will be lost.

            The error message shows that the available space is less than the total granted bytes, which means the client has grant but the OST doesn't have enough space to back it. This happens because a sync write consumes space but not grant.
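
            The accounting drift described in this comment can be reproduced with a small model. This is a hedged sketch, not Lustre code: ToyOst and run are hypothetical, the units are arbitrary, and the only rule modelled is that every write uses space while only a write charged to grant lowers the total grant.

```python
class ToyOst:
    """Toy OST tracking free space (left) and outstanding grant (tot_grant)."""

    def __init__(self, free_bytes):
        self.left = free_bytes
        self.tot_grant = 0

    def grant(self, nbytes):
        """Hand out grant only while it is backed by ungranted free space."""
        nbytes = min(nbytes, max(0, self.left - self.tot_grant))
        self.tot_grant += nbytes
        return nbytes

    def write(self, nbytes, from_grant):
        """Every write uses space; only a write charged to grant lowers tot_grant."""
        self.left -= nbytes
        if from_grant:
            self.tot_grant -= nbytes


def run(sync_consumes_grant):
    """Grant 60 units, then issue five sync writes of 10 units each."""
    ost = ToyOst(free_bytes=100)
    ost.grant(60)
    for _ in range(5):
        ost.write(10, from_grant=sync_consumes_grant)
    return ost.left, ost.tot_grant


print(run(sync_consumes_grant=False))  # space drops, grant does not
print(run(sync_consumes_grant=True))   # space and grant drop together
```

            With sync writes not charged to grant, the model ends with left below tot_grant, the same condition the ofd_grant_space_left() message reports.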

            jay Jinshan Xiong (Inactive) added a comment -

            Obviously the issue here is not about ENOSPC. The reserved space is less than the granted bytes. Did I miss something?

            niu Niu Yawei (Inactive) added a comment -

            "If it can cause a grant issue, then the grant algorithm has bugs, because pages without the FROM_GRANT flag shouldn't consume reserved space."

            Any kind of write (including sync write or direct I/O) should consume grant if the client has available grant (and the FROM_GRANT flag should be set on these pages); otherwise, the OST could run out of space with the client still holding lots of grant.
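
            The client-side rule proposed in this comment could look roughly like the following. It is a sketch only: Page, ToyClient, and prepare_write are hypothetical names and do not correspond to the osc code; the from_grant field merely stands in for the FROM_GRANT flag.

```python
from dataclasses import dataclass


@dataclass
class Page:
    nbytes: int
    from_grant: bool = False  # stands in for the FROM_GRANT flag


class ToyClient:
    """Toy client holding some amount of available grant."""

    def __init__(self, avail_grant):
        self.avail_grant = avail_grant

    def prepare_write(self, page, sync):
        """Charge the page to grant when possible, whatever the write type.

        The sync argument is deliberately ignored: sync, direct, and cached
        writes are all treated the same way in this model.
        """
        if self.avail_grant >= page.nbytes:
            self.avail_grant -= page.nbytes
            page.from_grant = True
        return page


client = ToyClient(avail_grant=4096)
first = client.prepare_write(Page(4096), sync=True)
second = client.prepare_write(Page(4096), sync=True)
print(first.from_grant, second.from_grant, client.avail_grant)
```

            Here the first sync write is charged to grant and flagged, and the second goes out unflagged because no grant is left.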

            jay Jinshan Xiong (Inactive) added a comment -

            It was changed that way on purpose, because I think it doesn't need to consume grant if the application can see the errors with a sync write.

            If it can cause a grant issue, then the grant algorithm has bugs, because pages without the FROM_GRANT flag shouldn't consume reserved space.

            niu Niu Yawei (Inactive) added a comment -

            It seems this was introduced by "LU-1030 osc: new IO engine implementation". It looks to me like sync write no longer consumes grant as of 9fe4b52ad2ffadf125d9b5c78bb2ff9a01725707. I think we should add it back.

            Xiong, could you take a look at this? I think it's an unintentional change, right?
            pjones Peter Jones added a comment -

            Niu

            Could you please look into this one?

            Thanks

            Peter

            green Oleg Drokin added a comment -

            Also, it seems to have started around May 25, 2013 in my test logs, as I now see.


            People

              niu Niu Yawei (Inactive)
              shadow Alexey Lyashkov