Lustre / LU-14125

client starved for grant but OST has plenty of free space


Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: Lustre 2.14.0
    • Fix Version/s: None
    • Environment: zfs-0.7
      lustre-2.12.5_5.llnl-1.ch6.x86_64
    • Severity: 3

    Description

      Something is causing cur_grant_bytes for some OSCs to drop below 1 MB; which OSCs and which nodes are affected appears to be random. The OSTs themselves have many TB of free space. A sequential write (e.g. dd if=/dev/urandom of=file_on_ost_3 bs=1M count=40, where the file has just one stripe) produces osc_enter_cache() debug entries reporting that it is falling back to sync I/O. We also see osc_update_grant() report that it got 0 extra grant.
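
      For reference, the test looks roughly like this (a sketch; the file path is illustrative, and lfs setstripe is used to pin the file to OST0003):

      # create a one-stripe file on OST index 3 (path is hypothetical)
      lfs setstripe -c 1 -i 3 /mnt/lsrza/testdir/file_on_ost_3
      # sequential 40 MB write
      dd if=/dev/urandom of=/mnt/lsrza/testdir/file_on_ost_3 bs=1M count=40
      # grant currently held by the client for this target
      lctl get_param osc.lsrza-OST0003-osc-*.cur_grant_bytes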

      We have not been able to identify a workload or other trigger that pushes cur_grant_bytes low in the first place. We also have not been able to find a workaround that results in the OST returning extra grant.

      We set grant_shrink=0 on all clients (using set_param -P on the MGS) and then stopped and restarted all the OSTs in the file system. This did not change the symptoms in any obvious way.
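
      The commands were along these lines (a sketch; the exact wildcard patterns are assumptions):

      # on the MGS: persistently disable grant shrinking for all clients
      lctl set_param -P osc.lsrza-OST*.grant_shrink=0
      # verify on a client
      lctl get_param osc.lsrza-OST*.grant_shrink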

      Client snippet with debug="+cache":

      00000008:00000020:55.0:1604686702.850333:0:15766:0:(osc_cache.c:1613:osc_enter_cache()) lsrza-OST0003-osc-ffff8b2f37fc1000: grant { dirty: 0/512000 dirty_pages: 0/16450184 dropped: 0 avail: 997461, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 9984, left: 256, waiters: 0 }need:1703936
      
      00000008:00000020:55.0:1604686702.850335:0:15766:0:(osc_cache.c:1543:osc_enter_cache_try()) lsrza-OST0003-osc-ffff8b2f37fc1000: grant { dirty: 0/512000 dirty_pages: 0/16450184 dropped: 0 avail: 997461, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 9984, left: 256, waiters: 0 }need:1703936
      
      00000008:00000020:55.0:1604686702.850337:0:15766:0:(osc_cache.c:1658:osc_enter_cache()) lsrza-OST0003-osc-ffff8b2f37fc1000: grant { dirty: 0/512000 dirty_pages: 0/16450184 dropped: 0 avail: 997461, dirty_grant: 0, reserved: 0, flight: 0 } lru {in list: 9984, left: 256, waiters: 0 }no grant space, fall back to sync i/o
      
      00000008:00400020:55.0:1604686702.850352:0:15766:0:(osc_io.c:127:osc_io_submit()) 256 1
      00000008:00000020:55.0:1604686702.850385:0:15766:0:(osc_cache.c:1743:osc_update_pending()) obj ffff8b2c6bc58640 ready 0|-|- wr 256|+|- rd 0|- update pending cmd 2 delta 256.
      00000008:00000020:55.0:1604686702.850387:0:15766:0:(osc_cache.c:2297:osc_io_unplug0()) Queue writeback work for client ffff8b1efb0d25e0.
      00000008:00000020:19.0:1604686702.850400:0:20698:0:(osc_request.c:3171:brw_queue_work()) Run writeback work for client obd ffff8b1efb0d25e0.
      00000008:00000020:19.0:1604686702.850402:0:20698:0:(osc_cache.c:2222:osc_check_rpcs()) obj ffff8b2c6bc58640 ready 0|-|- wr 256|+|- rd 0|- 0 in flight
      00000008:00000020:19.0:1604686702.850404:0:20698:0:(osc_cache.c:1697:osc_makes_rpc()) high prio request forcing RPC
      00000008:00000020:19.0:1604686702.850405:0:20698:0:(osc_cache.c:1888:try_to_add_extent_for_io()) extent ffff8b2c6b9c7c30@{[9984 -> 10239/10239], [1|0|+|lockdone|wShu|ffff8b2c6bc58640], [0|256|+|-|          (null)|256|          (null)]} trying to add this extent
      00000008:00000020:19.0:1604686702.850408:0:20698:0:(osc_cache.c:1743:osc_update_pending()) obj ffff8b2c6bc58640 ready 0|-|- wr 0|-|- rd 0|- update pending cmd 2 delta -256.
      00000008:00000020:19.0:1604686702.850441:0:20698:0:(osc_request.c:705:osc_announce_cached()) dirty: 0 undirty: 1879048191 dropped 0 grant: 997461
      00000008:00000020:19.0:1604686702.850443:0:20698:0:(osc_request.c:714:osc_update_next_shrink()) next time 6200398 to shrink grant
      00000008:00000020:60.0:1604686703.244890:0:20699:0:(osc_request.c:727:osc_update_grant()) got 0 extra grant
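
      (For completeness, the snippet above was captured roughly as follows; the output path is arbitrary:)

      lctl set_param debug=+cache        # enable the cache debug flag
      lctl clear                         # empty the kernel debug buffer
      dd if=/dev/urandom of=file_on_ost_3 bs=1M count=40
      lctl dk > /tmp/client-debug.txt    # dump the debug log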
      

      Server snippet:

      00000020:00000020:1.0:1604686702.851666:0:15401:0:(tgt_grant.c:413:tgt_grant_statfs()) lsrza-OST0003: cli 726797b8-322a-1989-0cb5-3645daf9a6ce/ffff8fe6bb09d800 free: 263316194721792 avail: 263316186333184
      00000020:00000020:1.0:1604686702.851668:0:15401:0:(tgt_grant.c:477:tgt_grant_space_left()) lsrza-OST0003: cli 726797b8-322a-1989-0cb5-3645daf9a6ce/ffff8fe6bb09d800 avail 263316186333184 left 262158115930112 unstable 3407872 tot_grant 1158069646026 pending 3407872
      00000020:00000020:1.0:1604686702.851670:0:15401:0:(tgt_grant.c:519:tgt_grant_incoming()) lsrza-OST0003: cli 726797b8-322a-1989-0cb5-3645daf9a6ce/ffff8fe6bb09d800 reports grant 997461 dropped 0, local 1882456063
      00000020:00000020:1.0:1604686702.851672:0:15401:0:(tgt_grant.c:848:tgt_grant_check()) lsrza-OST0003: cli 726797b8-322a-1989-0cb5-3645daf9a6ce/ffff8fe6bb09d800 granted: 0 ungranted: 1703936 grant: 1882456063 dirty: 0
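
      The aggregate grant accounting can also be sampled directly on the OSS (a sketch; these are the standard obdfilter grant counters, assumed present in this build):

      lctl get_param obdfilter.lsrza-OST0003.tot_granted
      lctl get_param obdfilter.lsrza-OST0003.tot_pending
      lctl get_param obdfilter.lsrza-OST0003.tot_dirty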
      

      Each pool contains only one OST, and the storage is used for nothing else. All the OSTs have about the same amount of free space. This is the pool containing OST0003.

      [root@brass8:toss-4917-grant]# zpool list
      NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
      brass8   580T   243T   337T         -    25%    41%  1.00x  ONLINE  -
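
      From a client, the corresponding OST free space shows up in lfs df as well (the mount point below is illustrative):

      lfs df -h /mnt/lsrza | grep OST0003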
      


            People

              Assignee: Mikhail Pershin (tappro)
              Reporter: Olaf Faaland (ofaaland)
              Votes: 0
              Watchers: 21
