  1. Lustre
  2. LU-17567

Improve sanity.sh:test_27T


Details

    • Bug
    • Resolution: Unresolved
    • Minor

    Description

      multiop, called by sanity.sh:test_27T(), generates two write RPCs.
      One of them consumes reserved (granted) space; the other is a sync write RPC. The server fails whichever of the two RPCs arrives first.

      If the sync RPC is the one that fails, everything works as expected: multiop completes with a short write.

      When the granted RPC arrives first and is the one that fails, things break:
      1. Granted RPCs are not supposed to fail with ENOSPC, because grants guarantee that the server has enough space.
      2. Grant accounting gets out of sync:
      the server skips grant accounting in tgt_brw_write() because it returns ENOSPC at the very beginning, before tgt_grant_incoming() is reached:

      tgt_brw_write()
        if (OBD_FAIL_CHECK(OBD_FAIL_OST_ENOSPC))
          RETURN(err_serious(-ENOSPC));
        obd_preprw
          ofd_preprw_write
            tgt_grant_prepare_write
              tgt_grant_incoming
      

      The client, however, still updates its grant accounting even though it received ENOSPC:

      brw_interpret
        osc_extent_finish
          osc_free_grant
            cli->cl_dirty_grant -= dirty_grant;
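
The mismatch described by the two call traces can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Lustre code: every `toy_*` name is hypothetical, each side keeps a single byte counter, and the real accounting in tgt_grant_incoming() and osc_free_grant() is considerably more involved. It shows only the shape of the problem: the server returns before touching its record, while the client settles its side regardless of the return code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only: hypothetical names, not Lustre structures. */
struct toy_grant {
	long client_avail;  /* client's view of its unused grant */
	long client_dirty;  /* grant tied up in dirty data not yet written */
	long server_record; /* server's view of the client's unused grant */
};

/* Client buffers a write and reserves grant for it. */
static void toy_client_dirty(struct toy_grant *g, long bytes)
{
	assert(bytes >= 0 && bytes <= g->client_avail);
	g->client_avail -= bytes;
	g->client_dirty += bytes;
}

/* Server handles the write. With fail_early set it returns -ENOSPC
 * before the accounting step, as the fail_loc check does in the trace. */
static int toy_server_write(struct toy_grant *g, long bytes, bool fail_early)
{
	if (fail_early)
		return -ENOSPC;
	g->server_record -= bytes;
	return 0;
}

/* Client completion: releases the dirty grant whatever rc was. */
static void toy_client_finish(struct toy_grant *g, long bytes, int rc)
{
	(void)rc; /* deliberately ignored, mirroring the reported behavior */
	g->client_dirty -= bytes;
}

/* Runs one granted write and returns server_record - client_avail. */
static long toy_grant_skew(long initial, long bytes, bool fail_early)
{
	struct toy_grant g = { initial, 0, initial };
	int rc;

	toy_client_dirty(&g, bytes);
	rc = toy_server_write(&g, bytes, fail_early);
	toy_client_finish(&g, bytes, rc);
	return g.server_record - g.client_avail;
}
```

In this model, `toy_grant_skew(1000, 100, false)` returns 0 (both sides agree), while `toy_grant_skew(1000, 100, true)` returns 100: the server still believes the client holds grant that the client has already spent.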
      

      The mismatch shows up when test_27T is listed in GRANT_CHECK_LIST:

      == sanity test 27T: no eio on close on partial write due to enosp ========================================================== 16:56:45 (1708351005)
      fail_loc=0x20000411
      fail_val=1
      fail_loc=0x80000215
      checking grant......UUID                   1K-blocks        Used   Available Use% Mounted on
      lustre-MDT0000_UUID       125056        1684      112136   2% /mnt/lustre[MDT:0]
      lustre-MDT0001_UUID       125056        1548      112272   2% /mnt/lustre[MDT:1]
      lustre-OST0000_UUID       313104        1540      280220   1% /mnt/lustre[OST:0]
      lustre-OST0001_UUID       313104        1540      284404   1% /mnt/lustre[OST:1]
      
      filesystem_summary:       626208        3080      564624   1% /mnt/lustre
      
      wait for client:21094400 == server:25313280
      wait for client:21094400 == server:25313280
      ...
       sanity test_27T: @@@@@@ FAIL: failed grant check: client:21094400 server:16875520
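
To make the size of the mismatch in the log above explicit, the gap between the client and server totals can be computed directly. The helper name `grant_gap` is hypothetical; the byte counts are copied from the log lines. The report does not say what the gap corresponds to, so the sketch only does the subtraction.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: absolute difference between the client's and
 * the server's grant totals, in bytes. */
static long grant_gap(long client, long server)
{
	assert(client >= 0 && server >= 0);
	return labs(server - client);
}
```

With the values from the log, `grant_gap(21094400L, 25313280L)` (the "wait for" lines) and `grant_gap(21094400L, 16875520L)` (the final FAIL line) both evaluate to 4218880 bytes. In the first case the server total is above the client total, in the second it is below.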
      

          People

            vsaveliev Vladimir Saveliev