Lustre / LU-20435

OSD-ZFS failing to skip hole during lfs mirror resync


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

      This issue was created by maloo for Marc Vef <mvef@whamcloud.com>

      LU-14217 made an effort to improve SEEK_HOLE/SEEK_DATA support on OSD-ZFS. However, that fix appears to be insufficient: a similar issue has resurfaced in the EC tests (see sanity-ec/12b below), where lfs mirror resync should skip the hole at the end of the file. This works on ldiskfs, but on ZFS the hole is not skipped, and the resync runs out of space because the test file is too large to copy in full. Initial investigation shows that dmu_offset_next() (the ZFS function underneath osd_lseek()) fails to return the next data offset at ~3.81 TiB. The osd-zfs code anticipates this case: on EBUSY it calls txg_wait_synced() and retries (see osd_lseek() in osd-zfs/osd_io.c), which is part of the fix from LU-14217.
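
The "sync and retry on EBUSY" pattern described above can be sketched roughly as follows. This is a standalone userspace illustration, not the osd-zfs code: `struct fake_obj`, `fake_next_offset()`, `fake_wait_synced()` and `seek_with_retry()` are hypothetical names invented for the example, and the callbacks only stand in for dmu_offset_next() and txg_wait_synced(), whose real signatures are not reproduced here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for a dataset object; not a real ZFS or Lustre type. */
struct fake_obj {
	int      dirty;      /* non-zero while unsynced changes are pending */
	uint64_t next_data;  /* offset the lookup reports once the object is clean */
	int      syncs;      /* number of times the sync callback ran */
};

/* Callback types standing in for the offset lookup and the sync wait. */
typedef int (*next_offset_fn)(void *obj, uint64_t *off);
typedef void (*wait_synced_fn)(void *obj);

/* Fake lookup: reports EBUSY while the object is dirty. */
int fake_next_offset(void *obj, uint64_t *off)
{
	struct fake_obj *o = obj;

	if (o->dirty)
		return EBUSY;
	*off = o->next_data;
	return 0;
}

/* Fake sync: marks the object clean and counts the call. */
void fake_wait_synced(void *obj)
{
	struct fake_obj *o = obj;

	o->dirty = 0;
	o->syncs++;
}

/*
 * Look up the next offset; on EBUSY, wait for a sync and try again,
 * at most max_retries times. Any other result is returned unchanged.
 */
int seek_with_retry(void *obj, uint64_t *off, next_offset_fn next_offset,
		    wait_synced_fn wait_synced, int max_retries)
{
	int rc = next_offset(obj, off);

	for (int i = 0; rc == EBUSY && i < max_retries; i++) {
		wait_synced(obj);
		rc = next_offset(obj, off);
	}
	return rc;
}
```

In this sketch a single retry is enough because the fake sync always clears the dirty state. If a lookup still fails after the retry, the result depends on how the caller handles the error; a caller that falls back to treating the range as data would copy the hole in full, though whether that is what happens in this failure has not been established.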

      ---------

      This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/efad0dd3-eac3-4db1-ba9b-ce34c63e376d

      test_12b failed with the following error:

      failed to resync ec mirror
      

      Test session details:
      clients: https://build.whamcloud.com/job/lustre-reviews/124914 - 4.18.0-553.117.1.el8_10.x86_64
      servers: https://build.whamcloud.com/job/lustre-reviews/124914 - 4.18.0-553.117.1.el8_lustre.x86_64

      Compute/verify raidset range=23026728960-23039311872
      Compute/verify raidset range=23039311872-23051894784
      Compute/verify raidset range=23051894784-23064477696
      lfs mirror mirror: fail to pwrite 7683964928-4194304 of mirror 2: No space left on device (28)
      lfs mirror mirror: could not write ec parities: No space left on device (28)
      lfs mirror mirror: failed to sync ec comp: No space left on device (28)
      lfs mirror mirror: fail to ec resync '/mnt/lustre/d12b.sanity-ec/f12b.sanity-ec': No space left on device (28)
       sanity-ec test_12b: @@@@@@ FAIL: failed to resync ec mirror 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:7409:error()
        = /usr/lib64/lustre/tests/sanity-ec.sh:1915:test_12b()
        = /usr/lib64/lustre/tests/test-framework.sh:7785:run_one()
        = /usr/lib64/lustre/tests/test-framework.sh:7848:run_one_logged()
        = /usr/lib64/lustre/tests/test-framework.sh:7648:run_test()
        = /usr/lib64/lustre/tests/sanity-ec.sh:1922:main() 

            Assignee: WC Triage
            Reporter: Maloo