Lustre / LU-14299

sanity-quota test 61 fails with 'write failed, expect succeed'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Affects Version/s: Lustre 2.14.0
    • Fix Version/s: Lustre 2.14.0
    • Labels: None
    • Environment: ZFS
    • Severity: 3

    Description

      sanity-quota test_61 fails with 'write failed, expect succeed'. This test first failed with this error on 05 May 2020, at https://testing.whamcloud.com/test_sets/f918d217-9487-479e-8294-16936426fe25, for Lustre 2.13.53.162. The failures are mostly on ZFS: 200 of the 218 failures of this test with this error were on ZFS. So far, we see this failure only on master (future 2.14.0).

      Looking at a recent failure with no other sanity-quota test failures, at https://testing.whamcloud.com/test_sets/8ced1c69-2747-41c9-a580-8a3c5fcdf857, we see that sanity-quota test 61 fails with 'Disk quota exceeded' even after the default quota is increased:

      set to use default quota
      set default quota
      get default quota
      Disk default usr quota:
           Filesystem   bquota  blimit  bgrace   iquota  ilimit  igrace
          /mnt/lustre  20480   20480       0      0       0      10
      Test not out of quota
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=10] [oflag=sync]
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB, 10 MiB) copied, 1.71261 s, 6.1 MB/s
      Test out of quota
      CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
      dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
      20+0 records in
      19+0 records out
      19922944 bytes (20 MB, 19 MiB) copied, 3.91893 s, 5.1 MB/s
      Increase default quota
      CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
      dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
      1+0 records in
      0+0 records out
      0 bytes copied, 0.00246058 s, 0.0 kB/s
      CMD: trevis-201vm4 /usr/sbin/lctl get_param -n version 2>/dev/null
      CMD: trevis-201vm4 zpool get all
       sanity-quota test_61: @@@@@@ FAIL: write failed, expect succeed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanity-quota.sh:159:quota_error()
        = /usr/lib64/lustre/tests/sanity-quota.sh:3996:test_default_quota()
        = /usr/lib64/lustre/tests/sanity-quota.sh:4051:test_61()
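The byte counts in the log can be checked against the default block limit with a few lines of shell arithmetic. This is only a reading aid: the values are copied from the log above, the limit is assumed to be in KB (as lfs reports block quotas), and `to_mib` is a hypothetical helper, not part of sanity-quota.sh.

```shell
# Reading aid: convert dd byte counts copied from the log into MiB and
# compare them with the default block hard limit (blimit, assumed in KB).
# POSIX sh; to_mib is a hypothetical helper, not part of the test suite.

MIB=$((1024 * 1024))
blimit_kb=20480                     # blimit column of "Disk default usr quota"
blimit_bytes=$((blimit_kb * 1024))  # 20971520 bytes

# Whole MiB in a byte count (dd used bs=1M, so these counts divide exactly)
to_mib() {
    echo $(($1 / MIB))
}

first_write=10485760    # "Test not out of quota": 10+0 records out
second_write=19922944   # "Test out of quota": 19+0 records out, then EDQUOT
third_write=0           # after "Increase default quota": 0+0 records out

echo "blimit:       $(to_mib "$blimit_bytes") MiB"
echo "first write:  $(to_mib "$first_write") MiB"
echo "second write: $(to_mib "$second_write") MiB (stopped by quota)"
echo "third write:  $(to_mib "$third_write") MiB (limit already raised)"
```

The second write stops at 19 MiB, just below the 20 MiB limit, which is what the test expects at that step. The third write is refused before a single 1 MiB block is written, even though the limit has just been raised.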
      

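      Looking at the test, we increase the default quota to LIMIT*3, cancel the OSC and MDC locks, and sync data. We then run dd again to write LIMIT*2 and still get the 'Disk quota exceeded' error. With LIMIT = 20480 KB, consistent with the default quota output and the dd count=40 in the log, the new limit is 61440 KB (60 MiB) and dd writes 40 MiB, so the write should succeed: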

      3986         log "Increase default quota"
      3987         # increase default quota
      3988         $LFS setquota $qdtype $qs $((LIMIT*3)) $qh $((LIMIT*3)) $DIR ||
      3989                 error "set default quota failed"
      3990 
      3991         cancel_lru_locks osc
      3992         cancel_lru_locks mdc
      3993         sync; sync_all_data || true
      3994         if [ $qpool == "data" ]; then
      3995                 $RUNAS $DD of=$TESTFILE count=$((LIMIT*2 >> 10)) oflag=sync ||
      3996                         quota_error $qtype $qid "write failed, expect succeed"
      3997         else
      3998                 $RUNAS createmany -m $TESTFILE $((LIMIT*2)) ||
      3999                         quota_error $qtype $qid "create failed, expect succeed"
      4000 
      4001                 unlinkmany $TESTFILE $((LIMIT*2))
      4002         fi
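Plugging the numbers into the test code shows why the last dd is expected to succeed. The sketch assumes LIMIT=20480 (KB), inferred from the blimit column and the dd count=40 in the log rather than read from sanity-quota.sh, and assumes $qs and $qh select the soft and hard limits.

```shell
# Expected sizes for the "Increase default quota" step, assuming LIMIT=20480
# (KB); the value is inferred from the log, not taken from sanity-quota.sh.

LIMIT=20480

new_limit_kb=$((LIMIT * 3))        # value passed after $qs and $qh to lfs setquota
dd_count_mib=$((LIMIT * 2 >> 10))  # dd count with bs=1M: (LIMIT*2) KB expressed in MiB

echo "new limit: $new_limit_kb KB ($((new_limit_kb >> 10)) MiB)"
echo "dd writes: $dd_count_mib MiB"

# Compare in KB: the write fits if its size does not exceed the new limit
if [ $((dd_count_mib * 1024)) -le "$new_limit_kb" ]; then
    echo "write should fit under the new limit"
else
    echo "write would exceed the new limit"
fi
```

The 40 MiB write leaves 20 MiB of headroom under the new 60 MiB limit, so the error is not explained by the new limit value alone.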
      

      Logs for other failures are at:
      https://testing.whamcloud.com/test_sets/470f88c5-e555-4352-bd2f-ddb2f281e7b6
      https://testing.whamcloud.com/test_sets/a973395d-3213-4b34-ae57-45155f98ee26


          Activity


            Gerrit Updater added a comment -

            "Oleg Drokin <green@whamcloud.com>" merged in patch https://review.whamcloud.com/c/fs/lustre-release/+/47618/
            Subject: LU-14299 test: sleep to enable quota acquire again
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set:
            Commit: cadc77a9ba5ca428786c6790cc4e3a496efd5488

            Gerrit Updater added a comment -

            "Minh Diep <mdiep@whamcloud.com>" uploaded a new patch: https://review.whamcloud.com/47618
            Subject: LU-14299 test: sleep to enable quota acquire again
            Project: fs/lustre-release
            Branch: b2_12
            Current Patch Set: 1
            Commit: 4070cf9f90d1b37268fd6bc5ac9a2c419cd63e56

            Peter Jones added a comment -

            Landed for 2.14

            Gerrit Updater added a comment -

            Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/41389/
            Subject: LU-14299 test: sleep to enable quota acquire again
            Project: fs/lustre-release
            Branch: master
            Current Patch Set:
            Commit: 430e3f01ef2dc83ed317cf2b97be8a2ad50d9f13
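The landed patch is titled "sleep to enable quota acquire again"; its diff is not reproduced in this issue. The following is only a hypothetical POSIX sh sketch of the general idea: if a quota-limited write fails right after the limit is raised, wait and try again a bounded number of times. `retry_after_sleep`, its arguments, and any retry counts are illustrative assumptions, not code from patch 41389 or 47618.

```shell
# Hypothetical helper (not from the LU-14299 patches): run a command and,
# if it fails, sleep and retry, up to a fixed number of attempts.
# Usage: retry_after_sleep TRIES DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 on the first successful attempt, 1 if every attempt fails.
retry_after_sleep() {
    _tries=$1
    _delay=$2
    shift 2
    _i=1
    while [ "$_i" -le "$_tries" ]; do
        if "$@"; then
            return 0
        fi
        # Do not sleep after the final failed attempt
        if [ "$_i" -lt "$_tries" ]; then
            sleep "$_delay"
        fi
        _i=$((_i + 1))
    done
    return 1
}
```

For example, `retry_after_sleep 3 5 $RUNAS $DD of=$TESTFILE count=$((LIMIT*2 >> 10)) oflag=sync` would make up to three attempts five seconds apart. A single fixed sleep, as the patch subject suggests, is simpler; a bounded retry returns as soon as the write succeeds, but a generous retry budget can also mask a genuine quota regression.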

            Gerrit Updater added a comment -

            Hongchao Zhang (hongchao@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/41389
            Subject: LU-14299 test: sleep to enable quota acquire again
            Project: fs/lustre-release
            Branch: master
            Current Patch Set: 1
            Commit: d233ba95e6b53dd94ac2ef8930c7f5e7037ca71b

            Peter Jones added a comment -

            Sergey, how is your investigation progressing?

            James Nunez (Inactive) added a comment -

            Sergey -
            These quota failures started around the time the OST pool quotas patch landed. Would you please review this failure? Could it be caused by that patch, and does this test need to change because of the OST pool quota patch?

            Thanks

            People

              Assignee: Hongchao Zhang
              Reporter: James Nunez (Inactive)
              Votes: 0
              Watchers: 6
