Lustre / LU-14299

sanity-quota test 61 fails with 'write failed, expect succeed'

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor
    • Fix Version/s: Lustre 2.14.0
    • Affects Version/s: Lustre 2.14.0
    • Labels: None
    • Environment: ZFS
    • Severity: 3
    • Rank (Obsolete): 9223372036854775807

    Description

      sanity-quota test_61 fails with 'write failed, expect succeed'. The first failure with this error message was on 05 May 2020 at https://testing.whamcloud.com/test_sets/f918d217-9487-479e-8294-16936426fe25 for Lustre 2.13.53.162. The failures are mostly seen on ZFS: 200 of the 218 total failures for this test with this error are ZFS failures. So far, we have only seen this failure on master (future 2.14.0).

      Looking at a recent failure with no other sanity-quota test failures, at https://testing.whamcloud.com/test_sets/8ced1c69-2747-41c9-a580-8a3c5fcdf857, we see that sanity-quota test 61 fails with ‘Disk quota exceeded’ after increasing the default quota:

      set to use default quota
      set default quota
      get default quota
      Disk default usr quota:
           Filesystem   bquota  blimit  bgrace   iquota  ilimit  igrace
          /mnt/lustre  20480   20480       0      0       0      10
      Test not out of quota
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=10] [oflag=sync]
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB, 10 MiB) copied, 1.71261 s, 6.1 MB/s
      Test out of quota
      CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
      dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
      20+0 records in
      19+0 records out
      19922944 bytes (20 MB, 19 MiB) copied, 3.91893 s, 5.1 MB/s
      Increase default quota
      CMD: trevis-201vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
      CMD: trevis-201vm3 lctl set_param -n osd*.*OS*.force_sync=1
      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0] [count=40] [oflag=sync]
      dd: error writing '/mnt/lustre/d61.sanity-quota/f61.sanity-quota-0': Disk quota exceeded
      1+0 records in
      0+0 records out
      0 bytes copied, 0.00246058 s, 0.0 kB/s
      CMD: trevis-201vm4 /usr/sbin/lctl get_param -n version 2>/dev/null
      CMD: trevis-201vm4 zpool get all
       sanity-quota test_61: @@@@@@ FAIL: write failed, expect succeed 
        Trace dump:
        = /usr/lib64/lustre/tests/test-framework.sh:6273:error()
        = /usr/lib64/lustre/tests/sanity-quota.sh:159:quota_error()
        = /usr/lib64/lustre/tests/sanity-quota.sh:3996:test_default_quota()
        = /usr/lib64/lustre/tests/sanity-quota.sh:4051:test_61()
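
      The byte counts in the dd summaries above can be compared with the block limit. The following is only a rough sketch: it assumes the blimit value of 20480 in the quota listing is in KiB, and dd_bytes is a hypothetical helper written for this illustration, not part of the test framework. The two sample lines are copied from the log.

```shell
#!/bin/bash
# Hypothetical helper: print the byte count from a dd summary line,
# e.g. "19922944 bytes (20 MB, 19 MiB) copied, ..." prints 19922944.
dd_bytes() {
	awk '/ bytes/ && /copied/ { print $1 }'
}

limit_kb=20480                      # blimit from the log, assumed to be KiB
limit_bytes=$((limit_kb * 1024))

# dd summary from the "Test out of quota" step, copied from the log
written=$(echo "19922944 bytes (20 MB, 19 MiB) copied, 3.91893 s, 5.1 MB/s" | dd_bytes)
echo "written=$written limit=$limit_bytes headroom=$((limit_bytes - written))"

# dd summary after "Increase default quota", copied from the log
after=$(echo "0 bytes copied, 0.00246058 s, 0.0 kB/s" | dd_bytes)
echo "after_increase=$after"
```

      Under that assumption, the second dd stopped one 1 MiB block short of the original limit, and the dd run after the increase wrote nothing. That would be consistent with the old limit still being applied, although the log alone does not show why.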
      

      Looking at the test, we increase the default quota, cancel the OSC and MDC locks, and sync data. We then run dd again and get the 'Disk quota exceeded' error, even though the write is expected to succeed under the raised limit:

      3986         log "Increase default quota"
      3987         # increase default quota
      3988         $LFS setquota $qdtype $qs $((LIMIT*3)) $qh $((LIMIT*3)) $DIR ||
      3989                 error "set default quota failed"
      3990 
      3991         cancel_lru_locks osc
      3992         cancel_lru_locks mdc
      3993         sync; sync_all_data || true
      3994         if [ $qpool == "data" ]; then
      3995                 $RUNAS $DD of=$TESTFILE count=$((LIMIT*2 >> 10)) oflag=sync ||
      3996                         quota_error $qtype $qid "write failed, expect succeed"
      3997         else
      3998                 $RUNAS createmany -m $TESTFILE $((LIMIT*2)) ||
      3999                         quota_error $qtype $qid "create failed, expect succeed"
      4000 
      4001                 unlinkmany $TESTFILE $((LIMIT*2))
      4002         fi
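
      The sizes involved in this step can be worked out from the snippet. This is a sketch under two assumptions: LIMIT is 20480 (KiB), inferred from the blimit in the log and from count=40 in the logged dd command, and $DD uses bs=1M as the logged dd invocations do.

```shell
#!/bin/bash
LIMIT=20480                       # assumed: KiB, matches blimit in the log

new_limit_kb=$((LIMIT * 3))       # value passed to $LFS setquota
dd_count=$((LIMIT * 2 >> 10))     # count= passed to dd, in 1 MiB blocks
write_kb=$((dd_count * 1024))     # total size dd tries to write, in KiB

echo "new limit: ${new_limit_kb} KiB ($((new_limit_kb >> 10)) MiB)"
echo "dd writes: ${write_kb} KiB (${dd_count} MiB)"

if [ "$write_kb" -le "$new_limit_kb" ]; then
	echo "write fits under the raised limit, so the test expects success"
fi
```

      With these values the raised limit is 60 MiB and dd writes 40 MiB, which is why the test treats 'Disk quota exceeded' at this point as a failure.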
      

      Logs for other failures are at:
      https://testing.whamcloud.com/test_sets/470f88c5-e555-4352-bd2f-ddb2f281e7b6
      https://testing.whamcloud.com/test_sets/a973395d-3213-4b34-ae57-45155f98ee26

            People

              Assignee: Hongchao Zhang (hongchao.zhang)
              Reporter: James Nunez (jamesanunez, Inactive)