Uploaded image for project: 'Lustre'
  1. Lustre
  2. LU-9557

sanity-quota test 0 fails with 'SLOW IO for quota_usr (user): XX KB/sec'

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Minor
    • None
    • Lustre 2.10.0
    • None
    • 3
    • 9223372036854775807

    Description

      sanity-quota test_0 fails with

      'SLOW IO for quota_usr (user): 426 KB/sec' 
      

      The last thing seen in the test_log from test 0 is as expected

      running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
       [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d0.sanity-quota/f0.sanity-quota-0] [count=10] [conv=fsync]
      10+0 records in
      10+0 records out
      10485760 bytes (10 MB) copied, 0.427085 s, 24.6 MB/s
       sanity-quota test_0: @@@@@@ FAIL: SLOW IO for quota_usr (user): 426 KB/sec 
      

      The subroutine that is failing is test_quota_performance() in sanity-quota.sh:

       423 test_quota_performance() {
       424         local TESTFILE="$DIR/$tdir/$tfile-0"
       425         local size=$1 # in MB
       426         local stime=$(date +%s)
       427         $RUNAS $DD of=$TESTFILE count=$size conv=fsync ||
       428                 quota_error u $TSTUSR "write failure"
       429         local etime=$(date +%s)
       430         delta=$((etime - stime))
       431         if [ $delta -gt 0 ]; then
       432             rate=$((size * 1024 / delta))
       433             if [ $(facet_fstype $SINGLEMDS) = "zfs" ]; then
       434                 # LU-2872 - see LU-2887 for fix
       435                 [ $rate -gt 64 ] ||
       436                         error "SLOW IO for $TSTUSR (user): $rate KB/sec"
       437             else
       438                 [ $rate -gt 1024 ] ||
       439                         error "SLOW IO for $TSTUSR (user): $rate KB/sec"
       440             fi
       441         fi
       442         rm -f $TESTFILE
       443 }
      

      This test started failing with this error message starting May 12, 2017 and is only failing, so far, in review-dne-part-2. The only patch that modified sanity-quota and landed close to the time this test started to fail was reviewed at https://review.whamcloud.com/26411 and has change-ID I4f22c1e9.

      There is nothing interesting in the console logs on any node.

      Here are all the failures so far:
      https://testing.hpdd.intel.com/test_sets/0ae25452-373d-11e7-b0a8-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/0f17788e-379e-11e7-a0ec-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/933c6c4a-3d26-11e7-b556-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/9d856dfc-3fe5-11e7-b3f6-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/7a15e686-4098-11e7-bc6c-5254006e85c2
      https://testing.hpdd.intel.com/test_sets/6ba8b6ae-40a8-11e7-91f4-5254006e85c2

      Attachments

        Activity

          People

            wc-triage WC Triage
            jamesanunez James Nunez (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated: