[LU-12100] sanity-quota test_2: user create failure, but expect success Created: 23/Mar/19  Updated: 11/Jul/20  Resolved: 20/May/20

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: Lustre 2.13.0, Lustre 2.12.1, Lustre 2.14.0
Fix Version/s: Lustre 2.14.0, Lustre 2.12.6

Type: Bug
Priority: Minor
Reporter: Maloo
Assignee: Nathaniel Clark
Resolution: Fixed
Votes: 0
Labels: zfs

Issue Links:
Blocker
is blocking LU-12336 Update ZFS Version to 0.8.2 Resolved
Related
is related to LU-11544 interop: sanity-quota test 10 fails w... Resolved
is related to LU-13639 sanity-quota test_2: user create fail... Reopened
Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This issue was created by maloo for Andreas Dilger <adilger@whamcloud.com>

This issue relates to the following test suite run: https://testing.whamcloud.com/test_sets/14aef9dc-4ce8-11e9-92fe-52540065bddc

test_2 failed with the following errors in the client test output:

/usr/lib64/lustre/tests/sanity-quota.sh: line 88: zpool: command not found
:
:
User quota (inode hardlimit:1024 files)
lfs setquota: warning: inode hardlimit '1024' smaller than minimum qunit size
See 'lfs help setquota' or Lustre manual for details
:
 sanity-quota test_2: @@@@@@ FAIL: user create failure, but expect success 

I haven't investigated closely what the cause is, but this has failed about 10 times in the last week.

VVVVVVV DO NOT REMOVE LINES BELOW, Added by Maloo for auto-association VVVVVVV
sanity-quota test_2 - user create failure, but expect success



 Comments   
Comment by Patrick Farrell (Inactive) [ 01/Apr/19 ]

https://testing.whamcloud.com/test_sets/fd33d7dc-5327-11e9-9720-52540065bddc

This same error made test_33 fail as well...

Comment by James Nunez (Inactive) [ 02/Apr/19 ]

Same error seen in sanity-quota test_3 for 2.12.1 and 2.13.0 with logs at
https://testing.whamcloud.com/test_sets/81a82ef4-54e6-11e9-b98a-52540065bddc
https://testing.whamcloud.com/test_sets/edda232e-4a56-11e9-a256-52540065bddc

Comment by Gu Zheng (Inactive) [ 09/May/19 ]

Another instance failed on test_2:

https://testing.whamcloud.com/test_sets/71c71570-71ad-11e9-bd0e-52540065bddc

Comment by Andreas Dilger [ 10/May/19 ]

This appears to be failing only on ZFS. The first similar failure that I could find was on 2018-06-12 with patch https://review.whamcloud.com/31976 "LU-11310 tests: for test.", though this is a test patch for SLES15 ldiskfs that never landed, so it is possible there was something wrong with that kernel/patch?
https://testing.whamcloud.com/test_sets/772bbf04-6dec-11e8-a522-52540065bddc

The next failure is on 2018-08-09 with patch https://review.whamcloud.com/31504 v19 "LU-4684 lmv: support accessing migrating directory", which eventually landed on 2018-10-01, which is after the next failure on 2018-08-23, so that could not have been the patch that introduced the problem.

I'd be inclined to look for a patch that landed just before 2018-08-09 (there was a big batch of landings on 2018-08-06), which has patch https://review.whamcloud.com/32827 "LU-11153 quota: initialize ver for default quota" among others, and more on 2018-08-09.
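
The date reasoning above can be sketched as a small filter: a patch can only have introduced the failure if it had already landed when the failure was first seen. This is a minimal illustration, not part of the investigation itself. The dates come from the comment (LU-11153 is assumed to have landed with the 2018-08-06 batch, as the comment suggests); the names `landings` and `candidates` are hypothetical.

```python
from datetime import date

# Date of the failure seen on the LU-4684 patch (v19), per the comment above.
first_failure = date(2018, 8, 9)

# Landing dates as stated in the comment above.
landings = {
    "LU-4684 lmv: support accessing migrating directory": date(2018, 10, 1),
    "LU-11153 quota: initialize ver for default quota": date(2018, 8, 6),
}


def candidates(landings, first_failure):
    """Return subjects of patches that had landed on or before the first failure."""
    return sorted(
        subject for subject, landed in landings.items() if landed <= first_failure
    )


print(candidates(landings, first_failure))
```

LU-4684 is excluded because it landed after the failures began, which leaves the 2018-08-06 batch as the place to look.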

Comment by Patrick Farrell (Inactive) [ 21/Jul/19 ]

stancheff, your comment was correct, and looks to be useful!  I'm going to push a patch.

Comment by Gerrit Updater [ 21/Jul/19 ]

Patrick Farrell (pfarrell@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35577
Subject: LU-12100 tests: Fix typo in project quota support
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 647cb8165588d8171a0cb1d3095bc673a53f8efd

Comment by Gerrit Updater [ 01/Aug/19 ]

Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/35667
Subject: LU-12100 tests: Use minimum soft qunit limit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 06f63234459f92d8ff5186252ac61457fb6b7e02

Comment by Nathaniel Clark [ 02/Aug/19 ]

ZFS 0.8.0 failures:
https://testing.whamcloud.com/test_sets/b4d07d78-b4d5-11e9-9f36-52540065bddc
https://testing.whamcloud.com/test_sets/09cf7628-b28b-11e9-bcf0-52540065bddc

The issue is that the second dd ("Write to exceed soft limit") seems to run much slower than the first dd, and exceeds the grace period.

First dd is between 65 and 200MB/s

Write up to soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [bs=1M] [of=/mnt/lustre/d3.sanity-quota/f3.sanity-quota-0] [count=4]
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.0603464 s, 69.5 MB/s

The second is < 0.5 kB/s

Write to exceed soft limit
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3.sanity-quota/f3.sanity-quota-0] [bs=1K] [count=10] [seek=4096]
10+0 records in
10+0 records out
10240 bytes (10 kB) copied, 33.0001 s, 0.3 kB/s

When the third dd ("Write before timer goes off") attempts to write, the grace timer has in fact already expired.

 Disk quotas for usr quota_usr (uid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre    5125*   4096       0    none       1       0       0       -
Write before timer goes off
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d3.sanity-quota/f3.sanity-quota-0] [bs=1K] [count=10] [seek=5120]
dd: error writing '/mnt/lustre/d3.sanity-quota/f3.sanity-quota-0': Disk quota exceeded
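
The arithmetic behind the two dd rates, and the comparison against the grace period, can be reproduced with a short sketch. The byte counts and elapsed times are copied from the dd output above. `GRACE_SECONDS` is an assumed value used only for illustration, since the grace period configured by the test is not stated in this ticket; the helper names are hypothetical.

```python
def dd_rate(nbytes, seconds):
    """Average throughput in bytes per second (bytes copied / elapsed time)."""
    return nbytes / seconds


def grace_expired(elapsed_seconds, grace_seconds):
    """True if a write took at least as long as the grace period."""
    return elapsed_seconds >= grace_seconds


# Figures from the dd output above.
first = dd_rate(4194304, 0.0603464)   # "Write up to soft limit"
second = dd_rate(10240, 33.0001)      # "Write to exceed soft limit"

# Assumption for illustration only: the real grace period is not given here.
GRACE_SECONDS = 20

print(f"first dd:  {first / 1e6:.1f} MB/s")
print(f"second dd: {second / 1e3:.1f} kB/s")
print("grace expired during second dd:", grace_expired(33.0001, GRACE_SECONDS))
```

Any grace period shorter than the 33 seconds taken by the second dd would already have run out before the third write starts, which matches the "Disk quota exceeded" error above.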

Comment by Gerrit Updater [ 21/Aug/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/35667/
Subject: LU-12100 tests: Use minimum soft qunit limits
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 37e28b7e05a5b1f77fe663f9407436aea312b3b2

Comment by Peter Jones [ 21/Aug/19 ]

Landed for 2.13

Comment by Gerrit Updater [ 01/Oct/19 ]

Minh Diep (mdiep@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36346
Subject: LU-12100 tests: Use minimum soft qunit limits
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: a42726bd0bc203163644de63705dc3d148a54a0e

Comment by Andreas Dilger [ 15/Nov/19 ]

Still seeing this on latest master (2.13.50):
https://testing.whamcloud.com/test_sets/8fa3c85c-07e1-11ea-8e77-52540065bddc

Comment by Gerrit Updater [ 19/Nov/19 ]

Nathaniel Clark (nclark@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/36797
Subject: LU-12100 tests: Use least qunit to set limit
Project: fs/lustre-release
Branch: master
Current Patch Set: 1
Commit: 38306a6354314cc0b59ae180f311069f8a67796e

Comment by Gerrit Updater [ 21/Nov/19 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36346/
Subject: LU-12100 tests: Use minimum soft qunit limits
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: 17802fc3336b0a9546a03dd10efa496f6a7f6f42

Comment by Alex Zhuravlev [ 20/Jan/20 ]

https://testing.whamcloud.com/test_sets/efbb05c6-3b34-11ea-971c-52540065bddc

Comment by Emoly Liu [ 17/Feb/20 ]

+1 on master: https://testing.whamcloud.com/test_sets/df3f8624-4f73-11ea-a90e-52540065bddc

Comment by Gerrit Updater [ 20/May/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/36797/
Subject: LU-12100 tests: Use least qunit to set limit
Project: fs/lustre-release
Branch: master
Current Patch Set:
Commit: 33e500cfb33406b8dddac46e1dfb5a3d59ff01c5

Comment by Peter Jones [ 20/May/20 ]

Landed for 2.14

Comment by Gerrit Updater [ 29/May/20 ]

James Nunez (jnunez@whamcloud.com) uploaded a new patch: https://review.whamcloud.com/38769
Subject: LU-12100 tests: Use least qunit to set limit
Project: fs/lustre-release
Branch: b2_12
Current Patch Set: 1
Commit: c6985624f377888ba134883b742c07e36bd3bae2

Comment by Gerrit Updater [ 11/Jul/20 ]

Oleg Drokin (green@whamcloud.com) merged in patch https://review.whamcloud.com/38769/
Subject: LU-12100 tests: Use least qunit to set limit
Project: fs/lustre-release
Branch: b2_12
Current Patch Set:
Commit: abb6af1ab8f74df8d0aa1e728c63fe67b7b2d3e1
