[LU-15187] sanity-quota test_55: Created: 01/Nov/21  Updated: 10/Nov/21  Resolved: 10/Nov/21

Status: Resolved
Project: Lustre
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Minor
Reporter: Andreas Dilger Assignee: Oleg Drokin
Resolution: Fixed Votes: 0
Labels: None

Severity: 3
Rank (Obsolete): 9223372036854775807

 Description   

This subtest is failing very regularly in Gerrit Janitor testing, hitting 100% failure across 40 test sessions after the subtest was changed with a minor update:
https://testing-archive.whamcloud.com/gerrit-janitor/19417/results.html

== sanity-quota test 55: Chgrp should be affected by group quota == 16:56:57 (1635541017)
sleep 5 for ZFS zfs
Waiting for MDT destroys to complete
Creating test directory
fail_val=0
fail_loc=0
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d55.sanity-quota/f55.sanity-quota] [bs=1024] [count=100000]
100000+0 records in
100000+0 records out
102400000 bytes (102 MB) copied, 11.7601 s, 8.7 MB/s
Disk quotas for grp quota_2usr (gid 60000):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre  100275*      0   51200       -       1       0       0       -
lustre-MDT0000_UUID
                      1*      -       1       -       1       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0001_UUID
                 100275*      -  100275       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 100275
running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [chgrp] [quota_2usr] [/mnt/lustre/d55.sanity-quota/f55.sanity-quota]
 sanity-quota test_55: @@@@@@ FAIL: chgrp should failed with -EDQUOT 
  Trace dump:
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6332:error()
  = /home/green/git/lustre-release/lustre/tests/sanity-quota.sh:3808:test_55()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6636:run_one()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6683:run_one_logged()
  = /home/green/git/lustre-release/lustre/tests/test-framework.sh:6524:run_test()
  = /home/green/git/lustre-release/lustre/tests/sanity-quota.sh:3825:main()
Dumping lctl log to /tmp/testlogs//sanity-quota.test_55.*.1635541044.log
Delete files...
Wait for unlink objects finished...
sleep 5 for ZFS zfs
sleep 5 for ZFS zfs
Waiting for MDT destroys to complete


 Comments   
Comment by Andreas Dilger [ 02/Nov/21 ]

Comparing the test results with a PASS in autotest, it shows a quite different result:

running as uid/gid/euid/egid 60000/60000/60000/60000, groups:
 [dd] [if=/dev/zero] [of=/mnt/lustre/d55.sanity-quota/f55.sanity-quota] [bs=1024] [count=100000]
100000+0 records in
100000+0 records out
102400000 bytes (102 MB, 98 MiB) copied, 9.89286 s, 10.4 MB/s
CMD: trevis-22vm4 lctl set_param -n os[cd]*.*MDT*.force_sync=1
CMD: trevis-22vm3 lctl set_param -n osd*.*OS*.force_sync=1
Disk quotas for grp quota_2usr (gid 60001):
     Filesystem  kbytes   quota   limit   grace   files   quota   limit   grace
    /mnt/lustre       0       0   51200       -       0       0       0       -
lustre-MDT0000_UUID
                      0       -       0       -       0       -       0       -
lustre-OST0000_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0001_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0002_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0003_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0004_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0005_UUID
                      0       -       0       -       -       -       -       -
lustre-OST0006_UUID
                      0       -       0       -       -       -       -       -
Total allocated inode limit: 0, total allocated block limit: 0
running as uid/gid/euid/egid 60000/60001/60000/60001, groups:
 [chgrp] [quota_2usr] [/mnt/lustre/d55.sanity-quota/f55.sanity-quota]
chgrp: changing group of '/mnt/lustre/d55.sanity-quota/f55.sanity-quota': Disk quota exceeded

It looks like the quota_2usr GID is 60000 in the Gerrit Janitor case and 60001 in the autotest case. That can be seen from the differences in both the "Disk quotas for grp quota_2usr (gid ...)" and the "running as uid/gid/euid/egid ..." messages between the two results.

That also explains why quota_2usr already has the 100MB of quota usage charged to it before the chgrp, and why "chgrp $TSTUSR2" doesn't fail: changing the file's group to the same numeric GID is a no-op for quota accounting (different name, same number).
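A GID collision like this can be spotted directly from the group database. A minimal sketch, not part of sanity-quota.sh (the `dup_gids` helper and the `/tmp/group.demo` file are hypothetical, and the demo file just reproduces the bad configuration described above):

```shell
#!/bin/sh
# Report any numeric GID that is shared by more than one group name,
# which is exactly the condition that makes chgrp between the two
# quota test users a quota no-op.
dup_gids() {
    # $1: path to a group(5)-format file (e.g. /etc/group)
    awk -F: '{ seen[$3] = seen[$3] ? seen[$3] "," $1 : $1 }
             END { for (g in seen) if (index(seen[g], ",")) print g ": " seen[g] }' "$1"
}

# Demo with a throwaway file mirroring the broken Gerrit Janitor setup:
cat > /tmp/group.demo <<'EOF'
quota_usr:x:60000:
quota_2usr:x:60000:
EOF
dup_gids /tmp/group.demo
```

On the broken image, `dup_gids /etc/group` would flag GID 60000 as shared by quota_usr and quota_2usr; a healthy image prints nothing for those two groups.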

Oleg, could you please check/fix the quota_2usr entry in /etc/group on your test system to use GID 60001 (or some other GID different from quota_usr's)?
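For reference, the fix on the test node amounts to giving quota_2usr a GID of its own; a sketch of the admin commands, run as root (60001 is just the example value suggested above, any GID not already taken by quota_usr works):

```shell
# Inspect the current GIDs of the two quota test groups:
getent group quota_usr quota_2usr

# Move quota_2usr to its own GID so chgrp between the two users
# actually changes the numeric group owner:
groupmod -g 60001 quota_2usr
```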

Comment by Oleg Drokin [ 10/Nov/21 ]

I updated the centos7 image to give quota_2usr its own GID, and it did indeed help.

Thanks!

Generated at Sat Feb 10 03:16:11 UTC 2024 using Jira 9.4.14#940014-sha1:734e6822bbf0d45eff9af51f82432957f73aa32c.