[LU-8340] sanity-sec test_25: /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED! Created: 28/Jun/16 Updated: 05/Oct/16 Resolved: 05/Oct/16 |
|
| Status: | Resolved |
| Project: | Lustre |
| Component/s: | None |
| Affects Version/s: | Lustre 2.9.0 |
| Fix Version/s: | Lustre 2.9.0 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Maloo | Assignee: | Kit Westneat |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Full - EL7.2 Server/EL7.2 Client |
||
| Severity: | 3 | ||||||||
| Description |
|
This issue was created by maloo for Saurabh Tandan <saurabh.tandan@intel.com>

This issue relates to the following test suite run: https://testing.hpdd.intel.com/test_sets/12ba69e2-3a6a-11e6-a0ce-5254006e85c2.

The sub-test test_25 failed with the following error:

/usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED!

test log:

Started clients trevis-48vm1.trevis.hpdd.intel.com,trevis-48vm2:
CMD: trevis-48vm1.trevis.hpdd.intel.com,trevis-48vm2 mount | grep /mnt/lustre' '
10.9.6.52@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr)
10.9.6.52@tcp:/lustre on /mnt/lustre type lustre (rw,flock,user_xattr)
CMD: trevis-48vm2 PATH=/usr/lib64/lustre/tests:/usr/lib/lustre/tests:/usr/lib64/lustre/tests:/opt/iozone/bin:/usr/lib64/lustre/tests//usr/lib64/lustre/tests:/usr/lib64/lustre/tests:/usr/lib64/lustre/tests//usr/lib64/lustre/tests/../utils:/opt/iozone/bin:/usr/lib64/lustre/tests/mpi:/usr/lib64/lustre/tests/racer:/usr/lib64/lustre/../lustre-iokit/sgpdd-survey:/usr/lib64/lustre/tests:/usr/lib64/lustre/utils/gss:/usr/lib64/lustre/utils:/usr/lib64/qt-3.3/bin:/usr/lib64/compat-openmpi16/bin:/usr/bin:/bin:/usr/sbin:/sbin::/sbin:/sbin:/bin:/usr/sbin: NAME=autotest_config sh rpc.sh set_default_debug \"vfstrace rpctrace dlmtrace neterror ha config ioctl super lfsck\" \"all -lnet -lnd -pinger\" 4
CMD: trevis-48vm7 lctl get_param -n timeout
Using TIMEOUT=20
CMD: trevis-48vm7 lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
CMD: trevis-48vm1.trevis.hpdd.intel.com lctl dl | grep ' IN osc ' 2>/dev/null | wc -l
enable quota as required
CMD: trevis-48vm7 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-MDT0000.quota_slave.enabled
CMD: trevis-48vm8 /usr/sbin/lctl get_param -n osd-ldiskfs.lustre-OST0000.quota_slave.enabled
[HOST:trevis-48vm1.trevis.hpdd.intel.com] [old_mdt_qtype:ug] [old_ost_qtype:ug] [new_qtype:ug3]
CMD: trevis-48vm7 /usr/sbin/lctl conf_param lustre.quota.mdt=ug3
CMD: trevis-48vm7 /usr/sbin/lctl conf_param lustre.quota.ost=ug3
Total disk size: 13760516 block-softlimit: 13761540 block-hardlimit: 14449617 inode-softlimit: 916168 inode-hardlimit: 961976
Setting up quota on trevis-48vm1.trevis.hpdd.intel.com:/mnt/lustre for quota_usr...
+ /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre
setquota failed: Operation not permitted
sanity-sec test_25: @@@@@@ FAIL: /usr/bin/lfs setquota -u quota_usr -b 13761540 -B 14449617 -i 916168 -I 961976 /mnt/lustre FAILED!

This issue was first seen on 06/19/2016 and has occurred about 22 times in the 10 days since. |
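For readers wondering where the limits in the failing command come from, the logged numbers are consistent with simple integer arithmetic on the "Total disk size" and inode count. The sketch below reproduces them. The relationships (soft block limit = disk size + 1024, hard limit = soft limit + 5%) are inferred from the values in the log, not stated anywhere in this ticket, so treat them as an assumption.

```shell
# Values copied from the test log above.
disksz=13760516   # "Total disk size"
inodes=916168     # "inode-softlimit"

# ASSUMPTION: these formulas are inferred from the logged numbers only.
blk_soft=$((disksz + 1024))              # block soft limit
blk_hard=$((blk_soft + blk_soft / 20))   # soft limit plus 5% (integer division)
i_soft=$inodes                           # inode soft limit
i_hard=$((i_soft + i_soft / 20))         # soft limit plus 5% (integer division)

# Print the command line as it appears in the log; nothing is executed.
echo "lfs setquota -u quota_usr -b $blk_soft -B $blk_hard -i $i_soft -I $i_hard /mnt/lustre"
```

The limits themselves are ordinary; the failure is the "Operation not permitted" result, not the values passed.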
| Comments |
| Comment by Niu Yawei (Inactive) [ 08/Jul/16 ] |
|
The setquota failed with -EPERM. I think it's caused by the nodemap permission check (nodemap_can_setquota()), but I don't see why "nmf_allow_root_access" wasn't set properly in this case. Perhaps we need a nodemap expert to take a look. |
| Comment by Joseph Gmitter (Inactive) [ 12/Jul/16 ] |
|
Hi Kit, Would you be able to look at this issue? Our engineer believes it is a nodemap permission check issue. Can you please advise? Thanks! |
| Comment by Jian Yu [ 27/Jul/16 ] |
|
More failure instances on master branch: |
| Comment by Peter Jones [ 12/Aug/16 ] |
|
Any update Kit? Do you think that this is something we ought to fix for 2.9 or could it wait for a future release? |
| Comment by Kit Westneat [ 12/Aug/16 ] |
|
Hi Peter, I've taken a look at it, but I'm not sure why it fails sometimes and not others. My gut is that it's just a test issue, but I will spend some time trying to reproduce it on my system and get a clearer answer. I don't think it's a blocker, but it would be nice to have it sorted out before the next release. |
| Comment by Peter Jones [ 12/Aug/16 ] |
|
Kit, thanks for the update. Those kinds of issues can be infuriating. Do you know how to use test parameters to easily run the test multiple times? Adding extra debug output to the tests and doing that is usually how engineers get to the bottom of such issues. Peter |
| Comment by Kit Westneat [ 12/Aug/16 ] |
|
That's a good idea, I'll upload a debug patch as well with test params to see if I can reliably trigger it on maloo. |
| Comment by Gerrit Updater [ 12/Aug/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/21907 |
| Comment by Saurabh Tandan (Inactive) [ 25/Aug/16 ] |
|
Another instance: Interop EL7.2 Server/2.8.0 EL7.2 Client |
| Comment by Peter Jones [ 01/Sep/16 ] |
|
Kit Has the debug patch provided the information you hoped for? Peter |
| Comment by Kit Westneat [ 06/Sep/16 ] |
|
Hi Peter, I haven't been able to reproduce the bug yet, I'll try some more this week. Thanks, |
| Comment by Gerrit Updater [ 13/Sep/16 ] |
|
Andreas Dilger (andreas.dilger@intel.com) merged in patch http://review.whamcloud.com/21907/ |
| Comment by Peter Jones [ 23/Sep/16 ] |
|
Kit, the debug patch landed over a week ago. Have you been able to gather any useful info as a result? Peter |
| Comment by Gerrit Updater [ 27/Sep/16 ] |
|
Kit Westneat (kit.westneat@gmail.com) uploaded a new patch: http://review.whamcloud.com/22750 |
| Comment by Kit Westneat [ 27/Sep/16 ] |
|
Hi Peter, I think the issue is tied to the ENABLE_QUOTA flag. When it is set, restarting Lustre causes the client to issue quota commands, but at that point in the test nodemap has disabled quota permissions. This patch re-enables the admin permissions on the client, so that should fix it. Is it possible to confirm that ENABLE_QUOTA is disabled on the normal tests, but enabled on the full tests? Thanks, |
| Comment by Jian Yu [ 27/Sep/16 ] |
|
Yes, Kit. For the full test group, ENABLE_QUOTA=yes is specified. For the review-* test groups, ENABLE_QUOTA is disabled. |
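The ordering problem described in the two comments above can be illustrated with a toy model. This is only a sketch: `sim_setquota`, `sim_restart`, and `nodemap_admin` are hypothetical names invented for illustration, not Lustre commands or test-framework functions, and the real check happens on the server side.

```shell
# Toy model only: nothing here calls lfs or lctl.
nodemap_admin=1   # 1 = client root may set quotas, 0 = nodemap denies it

# Hypothetical stand-in for a setquota request checked against nodemap.
sim_setquota() {
    if [ "$nodemap_admin" -eq 1 ]; then
        echo "ok"
    else
        echo "EPERM"
    fi
}

# Hypothetical stand-in for a restart in the test: quota setup is only
# issued when the first argument (the ENABLE_QUOTA setting) is "yes".
sim_restart() {
    if [ "$1" = "yes" ]; then
        sim_setquota
    else
        echo "skipped"
    fi
}

nodemap_admin=0                   # the test has disabled admin rights
full_result=$(sim_restart yes)    # full group: quota setup hits EPERM
review_result=$(sim_restart no)   # review-* groups: never reach setquota

nodemap_admin=1                   # models the fix: re-enable admin first
fixed_result=$(sim_restart yes)

echo "full=$full_result review=$review_result fixed=$fixed_result"
```

Under these assumptions the model shows why only the full test group failed: the review-* runs never issued the quota command while admin rights were off.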
| Comment by Gerrit Updater [ 05/Oct/16 ] |
|
Oleg Drokin (oleg.drokin@intel.com) merged in patch http://review.whamcloud.com/22750/ |
| Comment by Peter Jones [ 05/Oct/16 ] |
|
Landed for 2.9 |